Joydeep Sen Sarma will be speaking at The Fifth Elephant on the following questions: How do you build a big data service in the Cloud? How can we make queries against relatively slow Cloud Storage Systems fast? How can we take real advantage of the elasticity available in the Cloud? How do you make the Cloud dead easy to use for big data processing?
In an interview with Analytics India Magazine, Joydeep talks more about ‘The Elephant in the Cloud’.
Analytics India Magazine (AIM): Could you tell us more about the topic you are speaking on at The Fifth Elephant?
Joydeep Sen Sarma (JSS): One of my talks is about what we have learnt building analytics infrastructure on the cloud and the key problems we have solved (so far) in this space.
AIM: How did Qubole get started, how has it evolved over the years and what is the next step?
JSS: We are a very young company – having been started only in October, 2011. We are still taking our first baby steps in this world! Our initial thesis behind founding this company can be listed roughly as follows:
- We believe in the service model. Good quality free software is abundant in today’s world. But assembling and operating software is hard and expensive. By building services – we want to reduce this cost for businesses. As engineers – we have found that we can innovate faster and provide higher quality software behind a service boundary. It’s a win-win for everyone.
- More and more data originates in the Cloud – as applications are deployed there. We want to make it very easy to analyze data and build data-driven applications in such environments. We are very optimistic about the future of the Cloud and see it proliferating inside enterprise data centers as well – and want to see ourselves as one of the leading analytics infrastructure providers in these environments. Our belief is that the Cloud Architecture will be an important milestone in computing, similar to the Mainframe and PC Architectures.
AIM: Qubole works at the intersection of cloud computing and analytics, two of the biggest growth areas in technology. Could you tell us what the future looks like for you?
JSS: Our short term focus is providing analytics services in AWS. There are many different directions to grow from here. One obviously is to integrate with other cloud environments – like Azure, OpenStack etc. While we are a standalone service at this point – our future is clearly in integrating with other (web) services and software products – and this is another important dimension for us to grow in. Finally, we need partners to help customers use our platform and this is another longer term direction that we need to make progress on to make our business successful.
AIM: Why is the cloud not easy to use for big data processing?
JSS: Engineers building big-data stacks in the Cloud have to understand many new concepts. Most of the software available does not take into account the fundamental capabilities of the Cloud (like Elasticity and cheap but slow blob storage like S3). As a result – running a good analytics stack on the Cloud requires expert engineers (who are hard to hire). Even when a solution is put together – it’s mostly built by engineers, for engineers. Business analysts frequently don’t have access to big data backends. Some of these points are common to both cloud and in-house data center environments – but it’s easier to solve some of these problems in a managed-service environment.
AIM: What do existing users of Hadoop and Hive get out of Qubole?
JSS: If you are running Hadoop/Hive in AWS – you would find Qubole dramatically easier to use and operate.
- We can consolidate all your workloads into a single virtual cluster, automatically scale it up and down and automatically cache S3 data and (in many cases) speed up your jobs manifold. We can save your analysts time and your organization money.
- We can democratize data access in your organization. Analysts in your organization can access detailed data directly and build reporting pipelines themselves – instead of having to depend on ETL engineers for small changes. Our browser-based application can be used to launch and monitor queries, clusters etc. and author and run sophisticated periodic jobs to build data-driven applications.
Joydeep is a co-founder at Qubole and heads their India development team. Prior to starting Qubole – Joydeep worked at Facebook where he boot-strapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led the Data Infrastructure team. Joydeep was a key contributor on the Facebook Messages architecture team that brought Apache HBase to Facebook and to the transactional and reporting backends for Facebook Credits. He has been a driver for other important sub-projects in the Hadoop ecosystem – like the FairScheduler and RCFile.
Joydeep studied Computer Science at IIT-Delhi and University of Pittsburgh and started his career working on Oracle’s database kernel and building highly available and scalable file systems at NetApp. In between – he has played founding roles in storage and advertising startups. He cut his teeth building data-driven applications as the lead engineer on Yahoo’s in-house Recommendation Platform. Joydeep holds numerous patents, has many published papers and has been both speaker and panelist at Hadoop summits and at other Silicon Valley conferences.