
Interview – Joydeep Sen Sarma, Co-Founder and Head at Qubole India


Joydeep Sen Sarma will be speaking at The Fifth Elephant on the following questions: How do you build a big data service in the Cloud? How can we make queries against relatively slow Cloud Storage Systems fast? How can we take real advantage of the elasticity available in the Cloud? How do you make the Cloud dead easy to use for big data processing?

In an interview with Analytics India Magazine, Joydeep talks more about ‘The Elephant in the Cloud’.

Analytics India Magazine: Could you tell us more about the topic you are speaking on at The Fifth Elephant?

Joydeep Sen Sarma: One of my talks is about what we have learned building analytics infrastructure on the cloud and the key problems we have solved (so far) in this space.

AIM: How was Qubole incepted, how has it evolved over the years and what is the next step?

JSS: We are a very young company – having been started only in October 2011. We are still taking our first baby steps in this world! Our initial thesis behind founding this company can be listed roughly as follows:

  • We believe in the service model. Good quality free software is abundant in today’s world. But assembling and operating software is hard and expensive. By building services – we want to reduce this cost for businesses. As engineers – we have found that we can innovate faster and provide higher quality software behind a service boundary. It’s a win-win for everyone.
  • More and more data originates in the Cloud – as applications are deployed there. We want to make it very easy to analyze and build data-driven applications in such environments. We are very optimistic about the future of the Cloud and see it proliferating inside enterprise data centers as well – and want to see ourselves as one of the leading analytics infrastructures in these environments. Our belief is that the Cloud Architecture will be an important milestone in computing similar to Mainframes and PC Architecture.

AIM: Qubole works at the intersection of cloud computing and analytics, the two biggest growth areas in technology. Could you tell us what the future looks like for you?

JSS: Our short-term focus is providing analytics services in AWS. There are many different directions to grow from here. One obviously is to integrate with other cloud environments – like Azure, OpenStack etc. While we are a standalone service at this point – our future is clearly in integrating with other (web) services and software products – and this is another important dimension for us to grow in. Finally, we need partners to help customers use our platform and this is another longer-term direction that we need to make progress on to make our business successful.

AIM: Why is cloud not easy to use for big data processing?

JSS: Engineers building big-data stacks in the Cloud have to understand many new concepts. Most of the software available does not take into account the fundamental capabilities of the Cloud (like Elasticity and cheap but slow blob storage like S3). As a result – running a good analytics stack on the Cloud requires expert engineers (who are hard to hire). Even when a solution is put together – it’s mostly built by engineers, for engineers. Business analysts frequently don’t have access to big data backends. Some of these points are common to both cloud and in-house data center environments – but it’s easier to solve some of these problems in a managed-service environment.

AIM: What do existing users of Hadoop and Hive get out of Qubole?

JSS: If you are running Hadoop/Hive in AWS – you would find Qubole dramatically easier to use and operate.

  • We can consolidate all your workloads into a single virtual cluster, automatically scale it up and down and automatically cache S3 data and (in many cases) speed up your jobs manifold. We can save your analysts time and your organization money.
  • We can democratize data access in your organization. Analysts in your organization can access detailed data directly and build reporting pipelines themselves – instead of having to depend on ETL engineers for small changes. Our browser-based application can be used to launch and monitor queries, clusters etc. and author and run sophisticated periodic jobs to build data-driven applications.
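
Editor's illustration: the first point above mentions two mechanisms, caching data read from slow blob storage and scaling a cluster up and down with load. The Python sketch below shows one simple way each idea could work. It is not Qubole's implementation; the names `ReadThroughCache`, `fetch` and `desired_nodes`, and the sizing rule based on pending tasks, are assumptions made for this example.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Keeps recently read objects locally so repeat reads skip the slow store."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 128) -> None:
        self._fetch = fetch        # hypothetical slow read, e.g. a blob-store GET
        self._capacity = capacity  # maximum number of objects held locally
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)           # slow path: go to the backing store
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used object
        return data


def desired_nodes(pending_tasks: int, tasks_per_node: int,
                  min_nodes: int, max_nodes: int) -> int:
    """Assumed sizing rule: enough nodes for the pending work, within fixed bounds."""
    needed = -(-pending_tasks // tasks_per_node)  # ceiling division
    return max(min_nodes, min(max_nodes, needed))
```

In this sketch a second read of the same key is served from local memory, and the node count follows the backlog between `min_nodes` and `max_nodes`. A production service would also have to handle cache invalidation, object sizes and the cost of starting and stopping machines.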

Joydeep is a co-founder at Qubole and heads their India development team. Prior to starting Qubole – Joydeep worked at Facebook where he boot-strapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led the Data Infrastructure team. Joydeep was a key contributor on the Facebook Messages architecture team that brought Apache HBase to Facebook and to the transactional and reporting backends for Facebook Credits. He has been a driver for other important sub-projects in the Hadoop ecosystem – like the FairScheduler and RCFile.

Joydeep studied Computer Science at IIT-Delhi and University of Pittsburgh and started his career working on Oracle’s database kernel and building highly available and scalable file systems at NetApp. In between – he has played founding roles in storage and advertising startups. He cut his teeth building data-driven applications as the lead engineer on Yahoo’s in-house Recommendation Platform. Joydeep holds numerous patents, has many published papers and has been both speaker and panelist at Hadoop summits and at other Silicon Valley conferences.


Bhasker Gupta

Bhasker is a techie turned media entrepreneur. Bhasker started AIM in 2012, out of a desire to speak about emerging technologies and their commercial, social and cultural impact. Earlier, Bhasker worked as Vice President at Goldman Sachs. He is a B.Tech from the Indian Institute of Technology, Varanasi and an MBA from the Indian Institute of Management, Lucknow.