
Qubole – Next Generation Cloud Data Platform

With the proliferation of applications and end devices (web, mobile, sensors etc.), the explosive growth in the volume of data collected by organizations over the last few years looks set to continue for the foreseeable future. At the same time, the variety of data types keeps growing, ranging from structured data originating in application databases to semi-structured content originating in social media, web properties such as Wikipedia, and internal applications such as email systems and company support boards. Additionally, the systems and software stacks that are able to keep up with this growth in both variety and volume (most notably Apache Hadoop) are still complex to operate and far from perfect. They are still in the early stages of development and market adoption. It therefore comes as no surprise that many organizations struggle to operate and optimize their data infrastructure and to make it serve their data processing needs.

Qubole Vision

The complete data infrastructure solution has many components. The main ones are as follows:

  • Data Collection Service for both real time and bulk upload of data from different data sources such as applications, databases, web crawls etc.
  • Batch Computation Service such as Hadoop/Hive to process this data and transform it from data to information.
  • Real Time Computation Service to generate real time results on data streams and data captures for time sensitive and actionable reporting and monitoring.
  • Ad Hoc Query Service to answer one-off queries, sometimes exactly and other times approximately, in a short amount of time.
  • Tools and Frameworks for job dependencies, data and query discovery, SLA and monitoring etc.

Qubole (www.qubole.com) aims to provide all of the above components (and some more) in the cloud. We want to provide fast, easy and reliable access to all the services mentioned above so that our clients can focus on their data and their algorithms while we take care of optimizing, operating and evolving the data infrastructure for them. We want to enable data engineers, data scientists and data analysts to work with their data and build data-driven applications, whether these are simple reporting applications or more complex targeting or recommendation applications.

In pursuit of this vision, our first offering is an Ad Hoc Query and Batch Computation Service in the cloud. This service provides Apache Hive and Apache Hadoop as a service, with close integration with Apache Oozie. It is ideal for data stored in S3 on which you want to run ad hoc analysis and build data pipelines. This service is currently available as part of an early access program. We are working with a select set of companies in this program, and we will be making the service available to everyone by Q4 2012. The details of this program and the service are in the subsequent sections of this white paper.

Qubole Team Background

Qubole was started by data infrastructure veterans from Facebook who conceived, built, managed and operated the infrastructure on which almost all of Facebook's backend data processing runs. The co-founders of the company (Ashish and Joydeep) are the co-creators of the Apache Hive project, a very prominent platform built on top of Apache Hadoop. Under their guidance, the Hadoop and Hive clusters at Facebook grew from managing 80TB of data in late 2007 to 20PB of compressed data in late 2011. The Qubole team comprises talented engineers who have delivered strong products at companies like Oracle, NetApp and Yahoo. Qubole raised money from Lightspeed Ventures and Charles River Ventures, two well-known VC firms in the Valley. We are seeking organizations to try out an early beta version of our service.

Bhasker Gupta

Bhasker is a techie turned media entrepreneur. He started AIM in 2012 out of a desire to speak about emerging technologies and their commercial, social and cultural impact. Earlier, Bhasker worked as a Vice President at Goldman Sachs. He holds a B.Tech from the Indian Institute of Technology, Varanasi and an MBA from the Indian Institute of Management, Lucknow.