In case you were wondering, here’s another sign of the Google Cloud vs Amazon Web Services war heating up. Google has brought out the big guns in the analytical data warehousing space by embedding machine learning capabilities into Google BigQuery. BigQuery is Google’s low-cost enterprise data warehouse and analytics service, and the new capability is called BigQuery ML.
One of the key features of BigQuery is that it transforms SQL queries into complex execution plans and dispatches them onto execution nodes to return insights into the data quickly. BigQuery lets developers execute SQL as a massively parallel query across hundreds of CPU cores and ample disk storage, scanning and aggregating terabytes of data in seconds. BigQuery ML, a capability inside BigQuery, enables analysts and data scientists to build and deploy ML models on massive structured or semi-structured datasets.
Dremel Technology Is The Key
At a time when Hadoop was facing intense competition, Google released beta access to BigQuery in 2011. It was a new SQL processing system based on Dremel, Google’s distributed query engine, and it provides the core set of Dremel features to third-party developers. A key selling point of Dremel was its cost-to-value ratio: according to a technical paper, it can scan 35 billion rows without an index in tens of seconds, and the user needs no capital expenditure for the supporting infrastructure.
Although Dremel, the cloud-powered massively parallel query service, is immensely popular among developers, Amazon Redshift remains the leader in the data warehouse space in terms of performance, cost and usability. Both Amazon Redshift and BigQuery are columnar data warehouses, which makes them better suited to analytics workloads than row-oriented relational databases such as Postgres and MySQL. Where BigQuery scores is Dremel’s tree architecture, which dispatches queries and aggregates results across thousands of machines in a few seconds.
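To make the tree idea concrete, here is a minimal single-process sketch in Python. It is an illustration only, not Dremel's actual implementation: the function names (`leaf_scan`, `merge`, `tree_aggregate`), the `fanout` parameter and the sample data are all assumptions made for this example. Each "leaf" scans its own partition and returns a partial count, and each level above merges a few partial results until one result reaches the root.

```python
from collections import Counter
from typing import Dict, List


def leaf_scan(rows: List[Dict[str, str]], key: str) -> Counter:
    """Leaf node: scan one partition and return a partial COUNT(*) per key value."""
    return Counter(row[key] for row in rows)


def merge(partials: List[Counter]) -> Counter:
    """Intermediate or root node: combine partial counts from child nodes."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def tree_aggregate(partitions: List[List[Dict[str, str]]], key: str, fanout: int = 2) -> Counter:
    """Aggregate level by level, merging at most `fanout` children per node."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(partition, key) for partition in partitions]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else Counter()


if __name__ == "__main__":
    # Hypothetical data: three partitions of a table with a "country" column.
    partitions = [
        [{"country": "US"}, {"country": "IN"}],
        [{"country": "US"}],
        [{"country": "IN"}, {"country": "IN"}],
    ]
    print(tree_aggregate(partitions, "country"))
```

In a real system the leaves run on separate machines in parallel. The sketch only shows why the work splits cleanly: counts and sums can be merged in any order, so no single node has to see all the rows.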
Are The ML Capabilities Giving An Advantage Over Traditional Data Warehouses?
Because ML traditionally requires programming skills and knowledge of ML frameworks, it keeps data analysts out and restricts its use to a small set of users, mainly data scientists. BigQuery ML lets data analysts apply ML through their existing SQL tools and skills: they can build and evaluate models inside BigQuery itself. Since queries run directly against the data in BigQuery, no additional extract, transform, and load (ETL) tools are required, Rajen Sheth, senior director of product management at Google, said during the Google Cloud Next 2018 conference.
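As a rough illustration of what "building and evaluating a model with SQL" looks like, the Python sketch below assembles the two statements as plain strings. The `CREATE MODEL` statement and the `ML.EVALUATE` function are part of BigQuery ML. Everything else is made up for this example: the dataset, table and column names and the helper functions `create_model_sql` and `evaluate_model_sql`.

```python
from typing import List


def create_model_sql(model: str, model_type: str, label: str,
                     features: List[str], source_table: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement as a SQL string."""
    columns = ", ".join(features + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


def evaluate_model_sql(model: str) -> str:
    """Build a query that returns evaluation metrics for a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


# Hypothetical names, for illustration only.
train_sql = create_model_sql(
    model="mydataset.churn_model",
    model_type="logistic_reg",
    label="churned",
    features=["tenure_months", "monthly_spend"],
    source_table="mydataset.customers",
)
eval_sql = evaluate_model_sql("mydataset.churn_model")

if __name__ == "__main__":
    print(train_sql)
    print(eval_sql)
```

The strings would then be run like any other query, for instance in the BigQuery console or through a client library. Note that the training data is simply the result of a `SELECT`, which is why no separate export or ETL step appears.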
Cleaning And Preprocessing Data In SQL: Users can create ML models in BigQuery with SQL queries. For example, analysts who want to train a logistic regression model can do so directly in BigQuery ML, which means they can slice their data and explore different processing options in the same place. One user on a forum pointed out that in most cases developers do not need to train neural networks, especially for structured data. In such cases, cleaning and preprocessing data is relatively smooth in SQL.
More Power To Data Analysts: BigQuery ML gives more power to data analysts who know SQL but have little knowledge of ML frameworks, letting them develop models without extra programming or additional tools.
Democratises ML: BigQuery ML democratises ML by allowing developers to build models with their existing tools, and it increases development speed by eliminating the need for data movement.
Reduced Waiting Time: BigQuery ML significantly speeds up model development by eliminating the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. Analysts no longer need to export small amounts of data to spreadsheets or other applications. The documentation also emphasises that there is no need to program an ML solution in Python or Java: models are trained and accessed in BigQuery using SQL, a language data analysts know. Since BigQuery is designed to run queries on Big Data in as little as a few seconds, it is well suited to querying large datasets. Currently, however, BigQuery ML supports only two types of models: linear regression for forecasting and logistic regression for classification.
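The two model types and the "accessed in BigQuery using SQL" point can be sketched in a few lines of Python. The `model_type` values `linear_reg` and `logistic_reg` and the `ML.PREDICT` function come from BigQuery ML. The task names, the helper functions `model_type_for` and `predict_sql`, and the table and model names are assumptions made for this example.

```python
# Mapping from the task an analyst has in mind to the BigQuery ML model_type
# option. Only the two model types mentioned in the article are covered.
MODEL_TYPE_FOR_TASK = {
    "forecasting": "linear_reg",       # predict a numeric value
    "classification": "logistic_reg",  # predict a class label
}


def model_type_for(task: str) -> str:
    """Return the model_type option for a task, or fail for unsupported tasks."""
    try:
        return MODEL_TYPE_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"no supported model type for task: {task!r}") from None


def predict_sql(model: str, input_table: str) -> str:
    """Build a query that scores the rows of a table with a trained model."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


if __name__ == "__main__":
    # Hypothetical model and table names, for illustration only.
    print(model_type_for("classification"))
    print(predict_sql("mydataset.churn_model", "mydataset.new_customers"))
```

Because prediction is itself a query, the scored rows stay in the warehouse and can be joined or filtered like any other table.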
Of late, Google BigQuery has emerged as the next viable option after Amazon Redshift, and it is suitable for both OLAP and BI use cases. In fact, 20th Century Fox tested the beta to understand its movie marketing data by running a SQL query for audience analysis with a “create model” statement added to it. Google BigQuery ML returned a linear regression model for the query, effectively predicting who would want to see a soon-to-be-released movie. The results were used to reformulate the media planning for the movie.