Open sourcing machine learning tools is now the norm in the tech world. Salesforce is the latest in a line of tech firms to do so, releasing TransmogrifAI, its machine learning software that helps build machine learning systems at enterprise scale. Shubha Nabar, senior director of Data Science at Salesforce Einstein, revealed in a post that the diversity of data and use cases at enterprise companies makes machine learning for enterprise products a big challenge.
In other words, every use case calls for customer-specific ML models. It isn’t, however, feasible to build and deploy by hand thousands of personalised ML models trained on each individual customer’s data for every single use case. TransmogrifAI is the same library, built on Scala and SparkML, that powers the Einstein AI platform.
At a time when IT giants are rushing to reshape enterprise ML with homegrown AutoML libraries, how will Salesforce’s TransmogrifAI change the machine learning landscape? Automated ML solutions typically automate some or all of the steps of the ML process: data preprocessing or cleansing, feature engineering, feature extraction, feature selection, and hyperparameter optimisation or algorithm selection.
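To make the last of those steps concrete, here is a minimal sketch of automated algorithm and hyperparameter selection: every candidate is fitted and the one with the best validation accuracy is kept. It is written in plain Python for illustration and is not TransmogrifAI's API; the function names (`select_model`, `knn_classifier`, `majority_classifier`) and the toy data are assumptions made up for this example.

```python
from collections import Counter

def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def knn_classifier(k):
    """Return a trainer for k-nearest-neighbours on 1-D numeric inputs."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit

def accuracy(predict, rows):
    """Fraction of rows whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def select_model(candidates, train, valid):
    """Fit every candidate and keep the one with the best validation accuracy."""
    scored = [(accuracy(fit(train), valid), name) for name, fit in candidates.items()]
    best_score, best_name = max(scored)
    return best_name, best_score

# Toy (feature, label) rows; (6.9, 0) is a deliberately noisy training point.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1),
         (6.9, 0), (7.0, 1), (7.5, 1), (8.0, 1)]
valid = [(1.2, 0), (2.2, 0), (6.8, 1), (7.2, 1)]

# The search space: one baseline plus two hyperparameter settings for k-NN.
candidates = {
    "majority": majority_classifier,
    "knn_k1": knn_classifier(1),
    "knn_k3": knn_classifier(3),
}

best_name, best_score = select_model(candidates, train, valid)
print(best_name, best_score)
```

On this toy data the 3-neighbour model wins because it is less sensitive to the noisy point than the 1-neighbour model. Real AutoML tools search far larger spaces and typically use cross-validation rather than a single holdout split.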
Salesforce is not the first tech giant to release an AutoML tool. Google has the first-mover advantage, having delivered on its promise with Google Cloud AutoML, while IBM offers automated ML tools such as an auto classifier in SPSS, one of the most widely used analytics tools in the market. Other AutoML tools include Auto-WEKA for automatic model selection and hyperparameter optimisation, and OptiML for automatic model optimisation. But Salesforce takes the lead in end-to-end automation of the ML process.
Let Us Encapsulate The High Points Of This Tool:
- TransmogrifAI is an AutoML library for building modular ML workflows on Spark that require minimal hand tuning.
- Written in Scala and running on top of Apache Spark, TransmogrifAI simplifies model selection and training for structured data. As Nabar puts it, most AutoML solutions today are focused on narrow tasks, or are built for unstructured data such as voice, image and text.
- TransmogrifAI builds models at scale for structured, heterogeneous data and is billed as handling the key components of the ML process (data cleansing, feature selection and model training) in three lines of code.
- In a few lines of code, a data scientist can automate key tasks like data cleansing, feature engineering and model selection to arrive at a strong baseline model that can be iterated on further.
- According to Mayukh Bhaowal from Salesforce, 90 percent of the time spent building models goes into creating the perfect numeric matrix of features to feed into the chosen algorithm, which forces data scientists to reinvent the wheel every time. With this tool, data scientists can automatically engineer features based on the type of feature, the data distribution and the association with the response variable.
- Another key feature of the tool is model explainability, which addresses the black box issue associated with ML. Nabar emphasises that from a trust and data point of view, this model isn’t a black box.
- Salesforce has pitched this as a collaborative effort in building large-scale, customer-specific ML models. The launch of the Spark-based ML framework came a day after Oracle open sourced GraphPipe, a tool for deploying ML models built on frameworks like Google’s TensorFlow and Facebook’s Caffe2.
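The idea of engineering features automatically from the type of each column, mentioned in the points above, can be sketched in a few lines. What follows is a plain-Python illustration under stated assumptions, not TransmogrifAI code: the helper names (`infer_kind`, `engineer_numeric`, `engineer_categorical`, `auto_features`), the two-type scheme and the sample table are all hypothetical. Numeric columns are mean-imputed and given a null-indicator column, while categorical columns are one-hot encoded.

```python
def infer_kind(values):
    """Guess a column's type from its non-missing values."""
    present = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"

def engineer_numeric(name, values):
    """Mean-impute missing values and add a null-indicator column."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return {
        name: [mean if v is None else float(v) for v in values],
        f"{name}_is_null": [1.0 if v is None else 0.0 for v in values],
    }

def engineer_categorical(name, values):
    """One-hot encode each observed category; missing values get their own column."""
    as_label = lambda v: "missing" if v is None else str(v)
    labels = sorted({as_label(v) for v in values})
    return {
        f"{name}={label}": [1.0 if as_label(v) == label else 0.0 for v in values]
        for label in labels
    }

def auto_features(table):
    """Dispatch each column to the transformation that matches its inferred type."""
    features = {}
    for name, values in table.items():
        if infer_kind(values) == "numeric":
            features.update(engineer_numeric(name, values))
        else:
            features.update(engineer_categorical(name, values))
    return features

# Hypothetical customer table with one numeric and one categorical column.
table = {"age": [22, None, 38, 30], "plan": ["basic", "pro", None, "basic"]}
features = auto_features(table)
for column, values in features.items():
    print(column, values)
```

The result is an all-numeric feature matrix, the kind of input Bhaowal describes data scientists spending most of their time assembling by hand. A production tool would cover many more types, such as dates, free text and geolocation, and would also check each feature's association with the response variable.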
AutoML, The Next Step In Democratising ML
The recent success of deep learning in tasks like image recognition and speech recognition is largely due to the automation of the feature engineering process, where hierarchical feature extractors are learned from data rather than being manually designed. Researchers from Bosch Centre for AI point out in their paper that automating architecture engineering is a logical next step in automating ML.
This is why we see top companies like Google, Amazon, Microsoft and Salesforce open sourcing tools that enable data scientists to deploy models with a minimum of hand-tuning and a shorter turnaround time. While Google has a lead in democratising AI with tools that enable developers to build AI at scale, Salesforce tackles the structured data challenge, which covers a range of use cases where organisations need vast amounts of data to predict sales, conversions and customer churn.
Advantages Of Automated ML Tools
While AutoML solutions can improve business outcomes significantly, cut turnaround time sharply and also improve accuracy, there are certain disadvantages as well. The market for automated ML tools is growing and will grow stronger, but on Kaggle, humans still outperform the results generated by AutoML tools. Increasingly, many data scientists are also relying on AutoML tools to optimise model performance.