For the past few years, the mantra has been that “every business is a software business.” That line is starting to look dated. The updated maxim is: “Every business is an AI business.”
Machine learning and AI are closely tied to mapping, data collection and delivery-time estimation, and they are becoming increasingly popular among global players. One platform that has existed for over a year but has not been widely discussed is Michelangelo, Uber’s machine learning platform, designed to manage data, train and evaluate models, deploy them, and make and monitor predictions. The system supports traditional machine learning models, time series forecasting, and deep learning.
Michelangelo, an internal MLSaaS (machine learning as a service) platform, democratises machine learning and makes it easier to scale AI to meet business needs. Reportedly, it enables internal teams to build, deploy, and operate machine learning solutions.
Dozens of teams within the company have been building and deploying AI models through the platform. It runs across several Uber data centres, leverages specialised hardware, and serves predictions for the highest-loaded online services, according to a post co-written by Jeremy Hermann, head of the machine-learning platform at Uber, and Mike Del Balso, product manager of machine learning at Uber.
How has Uber been using AI and ML?
Uber has a core team providing pre-packaged machine learning algorithms ‘as-a-service’ to its mobile app developers, map experts and autonomous driving teams. The company has also claimed that machine learning is part of its DNA. Let us see how Uber has been using AI and ML to make its predictions more accurate.
AI is not new at Uber. Uber has admitted using artificial intelligence to charge customers based on what they are likely to be willing to pay. The ride-hailing service has said that the system is based on AI and algorithms, which estimate fare rates that groups of customers will be willing to pay depending on destination, time of day, and location.
Uber used machine learning techniques to bring greater personalisation into its core rider app late last year. The upgraded app starts by asking for your destination and offers a number of predictions based on your habits and your current location. For example, if you are at the office it will assume you want to go home, or to the gym, or the pub.
Uber also layers machine learning algorithms on top of its historical trip data to produce more accurate estimated time of arrival (ETA) figures that take traffic patterns into account. The company has been using data from its two billion logged trips to ‘learn’ where good pickup spots are.
How does Michelangelo add up?
Michelangelo has been serving production use cases at Uber for about a year and has become the de facto machine learning system for engineers and data scientists, with dozens of teams building and deploying models within the company.
The best use case for understanding what Michelangelo does is UberEATS, the food ordering service launched by Uber in 2014. Models running on the platform predict meal delivery times and drive search rankings, search autocomplete and restaurant rankings.
Its delivery time models predict the time taken for a meal to be prepared and delivered before the order is issued and then again at each stage of the delivery process.
At the core of the user experience in a meal service is the time to delivery. Initially, that was treated as a classical computation: take the distance between the user and the restaurant, divide it by the average speed in the town, and add an average time to prepare the meal.
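To make the classical computation concrete, here is a minimal sketch in Python. The function name, parameter names, units and sample numbers are assumptions chosen for illustration. They do not come from Uber.

```python
def naive_delivery_minutes(distance_km, avg_speed_kmh, avg_prep_minutes):
    """Classical estimate: average prep time plus travel time at average speed.

    All names and units here are illustrative assumptions.
    """
    if avg_speed_kmh <= 0:
        raise ValueError("avg_speed_kmh must be positive")
    # Travel time in minutes for the restaurant-to-user distance.
    travel_minutes = distance_km / avg_speed_kmh * 60.0
    return avg_prep_minutes + travel_minutes


# Hypothetical order: 3 km away, 20 km/h average speed, 15 minutes average prep.
estimate = naive_delivery_minutes(distance_km=3.0, avg_speed_kmh=20.0, avg_prep_minutes=15.0)
print(f"{estimate:.1f} minutes")  # prints "24.0 minutes"
```

An estimate like this treats every dish, neighbourhood and hour of the day the same way, which is the gap a learned model is meant to close.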
In reality, Michelangelo now uses data to predict how long it takes to make noodles, how long it takes to make a hamburger, and how long it takes to deliver it in different parts of town at different times of the day.
The Michelangelo platform provides the UberEATS data scientists with gradient boosted decision tree regression models to predict the end-to-end delivery time. Features for the model include information from the request (e.g., time of day, delivery location), historical features (e.g., average meal prep time for the last seven days), and near-real-time calculated features (e.g., average meal prep time for the last hour).
Models are deployed across Uber’s data centres to Michelangelo model serving containers and are invoked via network requests by the UberEATS microservices. These predictions are displayed to UberEATS customers prior to ordering from a restaurant and as their meal is being prepared and delivered.
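The three feature groups can be illustrated with a small Python sketch that assembles one feature row for a single request. This is only an assumption-laden illustration: the field names, the in-memory `prep_log` and the helper functions are hypothetical, not Michelangelo’s actual schema or API. In a production system the historical and near-real-time values would presumably come from separate data pipelines rather than from one list.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_prep_minutes(prep_log, now, window):
    """Mean prep time over log entries in (now - window, now]; None if there are none."""
    recent = [minutes for ts, minutes in prep_log if now - window < ts <= now]
    return mean(recent) if recent else None


def build_features(request, prep_log):
    """Assemble one feature row (hypothetical field names) for a delivery-time model."""
    now = request["requested_at"]
    return {
        # Request features: known the moment the order is considered.
        "hour_of_day": now.hour,
        "delivery_lat": request["delivery_lat"],
        "delivery_lng": request["delivery_lng"],
        # Historical feature: average meal prep time over the last seven days.
        "avg_prep_min_7d": average_prep_minutes(prep_log, now, timedelta(days=7)),
        # Near-real-time feature: average meal prep time over the last hour.
        "avg_prep_min_1h": average_prep_minutes(prep_log, now, timedelta(hours=1)),
    }


# Made-up sample data for one restaurant: (timestamp, observed prep minutes).
now = datetime(2017, 9, 1, 19, 30)
prep_log = [
    (now - timedelta(days=3), 12.0),
    (now - timedelta(hours=5), 14.0),
    (now - timedelta(minutes=30), 20.0),
    (now - timedelta(minutes=10), 22.0),
]
request = {"requested_at": now, "delivery_lat": 37.77, "delivery_lng": -122.42}

features = build_features(request, prep_log)
print(features["avg_prep_min_7d"], features["avg_prep_min_1h"])  # prints "17.0 21.0"
```

In this made-up sample the one-hour average (21 minutes) is well above the seven-day average (17 minutes), which is the kind of signal a near-real-time feature gives the model when a kitchen is unusually busy.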
The delivery model, based on machine learning, can predict how much time a meal will take to prepare and deliver before the order is issued and then again at each stage of the delivery process. The flowchart shows the predictive analysis that Michelangelo does to provide a better delivery experience.
What are Michelangelo’s components?
Michelangelo is built with a mix of open source systems and components made in-house. The primary open source components used are HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.
The way forward for Michelangelo
Uber will undoubtedly continue to scale and harden the existing system. It is also likely to work on higher-level tools and services to drive the democratisation of machine learning and better support the needs of the business. Going forward, these are the developments Uber might pursue to make Michelangelo more efficient.
Auto Machine Learning: This would increase the productivity of data scientists by allowing them to specify a set of labels and an objective function. The system could then make the most privacy- and security-aware use of Uber’s data to find the best model for the problem.
Model visualisation: Uber has already taken some initial steps with visualisation tools for tree-based models, but much more needs to be done to help data scientists understand, debug, and tune their models, and ultimately to deliver better results for users.
Online learning: Uber is working towards a full platform solution with easily updateable model types and faster training and evaluation architecture and pipelines. Scientists at the company are also working on a system that automates model validation and deployment, along with sophisticated monitoring and alerting systems.
Distributed deep learning: Advanced machine learning increasingly relies on deep learning technologies. Deep learning workloads typically handle larger volumes of data, which in turn motivates investment in distributed learning.