Productivity and social networking do not always go hand in hand, at least for millennials. But the emergence of LinkedIn has changed that.
Most grad students and working professionals know the importance of knowing someone in the field they aspire to join. By breaking down the barriers to free-flowing conversation, LinkedIn gave ordinary people possibilities that were previously unknown in the professional sector.
With more than 200 million users logging in every month, the platform faces the demanding task of collecting, handling and serving data accurately and with minimal latency. Though there are many conventional on-demand services and state-of-the-art machine learning models, the engineers at LinkedIn were flexible enough to combine these conventional strategies with tools built in-house to drive better results.
Building ML Models To Scale
The blueprint of a machine learning model consists of more or less the same steps: data collection, processing, training, testing and so on.
Firms looking to adopt ML can get an idea of how to scale their productivity by looking at LinkedIn’s approach:
- The most crucial data for LinkedIn concerns the kinds of jobs liked, jobs saved and connections made. Recommending jobs to an individual and calculating the probability of a job posting being viewed are therefore among the important features built on this dataset.
- At LinkedIn, the ML team proceeds by building a domain-specific language (DSL) and then a Jupyter notebook to integrate the selected features and tune parameters.
- Most of the model training occurs offline, where the ML teams train and retrain the models every few hours using Hadoop. LinkedIn’s own Pro-ML training service is updated with newer model types for hyperparameter tuning. This training service leverages Azkaban and Spark to ensure that there is no missing input data.
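The idea of checking for missing input data before a retraining run can be illustrated with a small sketch. This is not LinkedIn's code and uses neither Azkaban nor Spark; the hourly partition naming and the function names are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

def expected_partitions(start, end, step=timedelta(hours=1)):
    """List the partition keys a training window should contain.

    The "YYYY-MM-DD-HH" key format is a hypothetical convention.
    """
    keys = []
    current = start
    while current < end:
        keys.append(current.strftime("%Y-%m-%d-%H"))
        current += step
    return keys

def missing_partitions(expected, available):
    """Return the expected partition keys that are not available, in order."""
    have = set(available)
    return [key for key in expected if key not in have]

# Example: a three-hour window where the middle hour never arrived.
window = expected_partitions(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3))
gaps = missing_partitions(window, ["2024-01-01-00", "2024-01-01-02"])
if gaps:
    print("Skipping retrain, missing input:", gaps)
```

A scheduler could run a check of this kind as a gate and only launch the training job when the list of gaps is empty.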
The term ML model refers to the model artefact that is created by the training process. The training data must contain the correct answer, which is known as a target or target attribute.
The learning algorithm finds patterns in the training data that map the input data attributes to the target (the answer to be predicted), and it outputs an ML model that captures these patterns.
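To make the relationship between training data, target and model artefact concrete, here is a minimal sketch using logistic regression trained by stochastic gradient descent. The feature names, the toy data and the dictionary used as the "artefact" are all assumptions for illustration; the article does not say which algorithms LinkedIn uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, targets, epochs=2000, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on log loss.

    Returns a plain dict standing in for the model artefact.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of log loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return {"weights": weights, "bias": bias}

def predict(model, x):
    """Probability of the positive target for one input row."""
    z = model["bias"] + sum(w * v for w, v in zip(model["weights"], x))
    return sigmoid(z)

# Hypothetical input attributes: [jobs_saved, jobs_liked], scaled to 0..1.
# Target attribute: 1 if the member viewed a recommended posting, else 0.
rows = [[0.0, 0.1], [0.1, 0.0], [0.9, 0.8], [0.8, 1.0]]
targets = [0, 0, 1, 1]

model = train(rows, targets)
print(predict(model, [0.9, 0.8]), predict(model, [0.0, 0.1]))
```

The returned dictionary of weights and bias is the part that would be stored and later served; the training data itself is no longer needed at prediction time.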
A model can have many dependencies. To keep every component together and make sure all features are available both offline and online for deployment, all of this information is stored in a central repository.
“The deployment service provides orchestration, monitoring, and notification to ensure that the desired code and data artefacts are in sync. The deployment also ties with the experimentation platform to make sure that all active experiments have the required artefacts in the right targets in the overall system,” says the ML development team at LinkedIn.
In addition to the aforementioned training services, a custom-built execution engine, Quasar, runs the domain-specific language (DSL), alongside a Java API for composing online workflows and running recommendation engines.
To handle large volumes of data, Frame is deployed. Frame is a system that holds metadata about the features in the centralised repository, making it easy for engineers to search for them.
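Frame's actual interface is not described here, so the following is only a hypothetical sketch of what a searchable feature-metadata catalogue could look like. The class names, fields and source strings are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureMeta:
    """Metadata for one feature (all fields are hypothetical)."""
    name: str
    description: str
    offline_source: str  # where the feature is read from during training
    online_source: str   # where the feature is read from when serving

class FeatureCatalog:
    """In-memory stand-in for a central feature-metadata store."""

    def __init__(self):
        self._features = {}

    def register(self, meta):
        self._features[meta.name] = meta

    def search(self, keyword):
        """Return sorted names of features whose name or description matches."""
        kw = keyword.lower()
        return sorted(
            m.name
            for m in self._features.values()
            if kw in m.name.lower() or kw in m.description.lower()
        )

catalog = FeatureCatalog()
catalog.register(FeatureMeta("jobs_liked", "Count of job postings a member liked",
                             "hdfs:/features/jobs_liked", "kv:jobs_liked"))
catalog.register(FeatureMeta("jobs_saved", "Count of job postings a member saved",
                             "hdfs:/features/jobs_saved", "kv:jobs_saved"))
catalog.register(FeatureMeta("connections_made", "Number of connections a member has made",
                             "hdfs:/features/connections", "kv:connections"))

print(catalog.search("job"))
```

Recording both an offline and an online source for each feature reflects the requirement that features be available in both settings.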
Structure Of Pro-ML
LinkedIn took an unconventional approach to organising its teams in order to maximise ML productivity. Here, AI teams are closely connected to the product teams. This makes it easier for researchers to collaborate and share their findings with fellow experts who might be working on similar problems, reducing redundancy and increasing output.
Key Ideas
- Leverage and improve best-of-breed components from our existing code base to the maximum extent feasible.
- Use an agile-inspired strategy, making one product line better at a time.
- Enable services hosting the models to be upgraded independently without breaking their downstream or upstream services.
- Enable new technologies to be A/B testable in production.
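One common way to make a new model A/B testable is to assign each member deterministically to a control or treatment group. The sketch below uses hash-based bucketing; it is a general technique, not a description of LinkedIn's experimentation platform, and the function name, experiment name and default share are assumptions.

```python
import hashlib

def assign_variant(member_id, experiment, treatment_share=0.1):
    """Deterministically assign a member to "treatment" or "control".

    Hashing the experiment name together with the member id gives each
    experiment its own stable split without storing any assignments.
    """
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"

# The same member always lands in the same group for a given experiment.
print(assign_variant(42, "new-job-ranker"))
```

Requests from the treatment group would be routed to the new model and the rest to the existing one, so the two can be compared on live traffic.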
The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modelling to engineers from across the LinkedIn stack.
LinkedIn’s homegrown technologies have paved the way for a faster and better ML approach. Currently, enterprises are struggling to deploy machine learning models at full scale. Common problems include finding talent, building teams, collecting data and selecting models, to name a few. To get the most out of AI, it is necessary to build service-specific tools and frameworks in addition to the existing models, and LinkedIn’s success bears this out.