Reinforcement Learning (RL) is a reward-based learning paradigm in which an agent interacts with its environment, receives rewards for its actions and adapts its behaviour as the state of the environment changes. The most common real-world analogy is the way a baby learns.
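To make the act, observe, adapt loop concrete, here is a minimal sketch using tabular Q-learning. Everything in it is an illustrative assumption rather than anything taken from Horizon: the four-state corridor, the reward of 1 at the goal, the purely random exploration and the hyperparameters.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the goal (hypothetical toy corridor)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values while exploring at random (off-policy Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)                      # agent acts
            next_state, reward, done = step(state, action)    # environment responds
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])  # agent adapts
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
print(greedy_policy(q_table))
```

After training, the greedy policy steps right in every state, because the only reward lies at the right-hand end of the corridor.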
Into The Horizon
Convolutional Neural Networks and Recurrent Neural Networks dominate commercial applications of Computer Vision and Natural Language Processing. Reinforcement Learning, by contrast, has often been underestimated despite its importance in decision-making problems.
In recent years, Reinforcement Learning has achieved state-of-the-art performance on complicated tasks.
Now Horizon, a platform by Facebook, is addressing the challenges of reinforcement learning problems.
Horizon is an open-source, end-to-end platform for Reinforcement Learning. It is built in Python and uses PyTorch for modelling and training and Caffe2 for model serving.
Horizon was designed around the following principles:
- Ability to handle large datasets efficiently
- Ability to preprocess data automatically and efficiently
- Competitive algorithmic performance
- Algorithm performance estimates before launch
- Flexible model serving in production
- Platform reliability
How Does Horizon Help Facebook?
Push notifications at Facebook: Push notifications are sent to mobile devices, and a broader set of notifications is accessible from within the application or website.
As one of the most widely used social media platforms, Facebook constantly updates its technologies to improve user interaction. Push notifications have long been used to send personalized, time-sensitive updates to its users. They connect users to the updates they most want to receive, which may include interesting events, friends’ activities or posts, and comments or likes on posts.
Facebook’s traditional approach to notifications depended on supervised learning models that predicted each user’s click-through rate (CTR) from their previous likes and estimated whether a notification was likely to lead to a meaningful interaction, meaning that the user got what they were expecting.
However, this approach could not capture the long-term or incremental value of sending notifications. Because each user has their own preferences, filtering on a static threshold also fails to improve the notification experience, especially for people with different sensitivities to being notified.
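A small sketch makes the limitation of a static threshold easier to see. The candidate records, the user labels, the predicted CTR values and the 0.3 cut-off below are all hypothetical; they only illustrate one global threshold applied to every user.

```python
def static_filter(candidates, threshold=0.3):
    """Keep candidates whose predicted CTR clears one global threshold."""
    return [c for c in candidates if c["predicted_ctr"] >= threshold]


# Hypothetical scored candidates from a supervised CTR model.
candidates = [
    {"user": "frequent_clicker", "notification": "friend_post", "predicted_ctr": 0.45},
    {"user": "rare_clicker", "notification": "friend_post", "predicted_ctr": 0.25},
    {"user": "rare_clicker", "notification": "event_invite", "predicted_ctr": 0.28},
]

sent = static_filter(candidates)
print([c["user"] for c in sent])
```

Only the frequent clicker is notified. The rule looks at each candidate in isolation, so it cannot account for what a notification is worth over time or for how sensitive each person is to being notified.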
Horizon’s Take On The Problem
Horizon trains a discrete-action Deep Q-Network (DQN) to learn a policy that decides whether or not to send a notification, based on a state represented by many features. The Markov Decision Process (MDP) is built on the sequence of notification candidates for a particular person. The actions are sending or dropping a notification, the state describes a set of features about the person and the notification candidate, and rewards are given for each interaction and activity on Facebook, with a penalty to control the volume of notifications sent. Compared with the supervised model, this approach optimizes for long-term value and can capture the incremental effect of sending a notification by comparing the Q-values of the send and don’t-send actions.
Horizon was able to achieve a significant improvement in activity and meaningful interactions by deploying an RL-based policy for certain types of notifications, replacing the previous system based on supervised learning.
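The pieces of this MDP can be sketched in a few lines of plain Python. This is not Horizon code: the field names, the penalty of 0.1, the discount factor of 0.95 and the hand-written `toy_q` scorer standing in for a trained DQN are all assumptions made for illustration.

```python
from dataclasses import dataclass

SEND, DROP = "send", "drop"


@dataclass
class Transition:
    """One step of the notification MDP (field names are hypothetical)."""
    state: list          # features of the person and the notification candidate
    action: str          # SEND or DROP
    interactions: float  # activity observed after this candidate
    next_state: list     # features for the next notification candidate


def reward(t, send_penalty=0.1):
    """Interaction reward minus a penalty for every notification sent."""
    return t.interactions - (send_penalty if t.action == SEND else 0.0)


def q_target(t, q_fn, gamma=0.95):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    return reward(t) + gamma * max(q_fn(t.next_state, a) for a in (SEND, DROP))


def decide(state, q_fn):
    """Send only when Q(s, send) exceeds Q(s, drop)."""
    return SEND if q_fn(state, SEND) > q_fn(state, DROP) else DROP


def toy_q(state, action):
    """Stand-in for a trained DQN: a hand-written scorer over two features."""
    relevance, recent_sends = state
    return relevance - 0.2 * recent_sends if action == SEND else 0.3


print(decide([0.9, 1], toy_q))   # relevant candidate, few recent sends
print(decide([0.4, 2], toy_q))   # weaker candidate, more recent sends
```

The gap between the two Q-values is what expresses the incremental effect of sending: a candidate is sent only when sending is expected to yield more long-term value than dropping it.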
Other Applications In Facebook That Use Horizon
Page administrator notifications: Besides push notifications for users, Horizon is also used to provide meaningful interactions for page administrators. The RL model trains on huge datasets captured from the live responses and interactions of admins. The rewards computed during training allow the model to identify page admins who are likely to stay active with the help of notifications.
360-degree video: The 360-degree video team at Facebook has applied Horizon in the adaptive bitrate (ABR) domain to reduce bitrate consumption without harming people’s watching experience.
Future of Horizon
Facebook has no plans to stop developing Horizon, and in the near future it should be better equipped to handle RL problems than it is now.
Horizon will be improved shortly with updates in two major categories:
- New Models and Model Improvements
- Counterfactual policy evaluation (CPE) integrated with real metrics
The Horizon team plans to bring in new, best-performing models from the research community while improving the existing ones. Additionally, Horizon will allow developers to specify a set of metrics they are interested in tracking, and CPE will be used to estimate the change to these metrics, independently of the reward CPE.
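As a rough illustration of estimating a tracked metric for a policy that has not yet been deployed, the sketch below uses inverse propensity scoring (IPS), one standard counterfactual estimator. The text above does not say which estimators Horizon applies to these metrics, so the choice of IPS, the log format and the sample values are all assumptions.

```python
def ips_estimate(logged, new_policy_prob):
    """Estimate a metric under a new policy from logged data.

    logged: list of (state, action, logging_prob, metric) tuples, where
            logging_prob is the probability the logging policy gave to the
            action it actually took.
    new_policy_prob: function (state, action) -> probability under the new policy.
    """
    total = 0.0
    for state, action, logging_prob, metric in logged:
        weight = new_policy_prob(state, action) / logging_prob  # importance weight
        total += weight * metric
    return total / len(logged)


# Hypothetical log: the old policy sent or dropped with probability 0.5 each.
logged = [
    ("u1", "send", 0.5, 1.0),
    ("u2", "send", 0.5, 1.0),
    ("u3", "drop", 0.5, 0.0),
    ("u4", "drop", 0.5, 0.0),
]


def always_send(state, action):
    """Candidate policy that sends every notification."""
    return 1.0 if action == "send" else 0.0


logged_average = sum(row[3] for row in logged) / len(logged)
estimated = ips_estimate(logged, always_send)
print(estimated - logged_average)  # estimated change in the tracked metric
```

Nothing here depends on the reward used for training, so the same calculation can be repeated for any metric recorded alongside the logged actions.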
Facebook has once again shown that it can contribute to advances in Machine Learning. Its Horizon approach to decision making has proved efficient, and the team plans to bring in more improvements and to keep growing its community.