YouTube remains one of the most watched websites for video content. Launched in 2005 and acquired by Google in 2006, it hosts videos on topics ranging from education to music videos by famous artists and celebrities, and it has since expanded into gaming content. Although most videos are free, it also offers paid content. YouTube has become so popular that its name is almost synonymous with online video. In 2017, it was estimated that viewers watched one billion hours of video on YouTube every day.
How does the algorithm work?
YouTube’s recommendation algorithm, which plays an important part in feeding relevant content to users, was developed by the Google Brain team and is now driven by deep learning systems. The algorithm consists of two neural networks. The first handles candidate generation: it takes a user’s watch history and applies collaborative filtering to suggest videos similar to those the user has already watched. Competing versions of the algorithm are usually compared through A/B testing, which pits two variants against each other (for example, two versions of the same web page) to determine which one attracts more views. This is how the user experience is improved over time.
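The idea behind candidate generation can be illustrated with a much simpler technique than YouTube’s neural network. The Python sketch below uses item-to-item co-occurrence counting, a basic form of collaborative filtering: videos that often appear together in the same users’ watch histories are treated as similar, and unwatched videos that co-occur most with a user’s history become that user’s candidates. The users, video IDs and the helpers `build_cooccurrence` and `generate_candidates` are hypothetical, and this is not YouTube’s actual implementation.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each pair of videos appears in the same user's watch history."""
    co = {}
    for history in histories.values():
        for a, b in combinations(sorted(set(history)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co

def generate_candidates(user, histories, co, k=3):
    """Return up to k unwatched videos most often co-watched with the user's history."""
    seen = set(histories[user])
    scores = Counter()
    for video in seen:
        for other, count in co.get(video, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the output is deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [video for video, _ in ranked[:k]]

# Hypothetical watch histories (user -> list of video IDs)
histories = {
    "alice": ["cats", "dogs", "parrots"],
    "bob": ["cats", "dogs", "hamsters"],
    "carol": ["dogs", "hamsters", "chess"],
    "dave": ["cats", "chess"],
}
co = build_cooccurrence(histories)
print(generate_candidates("alice", histories, co))  # ['hamsters', 'chess']
```

Alice is offered ‘hamsters’ first because the videos she watched co-occur with it three times in other users’ histories. Google’s published description of the real system learns embeddings with a neural network instead of counting pairs explicitly, which scales far better to a catalogue of YouTube’s size.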
The second neural network uses logistic regression to refine and prioritise the candidate videos that fit the user’s watch-history pattern. This stage is also improved through A/B testing, but here the time a user spends watching a video and the user’s responses (likes, dislikes and comments) are taken into account. Positive responses such as likes make up only a small fraction of all responses, which also include dislikes and comments. Weights are therefore assigned to these responses so that logistic regression can assess each video and predict what the user wants. This step is called ranking. The videos that score highest under this regression become the ‘viral videos’ or ‘trending videos’.
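A toy version of the ranking step follows. It is a minimal sketch, assuming three hypothetical features per video (similarity to the user’s watch history, whether the user liked the channel before, whether the user commented on it before) and, in the spirit of the weighting described above, giving each watched impression a weight equal to its watch time in minutes while unwatched impressions get weight 1. Google’s 2016 paper on YouTube recommendations describes a weighted logistic regression of this kind for ranking, but the features, data and plain gradient-descent training loop here are illustrative only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    """Predicted probability that the user watches the video with features x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_weighted_logreg(X, y, sample_weights, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on a weighted log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    total = sum(sample_weights)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_weights):
            # Gradient of the weighted log-loss with respect to the logit
            err = si * (score(w, b, xi) - yi)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / total for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b

# Hypothetical features: [similarity to watch history, liked channel before, commented before]
X = [
    [0.9, 1, 1],
    [0.8, 1, 0],
    [0.7, 0, 1],
    [0.2, 0, 0],
    [0.1, 0, 0],
    [0.3, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = the user watched the recommended video
watch_minutes = [12.0, 8.0, 5.0, 0.0, 0.0, 0.0]
# Watched impressions are weighted by watch time; unwatched ones get weight 1
sample_weights = [m if label == 1 else 1.0 for m, label in zip(watch_minutes, y)]

w, b = train_weighted_logreg(X, y, sample_weights)

# Rank hypothetical candidate videos by predicted probability of being watched
candidates = {"v1": [0.85, 1, 0], "v2": [0.15, 0, 0], "v3": [0.6, 0, 1]}
ranked = sorted(candidates, key=lambda v: score(w, b, candidates[v]), reverse=True)
print(ranked)
```

Weighting watched impressions by watch time means a long view pulls the model harder than a brief one, so the ranking favours videos the user is likely to keep watching rather than merely click.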
Pitfalls and Improvements
A study by Google researchers on improving the system’s performance states that “YouTube represents one of the largest scale and most sophisticated industrial recommendation systems in existence”. Although the algorithms are powerful and match content to users’ interests, they sometimes fail to distinguish good content from bad. In particular, YouTube can recommend controversial content indiscriminately.
For example, YouTube recently recommended the comeback video of the infamous vlogger Logan Paul and, to make things worse, sent users notifications about it. This set off a chain reaction: users who had never heard of him ended up watching the content without having sought it out. Similarly, during the 2016 US presidential campaign, YouTube suggested conspiracy videos about Hillary Clinton. These incidents only scratch the surface. Some videos on the site even promote baseless claims such as the Earth being flat or vaccines causing autism. Meanwhile, YouTube typically removes inappropriate content only after it has been reported in large numbers, which can only ever be a temporary solution.
The underlying problem is that the data feeding these algorithms comes from YouTube’s vast user base, and not all users share the same opinions about videos in the same category. YouTube’s recommendation system needs to address this. More recently, YouTube has been restructuring its recommendation algorithm to better sort the ‘trending videos’ it surfaces from the massive volume of content it serves.
Recommendation systems work best with past user data such as watch history. In real-time deployment, however, the deep neural networks behind these systems need a continuous, well-regulated stream of data. To achieve this, YouTube has hired people to manually review malicious and inappropriate content, aided by machine learning. YouTube’s manual moderation campaign has been very aggressive, with reviewers removing nearly every offensive video they encounter.
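The combination of human reviewers and machine learning can be pictured as a triage step: a classifier assigns each video a risk score, and videos at or above a threshold join a human review queue, riskiest first. The sketch below assumes hypothetical risk scores, a hypothetical 0.5 threshold and a hypothetical `triage` helper; it is not a description of YouTube’s actual moderation pipeline.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class ReviewItem:
    priority: float  # negated risk score, because heapq is a min-heap
    video_id: str = field(compare=False)

def triage(risk_scores, review_threshold=0.5):
    """Send videos at or above the (hypothetical) threshold to human review."""
    queue, not_escalated = [], []
    for video_id, risk in risk_scores.items():
        if risk >= review_threshold:
            heapq.heappush(queue, ReviewItem(-risk, video_id))
        else:
            not_escalated.append(video_id)
    return queue, not_escalated

def next_for_review(queue):
    """Pop the highest-risk video still waiting for a human reviewer."""
    return heapq.heappop(queue).video_id if queue else None

# Hypothetical classifier scores: estimated probability that a video breaks the rules
risk_scores = {"a": 0.92, "b": 0.10, "c": 0.67, "d": 0.45}
queue, not_escalated = triage(risk_scores)
review_order = [next_for_review(queue) for _ in range(len(queue))]
print(review_order, not_escalated)  # ['a', 'c'] ['b', 'd']
```

Ordering the queue by risk lets a limited pool of reviewers see the most likely violations first; the threshold trades reviewer workload against how much harmful content goes unchecked.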
YouTube faces daunting challenges, such as the massive volume of video uploaded to the site and the influx of duplicate and mis-titled videos. In addition, the metrics the recommendation algorithm optimises for may not reflect users’ interests in a broader sense.
YouTube holds the largest share of viewership among video content platforms. Because that audience keeps growing, YouTube needs to keep pace in learning its users’ patterns through machine learning. Ultimately, the challenge is not just moderating content but making the algorithms smart enough to respond quickly to harmful consequences.