How YouTube’s ML Algorithm Earned Billions For Music Producers By Building Fingerprints For Songs

Published on March 26, 2019
by Anirudh VK

Before the rise of audio streaming services such as Gaana, Saavn and JioMusic, the premier destination for music was YouTube. A large number of songs from Bollywood record labels were available on the platform for users’ listening pleasure. It was a model that worked for everyone: users were not charged for listening to the music, and the copyright holders still benefitted.

Companies such as T-Series rode this trend to its peak and are now going after the crown of YouTube’s most subscribed channel. Taking a step back and looking at the bigger picture, the question arises as to how one of the most commercialised and copyright-heavy channels is so accessible to the masses.

In a world of rampant piracy and copyright infringement, YouTube sits at the intersection of paywalling content and facilitating theft. As with every new frontier, a watchdog emerged to guard YouTube’s copyright-friendly atmosphere. This is the system we know today as Content ID.

Why Copyright Was The First Frontier

The Internet was a free space in its infancy, and could very well be compared to the Wild West. As the net was opened to the masses, software pirates rushed onto the scene and left an impression on record labels and producers. The message was that the Internet was a place where copyright could not be enforced, at least not with any real force.

Websites were protected against legal action by the safe harbour provisions of the Digital Millennium Copyright Act (DMCA). These provisions shield a website from liability for copyrighted content uploaded by its users, as long as the site has a way to remove that content when the copyright holder asks.

Record labels and music producers kept away from putting music online for this very reason. Copyright infringement was the bane of their existence, eating away at what little profit they were making as the Internet took over the world. Due to the lenient nature of the DMCA guidelines, a pirate could very easily make duplicates of a song posted on a platform by a copyright holder, leading to infringement and theft.

When Google acquired YouTube, this was precisely the issue it wished to avoid. The examples of Grooveshark and Napster stuck with the top executives at Google, and a solution was needed quickly to avert the possibility of being sued by record labels. As early as 2007, this prompted them to create a system that would automatically flag copyrighted content. They looked to make YouTube into a platform where companies could upload copyrighted content and get compensated for it without unauthorised pirates trying to capture market share.

What Is Content ID

Their solution was Content ID. The system gives rights holders an automated way of finding unauthorised copies of their videos and songs, and then the choice to block the content or run ads against it. Videos are checked at upload to detect copyright-protected material.

A matched video can then be blocked by the copyright holder, muted in case of audio infringement, or restricted from playing on certain platforms. However, if the holder wishes to keep the content on the platform, they can choose to monetise it with ads and take the revenue from the views for themselves.
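The choices open to a rights holder can be pictured as a small policy table. The following is a minimal sketch of that idea, not YouTube’s implementation: the names `Policy`, `Claim` and `apply_policy`, and the fields of the returned state, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """The four outcomes described above (names are illustrative)."""
    BLOCK = "block"
    MUTE = "mute"
    RESTRICT = "restrict"
    MONETISE = "monetise"


@dataclass
class Claim:
    video_id: str
    rights_holder: str
    policy: Policy


def apply_policy(claim):
    """Return a hypothetical description of the video after the claim."""
    state = {
        "visible": True,        # can the video be watched at all?
        "audio": True,          # is the soundtrack still playing?
        "restricted": False,    # is playback limited on some platforms?
        "ad_revenue_to": None,  # who receives ad revenue, if anyone
    }
    if claim.policy is Policy.BLOCK:
        state["visible"] = False
    elif claim.policy is Policy.MUTE:
        state["audio"] = False
    elif claim.policy is Policy.RESTRICT:
        state["restricted"] = True
    elif claim.policy is Policy.MONETISE:
        state["ad_revenue_to"] = claim.rights_holder
    return state
```

For example, `apply_policy(Claim("v1", "some_label", Policy.MONETISE))` leaves the video visible and routes the ad revenue to `"some_label"`.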

The system is treated as the first line of defence, and if a claim is made without basis, the uploader can dispute it. YouTube has revealed the success of the venture: the algorithm has generated $2 billion in revenue for rights holders. It also represents a proactive method of handling problems, accounting for 98% of copyright management on YouTube.

How Does It Work

As mentioned previously, the Content ID system scans videos at upload to compare them against a reference library. This library contains over 50 million copyrighted works, provided by rights holders, which add up to a combined watch time of more than 500 years. Because this content comes only from thousands of hand-selected partners, small chinks remain in the armour of the copyright system.
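The scan-at-upload step can be sketched as a lookup against an index built from the reference files. This is a toy illustration under strong assumptions: it matches exact windows of integer samples, whereas a real fingerprint has to tolerate edits, and the names `windows`, `build_library` and `scan_upload` are hypothetical.

```python
def windows(samples, size=4):
    """Return every overlapping window of `size` consecutive samples."""
    return [tuple(samples[i:i + size]) for i in range(len(samples) - size + 1)]


def build_library(references, size=4):
    """Map each window of each reference work to the work that owns it."""
    library = {}
    for work_id, samples in references.items():
        for window in windows(samples, size):
            # The first work to register a window keeps it.
            library.setdefault(window, work_id)
    return library


def scan_upload(samples, library, size=4):
    """Return, per reference work, the fraction of upload windows matching it."""
    upload_windows = windows(samples, size)
    hits = {}
    for window in upload_windows:
        work_id = library.get(window)
        if work_id is not None:
            hits[work_id] = hits.get(work_id, 0) + 1
    return {work: count / len(upload_windows) for work, count in hits.items()}


references = {
    "song_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "song_b": [9, 8, 7, 6, 5, 4, 3, 2],
}
library = build_library(references)
upload = [0, 0, 3, 4, 5, 6, 7, 0]  # contains an excerpt of song_a
matches = scan_upload(upload, library)
```

Here two of the five windows in `upload` also occur in `song_a`, so `matches` reports a score only for that work. A production system would then compare such a score against a threshold before raising a claim.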

It uses audio and video fingerprinting technology to detect matches between videos people upload and the reference files. Videos are analysed frame by frame for fingerprinting, and matching images are identified using a heat map visualisation that compares frame data from two videos side by side.
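One way to picture a frame-by-frame comparison is as a grid of similarity scores, with one row per frame of the first video and one column per frame of the second. The sketch below assumes a simple average hash as the per-frame fingerprint, which is a stand-in and not the method YouTube uses; frames are small greyscale grids, and all function names are hypothetical.

```python
def average_hash(frame):
    """Fingerprint a greyscale frame: one bit per pixel, set if above the mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def similarity(hash_a, hash_b):
    """Fraction of fingerprint bits on which two frames agree."""
    same = sum(a == b for a, b in zip(hash_a, hash_b))
    return same / len(hash_a)


def heat_map(video_a, video_b):
    """Grid of similarities: rows are frames of video_a, columns of video_b."""
    hashes_a = [average_hash(f) for f in video_a]
    hashes_b = [average_hash(f) for f in video_b]
    return [[similarity(a, b) for b in hashes_b] for a in hashes_a]


checker = [[0, 255], [255, 0]]
stripes = [[255, 255], [0, 0]]
dim_checker = [[10, 200], [200, 10]]  # same picture, different brightness

grid = heat_map([checker, stripes], [dim_checker, stripes])
```

Matching frames show up as high values, so a copied clip appears as a bright diagonal streak when the grid is drawn as a heat map. Because each bit is relative to the frame’s own mean, the dimmed copy still scores as an exact match in this toy case.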

The system utilises a finite-state transducer algorithm for fingerprinting music. This allows it to recognise a song despite alterations such as beeps in the middle of the track, changes in pitch, volume and speed, and audio overlays and effects.
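To see why a fingerprint can survive an edit, consider the simplest case of a volume change. The sketch below is not the finite-state transducer approach; it is a much simpler, assumed scheme that records only whether the energy rises or falls from one block of samples to the next. Scaling the volume changes every energy by the same positive factor, so the pattern of rises and falls is unchanged. Pitch and speed changes are not handled, and all names are hypothetical.

```python
def energy_per_frame(samples, frame_size):
    """Sum of squared samples for each non-overlapping block."""
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def fingerprint(samples, frame_size=4):
    """One bit per pair of neighbouring blocks: 1 if the energy rises, else 0."""
    energies = energy_per_frame(samples, frame_size)
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def bit_agreement(fp_a, fp_b):
    """Fraction of compared bits that agree (0.0 if nothing to compare)."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(fp_a, fp_b)) / n


song = [1, 2, 1, 2, 5, 6, 5, 6, 2, 1, 2, 1, 8, 9, 8, 9, 0, 1, 0, 1]
quieter = [0.5 * s for s in song]  # same song at half the volume

score = bit_agreement(fingerprint(song), fingerprint(quieter))
```

The two fingerprints are identical here, so `score` is 1.0 even though no sample in `quieter` equals the corresponding sample in `song`.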

To supply this large amount of compute, YouTube harnesses Google’s Brain deep learning system. Content ID runs directly on this platform, which provides multiple advantages. The deep learning framework also allows developers to account for changes that would otherwise cause Content ID to fail, such as flipping the video or changing its aspect ratio.
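A common way to make a learned matcher tolerant of such edits is to train it on transformed copies of the reference frames. The article does not say how YouTube does this, so the following is only a sketch of that general data-augmentation idea; frames are small greyscale grids and the names `flip_horizontal`, `letterbox` and `augment` are hypothetical.

```python
def flip_horizontal(frame):
    """Mirror a frame left to right."""
    return [list(reversed(row)) for row in frame]


def letterbox(frame, bars=1, fill=0):
    """Change the aspect ratio by adding blank bars above and below."""
    width = len(frame[0])
    top = [[fill] * width for _ in range(bars)]
    bottom = [[fill] * width for _ in range(bars)]
    return top + [list(row) for row in frame] + bottom


def augment(frame):
    """Return the original frame plus the edited variants to train on."""
    return [frame, flip_horizontal(frame), letterbox(frame)]


frame = [[1, 2], [3, 4]]
variants = augment(frame)
```

Each variant would be labelled as the same work as the original, so that a network trained on them learns to treat a mirrored or letterboxed upload as a match.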

The deep learning smarts of Google’s system allow for a much more organic and easy way to launch new iterations of the fingerprinting system, since the neural network can be retrained easily and quickly.

The Future Of Automated Copyrighting

Even as the music industry is largely moving away from a focus on copyright to more pressing matters such as artists receiving remuneration, YouTube’s move is one of the most important in the space. It not only ensured the future of the platform as we know it today but also set the standard for other platforms such as Twitter to engage in copyright protection for content on their platforms. Content ID thus served its purpose as a stopgap on the way to a world of more complex issues in the music industry, while ensuring a fair outcome for everyone.
