Last updated March 19, 2018

Understanding AVA: Image Discovery Tool Used By Netflix To Power Its Content Posters

Published on March 19, 2018
by Abhishek Sharma

Netflix, the popular online entertainment platform, is constantly transforming itself to provide an enriching viewer experience. With the number of Netflix users growing every day, viewership of TV shows has scaled up too, and the company is introducing a collection of tools and algorithms to make content more relevant and enticing for the audience. The video-on-demand streaming company expects these tools not only to help choose the right title picture for a TV show, but also to attract more viewers.

AVA, as this set of tools and algorithms is called, helps Netflix users find shows on the platform that match their choice and taste. It analyses large volumes of images obtained from the video frames of a particular TV show to set a title image for that show. The title image makes the show more visually appealing and eases the task of merchandising (getting the content to reach users) for the shows’ curators and creators.

What is AVA?

The number of images on the internet is ever growing. Thanks to advancing technology, electronic gadgets for photography and video recording are also growing in number and becoming less expensive. As a result, handling and organising image data, be it storage, processing or classification, is challenging. To address this concern, a research team from the Autonomous University of Barcelona, Spain, in collaboration with Xerox Corporation, developed Aesthetic Visual Analysis (AVA) as a research project.

The project contains a vast database of over 250,000 images combined with metadata such as aesthetic scores for each image, semantic labels for more than 60 categories and many other characteristics. The researchers primarily use statistical measures such as the mean score, standard deviation and variance to rate the images. Based on the distributions computed from these statistics, they assess the semantic challenges and choose the right images for the database. The methodology presented in the paper also discusses training on images that fit the statistical criteria followed for the project, such as goodness-of-fit measured with root-mean-square error (RMSE). The images are also classified according to various styles.
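
To make the statistics above concrete, here is a minimal sketch of how a per-image rating histogram could be summarised. The function name, the 1-to-10 rating scale and the histogram layout are assumptions made for illustration; they are not taken from the paper's code.

```python
from statistics import mean, pstdev, pvariance


def summarise_votes(vote_counts):
    """Summarise a rating histogram for one image.

    vote_counts[i] is the number of votes for score i + 1
    (hypothetical layout, assuming ratings from 1 to 10).
    """
    # Expand the histogram into one entry per individual vote.
    scores = [
        score
        for score, count in enumerate(vote_counts, start=1)
        for _ in range(count)
    ]
    if not scores:
        raise ValueError("an image needs at least one vote")
    return {
        "mean": mean(scores),
        "variance": pvariance(scores),  # population variance
        "std_dev": pstdev(scores),      # population standard deviation
    }


# Example histogram: most votes cluster around scores 5 and 6.
summary = summarise_votes([0, 1, 2, 5, 10, 8, 4, 2, 1, 0])
print(summary)
```

A narrow distribution suggests raters broadly agree on an image, while a wide one suggests the image divides opinion, which is one reason to look at the variance alongside the mean.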

With this, they primarily alleviate the problem of extensive benchmarking and make it possible to train on more images. They also show how computer applications can be made visually richer with large datasets and better aesthetic appeal. Altogether, the project consists of algorithms along with statistical analysis. With AVA, computing performance can be significantly optimised, with less impact on the hardware.

How AVA works

In the usual scenario, content editors have to go through a vast assortment of video frames for a show to select a good title image. The number of frames can run into the millions, depending on the number of episodes in a show. Manually screening these frames is almost impossible and often ineffective. This is where AVA comes into play: it uses image classification algorithms to surface the right image at the right time.

AVA follows a sequential method, analysing images obtained through a process called frame annotation. The variables required by the algorithms are annotated in the video frames using Netflix's own framework, called Archer. The framework is closely based on the FFmpeg multimedia framework. Archer splits the video into very small chunks to aid parallel video processing, which makes it practical to run more of AVA's algorithms across the frames.
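
The chunk-and-process idea can be sketched in a few lines. This is only an illustration of splitting work into small ranges and annotating them in parallel; it is not Archer's API, and every name here (`split_into_chunks`, `annotate_chunk`, `annotate_video`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_frames, chunk_size):
    """Return (start, end) frame ranges covering the video; end is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_frames))
        for start in range(0, total_frames, chunk_size)
    ]


def annotate_chunk(frame_range):
    """Placeholder annotator: a real one would decode and analyse the frames."""
    start, end = frame_range
    return {"start": start, "end": end, "frames": end - start}


def annotate_video(total_frames, chunk_size, workers=4):
    """Annotate all chunks in parallel; results come back in chunk order."""
    chunks = split_into_chunks(total_frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate_chunk, chunks))


results = annotate_video(total_frames=10, chunk_size=4)
print(results)
```

Because each chunk is independent, the same pattern scales from threads on one machine to many machines, which is presumably the appeal of splitting the video into small pieces.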

After the frames are obtained, they are subjected to a series of image recognition algorithms to build metadata, which is then classified as visual, contextual and composition metadata. Some of the important details captured in the metadata are given below.

Visual Metadata: For brightness, sharpness and color
Contextual Metadata: For facial expressions, camera motion, camera angle and object detection
Composition Metadata: For intricate image details such as depth of field and symmetry
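
As a rough illustration of the visual metadata category, the sketch below computes two simple measures for a grayscale frame stored as a list of pixel rows. The measures chosen here (mean intensity for brightness, mean neighbouring-pixel difference for sharpness) and the function names are assumptions for illustration; the article does not say how Netflix computes these values.

```python
def brightness(image):
    """Mean pixel intensity of a grayscale image (list of rows of 0-255 values)."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def sharpness(image):
    """Mean absolute difference between horizontally and vertically adjacent pixels.

    Larger values mean stronger edges, a crude proxy for sharpness.
    """
    diffs = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x + 1 < len(row):          # neighbour to the right
                diffs.append(abs(value - row[x + 1]))
            if y + 1 < len(image):        # neighbour below
                diffs.append(abs(value - image[y + 1][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def visual_metadata(image):
    """Bundle the two measures into one metadata record."""
    return {"brightness": brightness(image), "sharpness": sharpness(image)}


flat_frame = [[100, 100], [100, 100]]      # uniform grey, no edges
checker_frame = [[0, 255], [255, 0]]       # maximum contrast between neighbours
print(visual_metadata(flat_frame))
print(visual_metadata(checker_frame))
```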

The annotations are made based on the above metadata. The ‘right’ picture is then selected by feeding all the relevant, appropriate images into an automated image processing framework.

And finally, it chooses the right picture

The ‘best’ image is chosen by considering three important aspects: the lead actors, visual range and image filters. Emphasis is given first to the lead actors of the show, since they add aesthetic appeal and make a visual impact.

The next aspect is the diversity of the images present in the video frames, such as camera positions and image details like brightness, color and contrast, to name a few. With these in mind, image frames are easy to group based on similarities. This helps develop image support vectors. These vectors primarily assist in designing an image diversity index, with which all the relevant images collected for an episode or even a movie can be scored on visual appeal.
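
One simple way to think about diversity is to describe each frame by a small feature vector and then pick frames that are far apart from each other. The sketch below uses greedy farthest-point selection for this. It is a stand-in technique chosen for illustration, not the diversity index the article describes, and the feature values are made up.

```python
import math


def pick_diverse(features, k):
    """Greedily pick up to k frame indices whose feature vectors are far apart.

    Starts from the first frame, then repeatedly adds the frame whose nearest
    already-chosen frame is farthest away. Requires Python 3.8+ for math.dist.
    """
    if not features or k <= 0:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(features)):
        remaining = (i for i in range(len(features)) if i not in chosen)
        best = max(
            remaining,
            key=lambda i: min(math.dist(features[i], features[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical (brightness, contrast) features for five frames:
# frames 0 and 1 look alike, frames 2 and 3 look alike, frame 4 stands apart.
frames = [(0, 0), (0.1, 0), (10, 0), (10, 0.1), (5, 5)]
print(pick_diverse(frames, 3))
```

Near-duplicate frames end up represented by a single pick, so the shortlist covers the visual range of the episode instead of repeating one shot.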

Apart from these factors, frames containing sensitive content such as violence, nudity and advertisements are also flagged and allotted low priority in the image vectors. This way, they are screened out of the selection process.
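
Putting the three aspects together, a final ranking step might look like the sketch below: flagged frames are dropped, and the rest are ordered by an appeal score with a bonus for frames showing a lead actor. The `Frame` fields, the bonus value and the scoring rule are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: int
    appeal: float         # hypothetical visual-appeal score in [0, 1]
    has_lead_actor: bool  # whether a lead actor was detected in the frame
    sensitive: bool       # flagged for violence, nudity or advertising


def rank_frames(frames, lead_actor_bonus=0.5):
    """Drop flagged frames, then sort the rest from most to least promising."""
    candidates = [f for f in frames if not f.sensitive]

    def score(frame):
        bonus = lead_actor_bonus if frame.has_lead_actor else 0.0
        return frame.appeal + bonus

    return sorted(candidates, key=score, reverse=True)


frames = [
    Frame(1, appeal=0.9, has_lead_actor=False, sensitive=True),
    Frame(2, appeal=0.6, has_lead_actor=True, sensitive=False),
    Frame(3, appeal=0.8, has_lead_actor=False, sensitive=False),
]
print([f.frame_id for f in rank_frames(frames)])
```

Here frame 1 is removed despite its high appeal, and frame 2 outranks frame 3 because the lead-actor bonus outweighs the small difference in appeal.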

This article provides a brief outlook on the implementation of AVA at Netflix. However, there may be other contributing factors in computer vision used to sort images and videos. The underlying software and applications are just the tip of the iceberg. In the coming days, Netflix will surely come up with a much more beautiful and richer interface to attract more viewers.

Abhishek Sharma

I research and cover the latest happenings in data science. My fervent interests are the latest technology and humor/comedy (an odd combination!). When I'm not busy reading on these subjects, you'll find me watching movies or playing badminton.
