Last updated June 18, 2018

What Is Zero-Shot Learning?

Published on June 18, 2018
by Smita Sinha

Over the last few decades, machines have become much smarter, but without a properly labelled training data set of seen classes, a machine cannot distinguish between two similar objects. Humans, on the other hand, are capable of identifying approximately 30,000 basic object categories. In machine learning, recognising classes for which no labelled training examples exist is known as the problem of zero-shot learning (ZSL). Consider an example: a child would have no problem recognising a zebra if they have seen a horse before and read somewhere that a zebra looks similar to a horse but has black-and-white stripes.

For machines, ZSL recognition relies on a labelled training set of seen classes and on knowledge about how each unseen class is semantically related to the seen classes.

According to a research paper, humans can perform ZSL because of their existing language knowledge base, which provides a high-level description of a new or unseen class (zebra) and connects it to seen classes and visual concepts (horse, stripes). Inspired by this human ability, there is increasing interest in machine ZSL for scaling up visual recognition.

Zero-Shot Learning 101

A study explains that zero-shot learning is used to construct recognition models for unseen target classes that have no labelled samples for training. It uses the class attributes as side information and transfers information from source classes that do have labelled samples. ZSL is done in two stages:

Training: The knowledge about the attributes is captured.
Inference: This knowledge is then used to categorise instances among a new set of classes.
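
The two stages above can be illustrated with a minimal sketch in the spirit of attribute-based classification. Everything in it is an assumption made for illustration rather than something taken from the study: the 2-D toy features, the class names, the attribute signatures, and the nearest-centroid attribute detectors, which stand in for whatever attribute classifiers a real system would train.

```python
from math import dist

# Hypothetical toy image features: [hoof-like shape score, stripe texture score].
seen_images = {
    "horse":   [[0.9, 0.1], [0.8, 0.2]],
    "tiger":   [[0.1, 0.9], [0.2, 0.8]],
    "dolphin": [[0.1, 0.1], [0.2, 0.2]],
}
# Hypothetical class-attribute signatures: (has_hooves, has_stripes).
seen_attrs = {"horse": (1, 0), "tiger": (0, 1), "dolphin": (0, 0)}
unseen_attrs = {"zebra": (1, 1), "panther": (0, 0)}  # no training images


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train_attribute_detectors(images, attrs):
    """Training stage: one (positive, negative) centroid pair per attribute."""
    n_attrs = len(next(iter(attrs.values())))
    detectors = []
    for a in range(n_attrs):
        pos = [x for c, xs in images.items() if attrs[c][a] == 1 for x in xs]
        neg = [x for c, xs in images.items() if attrs[c][a] == 0 for x in xs]
        detectors.append((centroid(pos), centroid(neg)))
    return detectors


def predict_attributes(detectors, x):
    """An attribute is 'on' when x is closer to its positive centroid."""
    return tuple(int(dist(x, pos) < dist(x, neg)) for pos, neg in detectors)


def classify_unseen(detectors, x, class_attrs):
    """Inference stage: pick the class whose signature best matches x."""
    predicted = predict_attributes(detectors, x)
    return min(class_attrs, key=lambda c: dist(predicted, class_attrs[c]))


detectors = train_attribute_detectors(seen_images, seen_attrs)
print(classify_unseen(detectors, [0.9, 0.9], unseen_attrs))
```

The detectors only ever see horse, tiger and dolphin images, yet an input that looks both hoofed and striped is assigned to the zebra class because its predicted attributes match the zebra signature.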

Recently, there has been a surge in interest in automatic recognition of attributes, due to the availability of data containing meta information. A research paper claims that this has proved to be particularly useful for recognising images.

Zero-shot learning approaches are designed to learn an intermediate semantic layer, the attributes, and apply it at inference time to predict a new class of data, according to a study.

Li Zang's study further explains that zero-shot learning relies on a labelled training set of seen classes and on knowledge of how each unseen class relates to them. Both seen and unseen classes are related in a high-dimensional vector space, called the semantic space, where the knowledge from seen classes can be transferred to unseen classes.

With the semantic space and a visual feature representation of image content, Li Zang and a group of researchers solved ZSL in two steps:

A joint embedding space is learned, into which both the semantic vectors (prototypes) and the visual feature vectors can be projected.
Nearest neighbour (NN) search is performed in this embedding space to match the projection of an image feature vector against that of an unseen class prototype.
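
As a rough sketch of the second step, the snippet below projects a visual feature and the unseen class prototypes into a shared 2-D space and picks the nearest prototype by cosine similarity. The projection matrices `W_visual` and `W_semantic` are hand-set, hypothetical stand-ins for the learned embedding, and the prototypes and class names are invented for illustration; learning the embedding itself is not shown.

```python
from math import sqrt


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Hypothetical projections standing in for a learned joint embedding.
W_visual = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 3-D visual feature -> 2-D
W_semantic = [[1.0, 0.0], [0.0, 1.0]]          # 2-D prototype -> 2-D

# Hypothetical semantic prototypes of the unseen classes.
unseen_prototypes = {"zebra": [0.7, 0.7], "whale": [-0.9, 0.1]}


def nearest_unseen_class(visual_feature):
    """Project the image feature, then do NN search over projected prototypes."""
    z = matvec(W_visual, visual_feature)
    return max(
        unseen_prototypes,
        key=lambda c: cosine(z, matvec(W_semantic, unseen_prototypes[c])),
    )


print(nearest_unseen_class([0.8, 0.3, 0.4]))
```

Cosine similarity is one common choice for the matching step; Euclidean distance in the embedding space would fit the description equally well.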

Implementing Zero-Shot Learning

To make ZSL effective, the key inputs (images and text) are represented as vectors. This means sourcing the relevant vectors for the project beforehand. Once collected, they are paired with a description, which enables the algorithm to classify them accordingly. Training is done with respect to these vectors and leads to classification into separate classes. In the testing phase, the model recognises new inputs and assigns them to new classes that were not part of the training data.

Steps Involved In Implementation:

In a tutorial, Timothy Hospedales described three steps to implement zero-shot learning in a model:

Obtain a category vector, V, through:

Attributes: These describe the visual appearance of a concept or instance by assigning labelled visual properties to it, and they can be easily transferred from seen to unseen classes.
Word vectors: These apply straightforwardly to other data types such as video, text and audio, among others.

Train:

Given some known class category vectors V and images X
Learn a mapping from images to category vectors using classifiers or regressors, V=F(X)

Test:

Specify vector V for a new class to recognise
Map test data F(X) to category vector space
NN matching of V vs F(X)
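
The train and test steps above can be sketched with a small linear regressor. This is only an illustration under stated assumptions, not the tutorial's code: F is taken to be linear, it is fitted with plain per-sample gradient updates, and the toy features, category vectors and class names are all hypothetical.

```python
from math import dist


def predict(W, b, x):
    """F(x) = W x + b, mapping an image feature into category-vector space."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def fit_regressor(X, V, lr=0.1, epochs=5000):
    """Train: fit V ~ F(X) with per-sample gradient steps on squared error."""
    W = [[0.0] * len(X[0]) for _ in V[0]]
    b = [0.0] * len(V[0])
    for _ in range(epochs):
        for x, v in zip(X, V):
            errors = [p - t for p, t in zip(predict(W, b, x), v)]
            for j, err in enumerate(errors):
                b[j] -= lr * err
                W[j] = [w - lr * err * xi for w, xi in zip(W[j], x)]
    return W, b


def recognise(W, b, x, class_vectors):
    """Test: map x to F(x), then nearest-neighbour match against the vectors V."""
    fx = predict(W, b, x)
    return min(class_vectors, key=lambda c: dist(fx, class_vectors[c]))


# Hypothetical toy data: one 2-D image feature per seen class.
X_seen = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.1]]  # horse, tiger, dolphin
V_seen = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # their category vectors
W, b = fit_regressor(X_seen, V_seen)

# Category vectors specified for new classes that have no training images.
V_new = {"zebra": [1.0, 1.0], "panther": [0.0, 0.0]}
print(recognise(W, b, [0.9, 0.9], V_new))
```

Adding a new class at test time requires no retraining: only a new entry in `V_new` is needed, which is the point of mapping images into the category vector space.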

Deep Zero-Shot Learning

Earlier ZSL work used hand-crafted feature representations for objects. Over the past two years, these have been replaced by visual features extracted from deep convolutional neural networks (CNNs), typically using pre-trained CNN models. The deep CNN features are then used as inputs to the embedding model. Existing DNN-based ZSL works differ in whether they use the semantic space or an intermediate space as the embedding space.

How To Train An End-To-End ZSL In Deep Models

Train E(X,Z) to be large for matching pairs and small for mismatched pairs.
Let E(X,Z) be a deep network rather than a bilinear model.
Concatenate (X,Z) and feed the result into a deep network.
It is better to do some representation learning on X and Z, then take the inner product.
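
To make the contrast concrete, here is a small sketch of two forms the compatibility function E(X,Z) can take: a bilinear model, and a two-branch model that embeds X and Z separately before taking the inner product. The single ReLU layer per branch and all weight values are hypothetical, hand-set choices for illustration; in practice the weights would be trained and the branches would be deeper.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relu(v):
    return [max(0.0, a) for a in v]


def bilinear_score(x, W, z):
    """Bilinear compatibility: E(x, z) = x^T W z."""
    return dot(x, matvec(W, z))


def two_branch_score(x, z, Wx, Wz):
    """Embed x and z with one ReLU layer each, then take the inner product."""
    return dot(relu(matvec(Wx, x)), relu(matvec(Wz, z)))


# Hypothetical image feature and two candidate class vectors.
x = [1.0, 0.0]
z_match, z_mismatch = [1.0, 0.0], [0.0, 1.0]

W = [[1.0, 0.0], [0.0, 1.0]]     # bilinear weights (identity here)
Wx = [[1.0, -1.0], [-1.0, 1.0]]  # image branch weights
Wz = [[1.0, 0.0], [0.0, 1.0]]    # class-vector branch weights

print(bilinear_score(x, W, z_match), bilinear_score(x, W, z_mismatch))
print(two_branch_score(x, z_match, Wx, Wz), two_branch_score(x, z_mismatch, Wx, Wz))
```

In both cases the score is higher for the matching pair than for the mismatched one, which is the property training is meant to enforce.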

A Simple Deep Network For ZSL As Explained In Timothy’s Tutorial

Train a max-margin ranker, or use targets Y=(1,0) for (matching, mismatched) pairs.
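
A minimal sketch of the max-margin option is shown below, assuming the compatibility scores have already been computed by some model. The function name `ranking_loss`, the margin of 1.0 and the example scores are illustrative assumptions, not values from the tutorial.

```python
def ranking_loss(score_match, scores_mismatch, margin=1.0):
    """Hinge ranking loss for one image.

    Each mismatched score is penalised unless it sits at least `margin`
    below the score of the matching pair.
    """
    return sum(max(0.0, margin - score_match + s) for s in scores_mismatch)


# Matching pair scored well above both mismatched pairs: no penalty.
print(ranking_loss(2.0, [0.5, 0.0]))
# Matching pair scored below a mismatched pair: positive penalty.
print(ranking_loss(0.2, [0.5]))
```

Minimising this loss pushes matching scores up and mismatched scores down until they are separated by the margin, after which the pair no longer contributes.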

According to a study, despite the success of deep neural networks that learn an end-to-end model between text and images in other vision problems such as image captioning, very few deep ZSL models exist. Those that do exist show little advantage over ZSL models that use deep feature representations but do not learn an end-to-end embedding.

Smita Sinha

I have over three years of experience in editing and reporting. My career in journalism began with The Economic Times. When I am not busy, I read and binge-watch web series.
