
How To Establish Domain Transferability In Neural Models


If a neural network, say a CNN, is tasked with identifying digits, it should do so easily, given the reputation of CNNs on image classification tasks.

Consider digit images in two styles: house numbers cropped from street scenes and handwritten digits. A CNN can achieve reasonably good accuracy (98%) when trained and evaluated on the source domain (SVHN). However, the same CNN model may perform poorly (67.1% accuracy) when evaluated on the target domain (MNIST).

This drop in performance stems from the distinct data distributions of the two domains.

The images from the SVHN dataset contain various computer fonts, cluttered street backgrounds, and cropped digits near the image boundaries, whereas the images from the MNIST dataset contain handwritten strokes on a clean background.

To improve accuracy on the target dataset, we need to address what is called the covariate shift problem.

What Is The Covariate Shift Problem?

If a part of the target set (i.e., raw images without labels) can be accessed and domain adaptation performed to transfer the underlying knowledge learned from the source to the target, the same CNN model obtains an immediate performance boost from 67.1% to 98.9%. The mismatch in input distributions that this adaptation corrects is known as covariate shift.
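To make the setup concrete, here is a minimal sketch (an illustration, not code from the paper) of how such a cross-domain gap could be measured in PyTorch, assuming a `model` that has already been trained on SVHN; the transforms and loaders are assumptions chosen to put both datasets in a common format.

```python
# Minimal sketch of measuring the covariate-shift gap; `model` is an
# assumed CNN already trained on SVHN, not code from the paper.
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Put both domains in a common format (SVHN is 32x32 RGB, MNIST is 28x28 grayscale).
tf = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

svhn = DataLoader(datasets.SVHN("data", split="test", transform=tf, download=True), batch_size=256)
mnist = DataLoader(datasets.MNIST("data", train=False, transform=tf, download=True), batch_size=256)

print("source (SVHN) accuracy:", accuracy(model, svhn))    # ~98% in the article
print("target (MNIST) accuracy:", accuracy(model, mnist))  # ~67.1% without adaptation
```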

Existing methods, such as adversarial learning-based approaches for pixel-level domain adaptation, try to translate input images from one domain to the other, bringing the input distributions closer.

But without knowing the current state of the task-specific decision boundary, an adversarial network might keep perfecting its pixel synthesis (of road pixels in a driving scene, for example) and therefore optimize in an ineffective direction.

So, it is important to preserve a notion of decision boundaries during distribution alignment.

To reduce the discrepancies within this adversarial training, machine learning researchers at Apple propose a metric based on the Wasserstein distance.

A New Metric: Sliced Wasserstein Discrepancy

Named after the Russian-American mathematician Leonid Vaserstein, the Wasserstein metric is a distance function used to compare two probability distributions.
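A key property that makes this metric practical: for one-dimensional empirical distributions with equal numbers of samples, the Wasserstein-1 distance reduces to comparing sorted samples. A toy NumPy/SciPy illustration (not from the article):

```python
# Toy illustration: in 1-D, the Wasserstein-1 distance between two
# equal-sized empirical samples is the mean absolute difference of
# their sorted values.
import numpy as np
from scipy.stats import wasserstein_distance

u = np.random.normal(0.0, 1.0, 1000)  # samples from distribution 1
v = np.random.normal(2.0, 1.0, 1000)  # samples from distribution 2, shifted by 2

sorted_form = np.abs(np.sort(u) - np.sort(v)).mean()
print(sorted_form)                  # ~2.0, the shift between the means
print(wasserstein_distance(u, v))   # SciPy agrees with the sorted-sample form
```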

In machine learning applications such as image classification, a model's output is a probability distribution over the possible labels, and comparing such distributions is central to judging how well predictions match the target.

Building on this, the team at Apple defines the Sliced Wasserstein Discrepancy (SWD): a 1-D variational formulation of the Wasserstein distance, computed along random one-dimensional projections (the "slices") of the outputs of two task-specific classifiers.

SWD is designed to capture the dissimilarity between the probability measures p1 and p2 produced by the task-specific classifiers C1 and C2, which take input from a feature generator G. This provides geometrically meaningful guidance for detecting target samples that are far from the support of the source.
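A minimal sketch of how such a discrepancy could be computed in PyTorch. The function name and projection count are assumptions; it projects the two classifiers' class-probability outputs onto random 1-D slices and compares sorted projections, which is one standard way to approximate a sliced Wasserstein distance (the paper's exact formulation may differ):

```python
import torch

def sliced_wasserstein_discrepancy(p1, p2, num_projections=128):
    """Approximate sliced Wasserstein discrepancy between two batches of
    classifier outputs p1, p2 of shape (batch, num_classes).
    A sketch following the general sliced-Wasserstein recipe, not the
    paper's exact implementation."""
    # Random projection directions on the unit sphere (the "slices").
    proj = torch.randn(p1.size(1), num_projections, device=p1.device)
    proj = proj / proj.norm(dim=0, keepdim=True)
    # Project each batch of output distributions onto every 1-D slice.
    p1_proj = p1 @ proj  # (batch, num_projections)
    p2_proj = p2 @ proj
    # In 1-D, optimal transport matches sorted samples, so the distance
    # reduces to comparing the sorted projections slice by slice.
    p1_sorted, _ = torch.sort(p1_proj, dim=0)
    p2_sorted, _ = torch.sort(p2_proj, dim=0)
    return ((p1_sorted - p2_sorted) ** 2).mean()
```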

The whole process can be done in three steps (see the training-loop sketch below):

  • Train G, C1, and C2 on a labeled source set to shape the decision boundaries.
  • Train C1 and C2 to maximize SWD on an unlabeled target set to detect target samples that are outside the reach of the source.
  • Train G to minimize the same SWD on an unlabeled target set to generate feature representations that are inside the support of the source.
Source: Apple Machine Learning
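Putting the three steps together, a minimal PyTorch training-loop sketch using the sliced_wasserstein_discrepancy function sketched above. G, C1, and C2 follow the names in the figure; the optimizers, data loaders, and the number of generator steps are assumptions, and the paper's exact losses and schedule may differ:

```python
import torch.nn.functional as F

# Assumed: G (feature generator), C1, C2 (classifiers) are nn.Modules;
# opt_g updates G's parameters, opt_c updates C1's and C2's;
# source_loader yields (x_s, y_s) pairs, target_loader yields unlabeled x_t.
for (x_s, y_s), x_t in zip(source_loader, target_loader):
    # Step 1: train G, C1, C2 on labeled source data to shape the
    # decision boundaries.
    feat_s = G(x_s)
    loss_cls = F.cross_entropy(C1(feat_s), y_s) + F.cross_entropy(C2(feat_s), y_s)
    opt_g.zero_grad(); opt_c.zero_grad()
    loss_cls.backward()
    opt_g.step(); opt_c.step()

    # Step 2: fix G; train C1, C2 to MAXIMIZE the SWD on unlabeled target
    # data (while staying accurate on the source), so the classifiers
    # disagree on target samples outside the source support.
    feat_s = G(x_s).detach()
    feat_t = G(x_t).detach()
    p1 = F.softmax(C1(feat_t), dim=1)
    p2 = F.softmax(C2(feat_t), dim=1)
    loss_c = (F.cross_entropy(C1(feat_s), y_s)
              + F.cross_entropy(C2(feat_s), y_s)
              - sliced_wasserstein_discrepancy(p1, p2))
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()

    # Step 3: fix C1, C2; train G to MINIMIZE the same SWD, pulling
    # target features inside the source support.
    for _ in range(4):  # a few generator steps per batch (an assumption)
        p1 = F.softmax(C1(G(x_t)), dim=1)
        p2 = F.softmax(C2(G(x_t)), dim=1)
        loss_g = sliced_wasserstein_discrepancy(p1, p2)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
```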

When this metric is applied to the SVHN and MNIST datasets discussed earlier, the method generates much more discriminative feature representations than the model trained without adaptation.

Future Direction

The team behind this work is hopeful that this method of unsupervised domain adaptation helps improve the performance of machine learning models in the presence of a domain shift. By lowering the cost of data capture and annotation in areas where ground truth is scarce or hard to collect, the method also enables training models that perform well in diverse scenarios, and could eventually enable personalized machine learning through on-device adaptation of models for enhanced user experiences.

Know more about Sliced Wasserstein Discrepancy here.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.