
How To Establish Domain Transferability In Neural Models


If a neural network, say a CNN, is tasked with identifying digits, it should do so easily, given the reputation of CNNs on image classification tasks.

Consider digit images in two styles: house numbers cropped from street scenes and handwritten digits. A CNN can achieve reasonably good accuracy (98%) when trained and evaluated on the source domain (SVHN). However, the same CNN model may perform poorly (67.1% accuracy) when evaluated on the target domain (MNIST).

This drop in performance stems from the distinct data distributions of the two domains.

The images from the SVHN dataset contain various computer fonts, cluttered street backgrounds, and cropped digits near the image boundaries, whereas the images from the MNIST dataset contain handwritten strokes on a clean background.

To improve accuracy on the target dataset, we need to address what is called the covariate shift problem.

What Is The Covariate Shift Problem?

If a part of the target set (i.e., raw images without labels) can be accessed and domain adaptation performed to transfer the underlying knowledge learned from the source to the target, the same CNN model obtains an immediate performance boost from 67.1% to 98.9%. The mismatch in input distributions that this adaptation corrects is known as covariate shift.
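To make the setup concrete, here is a minimal sketch (an illustration, not code from the paper) of how such a cross-domain gap could be measured in PyTorch, assuming a `model` that has already been trained on SVHN; the transforms and loaders are assumptions chosen to put both datasets in a common format.

```python
# Minimal sketch of measuring the covariate-shift gap; `model` is an
# assumed CNN already trained on SVHN, not code from the paper.
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Put both domains in a common format (SVHN is 32x32 RGB, MNIST is 28x28 grayscale).
tf = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

svhn = DataLoader(datasets.SVHN("data", split="test", transform=tf, download=True), batch_size=256)
mnist = DataLoader(datasets.MNIST("data", train=False, transform=tf, download=True), batch_size=256)

print("source (SVHN) accuracy:", accuracy(model, svhn))    # ~98% in the article
print("target (MNIST) accuracy:", accuracy(model, mnist))  # ~67.1% without adaptation
```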

Existing methods, such as adversarial learning-based approaches for pixel-level domain adaptation, try to translate input images from one domain to the other, bringing the input distributions closer.

But without knowing the current state of the task-specific decision boundary, an adversarial network might keep perfecting its pixel synthesis (of road pixels in a driving scene, for example) and therefore optimize in an ineffective direction.

So, it is important to preserve a notion of decision boundaries during distribution alignment.

To reduce the discrepancies within this adversarial training, machine learning researchers at Apple propose a metric based on the Wasserstein distance.

A New Metric: Sliced Wasserstein Discrepancy

Named after the Russian-American mathematician Leonid Vaserstein, the Wasserstein metric is a distance function used to compare two probability distributions.
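A key property that makes this metric practical: for one-dimensional empirical distributions with equal numbers of samples, the Wasserstein-1 distance reduces to comparing sorted samples. A toy NumPy/SciPy illustration (not from the article):

```python
# Toy illustration: in 1-D, the Wasserstein-1 distance between two
# equal-sized empirical samples is the mean absolute difference of
# their sorted values.
import numpy as np
from scipy.stats import wasserstein_distance

u = np.random.normal(0.0, 1.0, 1000)  # samples from distribution 1
v = np.random.normal(2.0, 1.0, 1000)  # samples from distribution 2, shifted by 2

sorted_form = np.abs(np.sort(u) - np.sort(v)).mean()
print(sorted_form)                  # ~2.0, the shift between the means
print(wasserstein_distance(u, v))   # SciPy agrees with the sorted-sample form
```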

In machine learning applications such as image classification, a model's output is a probability distribution over the possible labels, and comparing such distributions is central to judging how well predictions match the target.

Building on this, the team at Apple defines the Sliced Wasserstein Discrepancy (SWD): a 1-D variational formulation of the Wasserstein distance, computed along random one-dimensional projections (the "slices") of the outputs of two task-specific classifiers.

SWD is designed to capture the dissimilarity between the probability measures p1 and p2 produced by the task-specific classifiers C1 and C2, which take input from a feature generator G. This provides geometrically meaningful guidance for detecting target samples that are far from the support of the source.
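A minimal sketch of how such a discrepancy could be computed in PyTorch. The function name and projection count are assumptions; it projects the two classifiers' class-probability outputs onto random 1-D slices and compares sorted projections, which is one standard way to approximate a sliced Wasserstein distance (the paper's exact formulation may differ):

```python
import torch

def sliced_wasserstein_discrepancy(p1, p2, num_projections=128):
    """Approximate sliced Wasserstein discrepancy between two batches of
    classifier outputs p1, p2 of shape (batch, num_classes).
    A sketch following the general sliced-Wasserstein recipe, not the
    paper's exact implementation."""
    # Random projection directions on the unit sphere (the "slices").
    proj = torch.randn(p1.size(1), num_projections, device=p1.device)
    proj = proj / proj.norm(dim=0, keepdim=True)
    # Project each batch of output distributions onto every 1-D slice.
    p1_proj = p1 @ proj  # (batch, num_projections)
    p2_proj = p2 @ proj
    # In 1-D, optimal transport matches sorted samples, so the distance
    # reduces to comparing the sorted projections slice by slice.
    p1_sorted, _ = torch.sort(p1_proj, dim=0)
    p2_sorted, _ = torch.sort(p2_proj, dim=0)
    return ((p1_sorted - p2_sorted) ** 2).mean()
```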

The whole process can be done in three steps (see the training-loop sketch below):

  • Train G, C1, and C2 on a labeled source set to shape the decision boundaries.
  • Train C1 and C2 to maximize SWD on an unlabeled target set to detect target samples that are outside the reach of the source.
  • Train G to minimize the same SWD on an unlabeled target set to generate feature representations that are inside the support of the source.
Source: Apple Machine Learning
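Putting the three steps together, a minimal PyTorch training-loop sketch using the sliced_wasserstein_discrepancy function sketched above. G, C1, and C2 follow the names in the figure; the optimizers, data loaders, and the number of generator steps are assumptions, and the paper's exact losses and schedule may differ:

```python
import torch.nn.functional as F

# Assumed: G (feature generator), C1, C2 (classifiers) are nn.Modules;
# opt_g updates G's parameters, opt_c updates C1's and C2's;
# source_loader yields (x_s, y_s) pairs, target_loader yields unlabeled x_t.
for (x_s, y_s), x_t in zip(source_loader, target_loader):
    # Step 1: train G, C1, C2 on labeled source data to shape the
    # decision boundaries.
    feat_s = G(x_s)
    loss_cls = F.cross_entropy(C1(feat_s), y_s) + F.cross_entropy(C2(feat_s), y_s)
    opt_g.zero_grad(); opt_c.zero_grad()
    loss_cls.backward()
    opt_g.step(); opt_c.step()

    # Step 2: fix G; train C1, C2 to MAXIMIZE the SWD on unlabeled target
    # data (while staying accurate on the source), so the classifiers
    # disagree on target samples outside the source support.
    feat_s = G(x_s).detach()
    feat_t = G(x_t).detach()
    p1 = F.softmax(C1(feat_t), dim=1)
    p2 = F.softmax(C2(feat_t), dim=1)
    loss_c = (F.cross_entropy(C1(feat_s), y_s)
              + F.cross_entropy(C2(feat_s), y_s)
              - sliced_wasserstein_discrepancy(p1, p2))
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()

    # Step 3: fix C1, C2; train G to MINIMIZE the same SWD, pulling
    # target features inside the source support.
    for _ in range(4):  # a few generator steps per batch (an assumption)
        p1 = F.softmax(C1(G(x_t)), dim=1)
        p2 = F.softmax(C2(G(x_t)), dim=1)
        loss_g = sliced_wasserstein_discrepancy(p1, p2)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
```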

When this metric is applied to the SVHN and MNIST datasets discussed earlier, the method generates much more discriminative feature representations than the model trained without adaptation.

Future Direction

The team behind this work is hopeful that this method of unsupervised domain adaptation helps improve the performance of machine learning models in the presence of a domain shift. By lowering the cost of data capture and annotation in areas where ground truth is scarce or hard to collect, the method also enables training models that perform well in diverse scenarios, and could eventually enable personalized machine learning through on-device adaptation of models for enhanced user experiences.

Know more about Sliced Wasserstein Discrepancy here.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.