Last updated January 23, 2019

How Google’s GPipe Is Using Pipeline Parallelism For Training Neural Networks

Published on December 11, 2018
by Ram Sagar

Training bigger neural networks is challenging when accelerator memory is limited. The datasets used by machine learning models are now very large. For example, an image classification dataset such as hashtagged Instagram contains millions of images. As image resolution increases, so does the memory required. At the time of writing, the memory available on a single NVIDIA GPU is only 32 GB.

Therefore, there has to be a tradeoff between the memory allocated to a model’s parameters and the memory allocated to the network’s activations. It is understandable, then, why researchers want to get past the memory limit of a single accelerator.

A deep neural network benefits from a larger dataset, which alleviates the problem of overfitting. To run these ever-growing networks, we need deep learning supercomputers such as Google’s TPU or NVIDIA’s DGX, which enable parallelism by providing fast interconnects between accelerators.

Today, the average ImageNet resolution is 469 x 387, and it has been shown that increasing the size of the input image increases a classifier’s final accuracy. To fit within current accelerator memory limits, most models are made to process images of size 299 x 299 or 331 x 331.

Meet GPipe

In a recent paper, researchers at Google Brain propose pipeline parallelism as a way to scale up the training of deep neural networks, and introduce a new machine learning library called GPipe.

GPipe can be used to partition a model across different accelerators and to automatically split a mini-batch of training examples into micro-batches. Pipelining the micro-batches allows the accelerators to work in parallel.
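To see why micro-batches let the accelerators work in parallel, here is a minimal sketch of a forward-pass pipeline timetable. It is an illustration only, not GPipe's code: the function names `forward_schedule` and `idle_fraction` are hypothetical, and the sketch assumes every stage takes exactly one time step per micro-batch.

```python
def forward_schedule(num_stages, num_micro_batches):
    """Return a timetable: schedule[t][k] is the micro-batch index that
    stage k processes at time step t, or None if the stage is idle.

    Assumes (for illustration only) that each stage takes exactly one
    time step per micro-batch.
    """
    total_steps = num_stages + num_micro_batches - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for k in range(num_stages):
            m = t - k  # micro-batch m reaches stage k at step m + k
            row.append(m if 0 <= m < num_micro_batches else None)
        schedule.append(row)
    return schedule


def idle_fraction(schedule):
    """Fraction of (time step, stage) slots in which a stage is idle."""
    slots = [cell for row in schedule for cell in row]
    return sum(cell is None for cell in slots) / len(slots)


if __name__ == "__main__":
    table = forward_schedule(num_stages=4, num_micro_batches=8)
    for t, row in enumerate(table):
        cells = ["  ." if m is None else f"{m:3d}" for m in row]
        print(f"t={t:2d} |" + "".join(cells))
    print("idle fraction:", idle_fraction(table))
```

In this simplified model, 4 stages and 8 micro-batches leave 3/11 (about 27%) of the slots idle, whereas a single micro-batch would keep only one stage busy at a time. Splitting the mini-batch more finely shrinks the idle share further.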

GPipe also reduces the memory needed during backpropagation by automatically recomputing the forward activations instead of storing all of them. This lets users employ more accelerators to train larger models and scale performance without re-tuning hyperparameters.

Researchers at Google Brain say, “GPipe can support models up to 25 times larger using 8 accelerators without reducing the batch size. The implementation of GPipe is very efficient: with 4 times more accelerators we can achieve a 3.5 times speedup for training giant neural networks.”

To demonstrate GPipe’s capabilities, the researchers trained an AmoebaNet model scaled up to 557 million parameters on the ImageNet ILSVRC 2012 dataset, with an input image size of 480 x 480. This scaled-up model attains a top-1 validation accuracy of 84.3%, outperforming all other models trained from scratch on the ImageNet dataset.

The 2014 ImageNet challenge saw an accuracy of 74.8% with 4 million parameters. By 2017, accuracy had risen to 82.7% with 145.8 million parameters, about 36 times as many as before.

The researchers have also managed to push the CIFAR-10 accuracy to 99%. The CIFAR-10 dataset contains 60,000 32 x 32 color images in 10 different classes. The 10 different classes represent aeroplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.

Design Features Of GPipe

The core algorithm is implemented using the TensorFlow library. When invoking the GPipe library, the user specifies a sequential list of L layers, where each layer specifies its model parameters, a stateless forward computation function, and an optional cost estimation function.

After the layers have been specified, GPipe partitions the network into K composite layers and places the k-th composite layer onto the k-th accelerator. The number of partitions, K, is user-defined. During training, GPipe first divides a mini-batch of size N into T micro-batches at the first layer. Each micro-batch contains N/T examples.
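The two splitting steps above can be sketched in a few lines of Python. This is a hedged illustration rather than GPipe's actual partitioner: the helper names `partition_layers` and `split_mini_batch` are hypothetical, and the sketch balances partitions by layer count, whereas a real partitioner could use the optional per-layer cost estimate instead.

```python
def partition_layers(layers, num_partitions):
    """Split a sequential list of L layers into K contiguous groups.

    Assumption: groups are balanced by layer count only. A real
    partitioner could balance on estimated per-layer cost instead.
    """
    base, extra = divmod(len(layers), num_partitions)
    groups, start = [], 0
    for k in range(num_partitions):
        size = base + (1 if k < extra else 0)
        groups.append(layers[start:start + size])
        start += size
    return groups  # groups[k] would be placed on the k-th accelerator


def split_mini_batch(mini_batch, num_micro_batches):
    """Split a mini-batch of N examples into T micro-batches of N/T each."""
    n = len(mini_batch)
    if n % num_micro_batches != 0:
        raise ValueError("N must be divisible by T in this sketch")
    size = n // num_micro_batches
    return [mini_batch[i:i + size] for i in range(0, n, size)]


if __name__ == "__main__":
    layers = [f"layer{i}" for i in range(10)]   # L = 10
    print(partition_layers(layers, 4))          # K = 4
    print(split_mini_batch(list(range(8)), 4))  # N = 8, T = 4
```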

Each accelerator stores only the output activations at the partition boundaries, rather than the activations of all intermediate layers within the partition. During the backward pass, the accelerator recomputes the composite forward function from the cached boundary activations, which reduces the overall memory allocation.
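A toy version of this recomputation idea, under strong simplifying assumptions, is shown here. Each "layer" is a scalar function paired with its derivative, and the names `LAYERS`, `forward`, and `backward` are hypothetical; this is a sketch of the general technique, not GPipe's implementation.

```python
import math

# Each "layer" is a pair (f, df): a scalar function and its derivative.
LAYERS = [
    (lambda x: 2.0 * x, lambda x: 2.0),
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    (lambda x: x * x, lambda x: 2.0 * x),
]


def forward(layers, x):
    """Run one partition forward, keeping only the boundary input."""
    boundary_input = x
    for f, _ in layers:
        x = f(x)
    return x, boundary_input  # intermediate activations are discarded


def backward(layers, boundary_input, grad_output):
    """Recompute the intermediate activations, then apply the chain rule."""
    inputs = []
    x = boundary_input
    for f, _ in layers:
        inputs.append(x)  # input seen by each layer, rebuilt on demand
        x = f(x)
    grad = grad_output
    for (_, df), layer_input in zip(reversed(layers), reversed(inputs)):
        grad *= df(layer_input)
    return grad


if __name__ == "__main__":
    y, saved = forward(LAYERS, 0.3)
    print("output:", y, "gradient:", backward(LAYERS, saved, 1.0))
```

The trade is extra compute for less memory: the forward pass of each partition runs a second time during backpropagation, but only one value per partition boundary has to stay in memory between the two passes.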

The gradients for each micro-batch are computed using the same model parameters as in the forward pass. At the end of each mini-batch, the gradients accumulated from all micro-batches are applied to update the model parameters across accelerators. GPipe therefore behaves like standard synchronous gradient descent, independent of the number of partitions.
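The claim that micro-batching does not change the update can be checked on a toy problem. The sketch below assumes a one-parameter linear model with a squared-error loss averaged over the mini-batch; the function names are hypothetical, and the equivalence shown holds for losses that are plain averages over examples.

```python
def grad_sum(w, examples):
    """Sum of per-example gradients of (w*x - y)**2 with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in examples)


def full_batch_gradient(w, mini_batch):
    """Gradient of the mean loss over the whole mini-batch."""
    return grad_sum(w, mini_batch) / len(mini_batch)


def accumulated_gradient(w, mini_batch, num_micro_batches):
    """Accumulate gradients over micro-batches using the same weights w,
    then average once at the end of the mini-batch."""
    n = len(mini_batch)
    if n % num_micro_batches != 0:
        raise ValueError("N must be divisible by T in this sketch")
    size = n // num_micro_batches
    total = 0.0
    for i in range(0, n, size):
        total += grad_sum(w, mini_batch[i:i + size])
    return total / n


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0),
            (5.0, 7.0), (6.0, 6.0), (7.0, 9.0), (8.0, 8.0)]
    print(full_batch_gradient(0.5, data))
    print(accumulated_gradient(0.5, data, num_micro_batches=4))
```

Because the weights are held fixed until every micro-batch has contributed, the two functions agree up to floating-point rounding, whatever value of T is chosen.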

To train the scaled-up model, the RMSProp optimizer with a decay of 0.9 and a label smoothing coefficient of 0.1 were used. The learning rate is scheduled to decay after 3 epochs at a rate of 0.97, with an initial learning rate of 0.00125 times the batch size. The resulting giant model reached 84.3% top-1 accuracy with a single crop.
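One possible reading of this learning-rate schedule is a staircase decay, where the rate is multiplied by 0.97 once every 3 epochs. That interpretation is an assumption, as is the function name `learning_rate`; the article does not spell out the exact form.

```python
def learning_rate(epoch, batch_size, base=0.00125, decay_rate=0.97,
                  decay_epochs=3):
    """Staircase schedule (assumed form): start at base * batch_size and
    multiply by decay_rate once every decay_epochs epochs."""
    initial = base * batch_size
    return initial * decay_rate ** (epoch // decay_epochs)


if __name__ == "__main__":
    for epoch in (0, 2, 3, 6, 30):
        print(epoch, learning_rate(epoch, batch_size=128))
```

Scaling the initial rate with the batch size means a larger batch starts from a proportionally larger step; the batch size of 128 used in the demo is an arbitrary example value.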

What Do The Results Say

With GPipe, it is possible to:

Support models up to 25 times larger using 8 accelerators, thanks to recomputation and model parallelism.
Achieve up to a 3.5 times speedup with four times more accelerators by using pipelining, as seen in the researchers’ experiments.
Train consistently regardless of the number of partitions, thanks to synchronous gradient descent.
Free researchers from the time-consuming process of re-tuning hyperparameters.
Combine GPipe with data parallelism to scale neural network training across even more accelerators.
Advance the performance of visual recognition tasks on multiple datasets, including pushing ImageNet top-5 accuracy to 97.0%, CIFAR-10 accuracy to 99.0%, and CIFAR-100 accuracy to 91.3%.

The training efficiency of GPipe can be further improved by better graph partitioning algorithms.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.
