Recently, a group of researchers from Google released MobileNetV2, a neural network architecture optimised for mobile devices. The architecture delivers high accuracy while keeping the number of parameters and mathematical operations as low as possible, with the aim of bringing deep neural networks to mobile devices.
Last year, the company introduced MobileNetV1 for TensorFlow, a family of models designed to support classification, detection, embedding and segmentation. “The ability to run deep networks on personal mobile devices improves user experience, offering anytime, anywhere access, with additional benefits for security, privacy, and energy consumption. As new applications emerge allowing users to interact with the real world in real time, so does the need for ever more efficient neural networks,” Google researchers Mark Sandler and Andrew Howard said in their research blog post.
The new mobile architecture, MobileNetV2, is an improved version of MobileNetV1 and is released as part of the TensorFlow-Slim Image Classification Library. Developers can explore it in Colaboratory, or download the notebook and run it locally using Jupyter. It is also available as modules on TensorFlow Hub, and the pretrained checkpoints can be found on GitHub.
What Is MobileNetV2?
MobileNets are small, low-latency, low-power models parameterised to meet the resource constraints of a variety of use cases. According to the research paper, MobileNetV2 improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of different model sizes. It is also a very effective feature extractor for object detection and segmentation. For instance, when paired with Single Shot Detector Lite (SSDLite) for detection, MobileNetV2 is about 35 percent faster than MobileNetV1 at the same accuracy.
MobileNetV2 builds upon the ideas of MobileNetV1, using depthwise separable convolutions as efficient building blocks. However, Google says that the second version of MobileNet introduces two new features:
- Linear bottlenecks between the layers: According to the researchers, experimental evidence suggests that using linear layers is crucial, as it prevents non-linearities from destroying too much information. They report that using non-linear layers in bottlenecks hurts performance by several percent, which they say further validates their hypothesis.
- Shortcut connections between the bottlenecks
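To see why depthwise separable convolutions count as efficient building blocks, it helps to compare multiply-add counts. The following is a minimal sketch, assuming stride 1, "same" padding and a 3 × 3 kernel; the function names and the example layer size are hypothetical and chosen only for illustration.

```python
def standard_conv_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds for a k x k standard convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, not taken from the MobileNetV2 architecture table.
    h, w, c_in, c_out = 56, 56, 64, 128
    std = standard_conv_madds(h, w, c_in, c_out)
    sep = depthwise_separable_madds(h, w, c_in, c_out)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For a 3 × 3 kernel the saving approaches a factor of nine as the number of output channels grows, which is why the separable form is attractive on constrained hardware.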
The Basic Structure of MobileNetV2
The intuition is that the bottlenecks of MobileNetV2 encode the model’s intermediate inputs and outputs, while the inner layer encapsulates the model’s ability to transform lower-level concepts such as pixels into higher-level descriptors such as image categories. As with traditional residual connections, the shortcuts enable faster training and better accuracy.
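The layout of a single bottleneck block can be sketched by tracking layer shapes alone. This is an illustrative outline rather than the released implementation: the expansion factor of 6, the ReLU6 activations, and the rule that the shortcut is used only when stride is 1 and the channel counts match are assumptions not stated in the text above, and `bottleneck_block` is a hypothetical helper name.

```python
def bottleneck_block(h, w, c_in, c_out, stride=1, expansion=6):
    """Describe one bottleneck block as a list of (name, activation, output_shape).

    Returns (layers, uses_shortcut). No tensors are computed; only shapes are tracked.
    """
    c_mid = c_in * expansion                 # the wide "inner" layer (assumed factor)
    h_out = -(-h // stride)                  # ceiling division, assuming 'same' padding
    w_out = -(-w // stride)
    layers = [
        ("1x1 expansion conv", "ReLU6", (h, w, c_mid)),
        ("3x3 depthwise conv", "ReLU6", (h_out, w_out, c_mid)),
        # The projection back to the narrow bottleneck has no non-linearity.
        ("1x1 projection conv", "linear", (h_out, w_out, c_out)),
    ]
    # Assumption: the shortcut adds input to output, so their shapes must match.
    uses_shortcut = stride == 1 and c_in == c_out
    return layers, uses_shortcut


if __name__ == "__main__":
    layers, shortcut = bottleneck_block(56, 56, 24, 24)
    for name, activation, shape in layers:
        print(f"{name:<22} {activation:<7} {shape}")
    print("shortcut connection:", shortcut)
```

The sketch makes the two ideas visible side by side: the narrow ends of the block are linear, and the shortcut connects those narrow ends rather than the wide inner layer.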
The basic building block is a bottleneck depth-separable convolution with residuals. The architecture of MobileNetV2 contains an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers. The researchers have tailored the architecture to different performance points by using the input image resolution and the width multiplier as tunable hyperparameters, which can be adjusted depending on the desired trade-off between accuracy and performance. The primary network (width multiplier 1, 224 × 224 input) has a computational cost of 300 million multiply-adds (MAdds) and uses 3.4 million parameters. Across configurations, the computational cost ranges from 7M to 585M MAdds, while the model size varies between 1.7M and 6.9M parameters.
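The effect of the two hyperparameters on cost can be approximated with a back-of-the-envelope rule. The sketch below assumes that multiply-adds scale with the square of the width multiplier and the square of the input resolution, which ignores layers that are not scaled and so gives only a rough estimate. The 300 million baseline comes from the figures above; `estimate_madds` and the example configurations are hypothetical.

```python
BASE_MADDS = 300e6        # primary network: width multiplier 1, 224 x 224 input
BASE_RESOLUTION = 224


def estimate_madds(width_multiplier, resolution):
    """Rough multiply-add estimate under the quadratic-scaling assumption."""
    width_scale = width_multiplier ** 2                     # channels in and out both scale
    resolution_scale = (resolution / BASE_RESOLUTION) ** 2  # height and width both scale
    return BASE_MADDS * width_scale * resolution_scale


if __name__ == "__main__":
    # Illustrative configurations, assumed rather than taken from the text.
    for alpha, res in [(0.35, 96), (1.0, 224), (1.4, 224)]:
        millions = estimate_madds(alpha, res) / 1e6
        print(f"width={alpha:<4} resolution={res:<3} -> ~{millions:.1f}M MAdds")
```

With a width multiplier of 1.4 at 224 × 224, this estimate lands close to the 585M upper end quoted above, which suggests the quadratic rule is a reasonable first approximation for the larger configurations.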
How Is It Different From MobileNetV1?
According to Google, the MobileNetV2 models are faster than MobileNetV1 across the board. They use two times fewer operations, achieve higher accuracy, need 30 percent fewer parameters and are about 30-40 percent faster on a Google Pixel phone.
To enable on-device semantic segmentation, the researchers used MobileNetV2 as a feature extractor in a reduced form of DeepLabv3, which controls the resolution of the computed feature maps. On the PASCAL VOC 2012 semantic segmentation benchmark, the resulting model performed similarly to one using MobileNetV1 as the feature extractor, but required 5.3 times fewer parameters and 5.2 times fewer operations in terms of multiply-adds.
On A Concluding Note
The new version of MobileNet has several properties that make it suitable for mobile applications: it allows very memory-efficient inference and relies on standard operations present in all neural network frameworks. On the ImageNet dataset, MobileNetV2 improves the state of the art for a wide range of performance points. On the object detection task, it outperforms real-time detectors on the COCO dataset. Google claims that MobileNetV2 provides a very efficient mobile-oriented model that can be used as a base for many visual recognition tasks.