Amazon, the biggest cloud provider, launched a machine learning inference chip called AWS Inferentia at the AWS re:Invent conference last week. The chip was designed by Annapurna Labs, an Amazon-owned Israeli company at the forefront of building next-generation semiconductor systems. The cloud industry leader says the chip is geared towards larger workloads that consume entire GPUs or require lower latency.
According to the announcement, each chip provides hundreds of tera operations per second (TOPS) of inference throughput, scaling to thousands of TOPS per Amazon EC2 instance. It supports multiple frameworks, including TensorFlow, Apache MXNet and PyTorch, and multiple data types, including INT8 and mixed-precision FP16 and bfloat16.
The chip does not compete directly with NVIDIA, Intel or AMD, since Amazon will not sell it commercially but will offer it only to its own cloud customers. However, the release signals Amazon’s intention to build up its hardware muscle and take on Google Cloud, which has already released a third-generation TPU.
Experts are wondering why Amazon has built chips designed for specific workloads when general-purpose processors can support a wider class of workloads. One answer is that hardware specialisation can help enterprises serve their customers better and at lower latency. The move is also expected to help Amazon attract more developers to its AWS platform by broadening its range of tools and services.
The chip is aimed at inference, the stage where a trained model is actually put to work. While training has received a lot of attention and is often seen as the biggest part of the machine learning process, inference is what powers the core machine learning services, such as object recognition in videos, speech recognition and text recognition. Inference drives core AWS ML services like Translate, Polly and Lex.
Why Amazon Wants To Dive Deep In Hardware Specialisation
ML Chip Can Tackle Specialised Workloads: Amazon wants to deepen hardware specialisation on the server side because ML workloads are expected to require more server resources than all present forms of server computing combined. Of late, the number and variety of workloads has grown rapidly, and some of them, such as autonomous driving, require far more resources than others.
Cloud Providers Look For Big Gains From Specialised Processors: Amazon engineer James Hamilton emphasised that machine learning applications are finding uses in practically every business, including finance, manufacturing, healthcare, insurance and even heating and cooling, which means businesses are ready to bet big on ML.
Lower The Costs For Machine Learning: Leading cloud providers, especially Google, Microsoft and Alibaba, are focused on making it easy for customers to deploy and scale ML and on driving down the cost of ML workloads. Given that ML is immediately applicable in almost every domain, cloud companies expect massive gains from specialised hardware that can support a wide range of ML workloads.
Lower Latency, Lower Power Consumption, Lower Cost: A large number of enterprises and startups look for hardware that is optimised for a specific workload. AWS Inferentia will reportedly deliver high-throughput, low-latency inference performance at an extremely low cost.
Market analysts say that more machine learning applications run on AWS than on any other cloud computing platform. According to a blog, AWS has over 10,000 active machine learning developers and hosts a sizeable share of cloud-based TensorFlow workloads. The Inferentia hardware and software together meet a wide range of inference use cases and support ONNX, the industry’s most commonly used exchange format for neural networks. Inferentia also interfaces natively with the most popular frameworks, such as MXNet, PyTorch and TensorFlow, giving customers a choice.
While the move has been widely portrayed as AWS catching up with its cloud competitors in the AI chip market, we believe hardware specialisation will play a crucial role in winning business and driving big gains. By adding the new machine learning inference chip to its cloud ML platform, AWS aims to substantially reduce the cost of deploying ML inference at scale.