
Inference Chips Are The Next Big Battlefield For Nvidia And Intel



Recent years have seen chip manufacturers pile into AI and machine learning, a market pegged as one of the biggest opportunities since the PC boom of the early 2000s. Prominent companies, including Nvidia and Intel, have already begun dipping their toes in, selling hardware to enterprises and large companies looking to get into AI.

Until now, Nvidia has held the lead without question. After sinking a sizeable amount of R&D into the training of deep learning models, Nvidia has begun producing hardware built specifically for training tasks. Previously, this work was done on its Quadro and Titan enterprise GPUs, with the mantle now being passed to the Tensor Core architecture in its latest generation of consumer GPUs.

However, Nvidia is not the only player in the field. Intel has been playing catch-up for a while now, starting with its acquisition of the deep learning startup Nervana in 2016. Since then, it has been rumoured to be working on newer chips to capitalize on the ever-growing need for inference hardware. That rumour was confirmed this year at CES, where Intel announced a dedicated chip for inference.

The stage is now set for the next big showdown in computing between Intel and Nvidia. While the two are approaching the problem from opposite ends of the same spectrum, future advances in AI could give either one the edge.

Learning vs. Inference: How Does Specialization Matter?

Utilising an AI model in any real-world application requires two procedures to be carried out first. Training is just what it sounds like: the model is trained on a sample of the data it will eventually process, ‘learning’ the characteristics of that data and storing them in its ‘memory’. Once training is complete, the model is stripped down so it can run without consuming too much computing power, and is then put to work for the purpose it was developed for.

Even stripped down, the model still requires a considerable amount of compute to run. The process of predicting results from new data based on that training is called inference. The fight now is over providing compute for both learning and inference, two very different workloads that each run best on specialized hardware.
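
As a rough illustration of the difference between the two phases, the hypothetical Python snippet below trains a small Keras model on sample data (the compute-heavy learning step) and then uses the trained model only to predict on new inputs (the inference step). The model, data and shapes are made up for illustration and are not from the article.

```python
import numpy as np
import tensorflow as tf

# --- Training: compute-heavy, runs on a sample of the data ---
x_train = np.random.rand(1000, 20).astype("float32")    # illustrative sample data
y_train = (x_train.sum(axis=1) > 10).astype("float32")  # illustrative labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=5, batch_size=32)     # the 'learning' step

# --- Inference: the trained model only predicts, it no longer learns ---
new_samples = np.random.rand(4, 20).astype("float32")
print(model.predict(new_samples))                        # the 'inference' step
```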

Learning tasks are well suited to parallelization, as they involve running many independent operations at the same time. Nvidia has a natural advantage here, as its GPUs accelerate learning by distributing processing across thousands of CUDA cores. Moreover, it has invested heavily in CUDA (Compute Unified Device Architecture) as a programming platform. Thanks to Nvidia’s efforts, CUDA is not only integrated into frameworks such as TensorFlow and Hadoop, but is also used by cloud service providers such as Microsoft and Amazon.
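
As a hedged sketch of what that integration looks like in practice, the TensorFlow snippet below checks whether a CUDA-capable GPU is visible and pins a large matrix multiplication to it; TensorFlow dispatches the work to CUDA kernels under the hood. The device names and tensor sizes are illustrative.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; a CUDA-capable Nvidia card shows up here.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# Fall back to the CPU if no GPU is available.
device = "/GPU:0" if gpus else "/CPU:0"

# Pin a large, highly parallel operation to the chosen device.
# On an Nvidia GPU, the multiply is dispatched to CUDA kernels and
# spread across thousands of CUDA cores.
with tf.device(device):
    a = tf.random.normal((4096, 4096))
    b = tf.random.normal((4096, 4096))
    c = tf.matmul(a, b)

print("Ran on", device, "- result shape:", c.shape)
```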

Intel, on the other hand, has largely specialized in CPUs that offer unmatched performance for sequential workloads. Inference, by contrast, is a largely sequential workload, which makes it compute-heavy on many devices. Moreover, inference workloads largely take place at the ‘edge’, which is to say they are not processed on servers but on the devices themselves. These are usually mobile devices, which typically don’t have the compute required for heavier inference workloads.

Some manufacturers, such as Apple, Huawei and Samsung, have begun to get past this by building their own inference chips. However, experts predict that the next step in on-device inference will come from FPGAs and mobile ASICs.
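
Before a model can run on such constrained hardware, it is typically converted and quantized. The snippet below is a minimal sketch of one common path, TensorFlow Lite’s converter; it is an illustrative example rather than something the article prescribes, and the file paths are placeholders.

```python
import tensorflow as tf

# Convert a trained SavedModel into a compact TensorFlow Lite flatbuffer.
# "trained_model/" is a placeholder path to a model exported after training.
converter = tf.lite.TFLiteConverter.from_saved_model("trained_model/")

# Default optimizations quantize weights, shrinking the model and reducing
# the compute needed for inference on a phone or other edge device.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```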

The Transition On The Inside Of AI

FPGAs and ASICs are two of the fastest-developing markets for AI learning and inference hardware. While ASICs can handle heavier inference and learning workloads, as seen with Google’s creation of the TPU, FPGAs can bring these capabilities to mobile devices. Both giants are already making strides towards this end.

Intel has announced the Nervana Neural Network Processor for Inference, an ASIC built for inference that supports a large degree of parallelization in server settings. The chip’s structure has been revamped considerably, and it is built on a 10nm manufacturing process. It is likely to be targeted at enterprise users, with Facebook on board as a development partner to ensure the chip is easy to use.

Nvidia has also begun to dip its toes into the inference pool with the release of its deep learning inference platform, built around TensorRT. Reportedly, TensorRT delivers up to 40x higher throughput than CPU-only inference while staying within real-time latency limits. Nvidia has also claimed that a single one of its GPU-accelerated servers is as powerful as 11 CPU-based ones.
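
For a sense of where TensorRT sits in an inference pipeline, the snippet below is a rough, hypothetical sketch using TensorFlow’s TF-TRT integration to optimize a trained SavedModel for GPU inference. The paths are placeholders and the exact API varies across TensorFlow versions.

```python
# A rough sketch of TF-TRT, TensorFlow's bridge to TensorRT.
# Paths are placeholders and API details differ between TensorFlow releases.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="trained_model/"   # placeholder: a trained SavedModel
)
converter.convert()                           # rewrite supported subgraphs as TensorRT engines
converter.save("trt_optimized_model/")        # placeholder output directory

# The optimized SavedModel can then be loaded and served like any other,
# with the TensorRT-optimized subgraphs executing on the GPU.
```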

The battlefield is set for the next generation of compute. The two giants will duke it out, and the consumer stands to be the biggest winner as competition drives innovation.
