
Artificial Intelligence Gets A Boost With The Latest Generation Intel® Xeon® Scalable Processors That Drive Inference At Scale

Artificial Intelligence is an industry-wide phenomenon that has opened up opportunities for organisations across the board, and it has become the growth story for India's digital natives like Flipkart, Swiggy and Ola. Today, many AI applications, such as facial recognition, product recommendations and virtual assistants, are rooted in our day-to-day lives. These emerging applications share one common feature: dependence on hardware as the core enabler of innovation. In fact, many rising consumer digital companies depend on next-gen architectures that can significantly increase computational efficiency and speed time-to-market.

According to an IDC report, spending on AI systems will more than double to $79.2 billion [1] by 2022, growing at a compound annual growth rate (CAGR) of 38.0% over the 2018-2022 forecast period. Hardware spending, dominated by servers, is projected to touch $12.7 billion [2] this year as companies aggressively invest in building the infrastructure necessary to support AI systems.

This has necessitated a shift towards an ‘AI technology stack’ that abstracts away the complexity of the hardware layer related to storage, memory and logic, and allows higher performance gains for developers and data scientists. What we are seeing now is new value creation in the market by leading semiconductor companies that are focusing on end-to-end solutions for industries.

The writing on the wall is clear — the mega breakthroughs in IT aren’t going to come from hardware alone, but from the intersection of AI, hardware and software. AI hardware solutions can only deliver maximum gains if they are compatible with other layers of the software environment. To serve their customers better, semiconductor companies are developing a common programming framework and ecosystem that is in concert with hardware.

Top Takeaways

1. No standard AI chip: The AI market is vast, but there is no “one-size-fits-all” approach. Thus, there can be no “standard” AI chip.

2. Need to abstract away hardware complexity: Data scientists and application developers look for high-performance hardware that can churn out general-purpose AI solutions within given time and power budgets. They also demand greater flexibility: hardware they can program in mainstream languages, at a higher level of abstraction, with supporting libraries. The data science community is looking for a complete solution stack that abstracts away hardware specifics, making it easier to crunch parallel workloads efficiently.

3. Shift to inference at scale: Inference at scale marks deep learning’s coming of age. By 2020, the ratio of deep learning training cycles to inference cycles within enterprises will shift rapidly from today’s roughly 1:1 split to well over 1:5 [3]. Deloitte research, in fact, predicts that by 2023, 43 percent [4] of all AI inference will occur at the edge. Inference is important because it is how enterprises monetize AI: applying their trained models to new datasets to launch new applications or products. Analysts forecast that inference will be the biggest driver and project that it will generate more revenue in data centers than at the edge.

To gain a clearer picture, let’s take a look at what AI inferencing involves and why inference is where the real value lies for enterprises. A case in point is speech recognition, where vast amounts of audio recordings must be processed with deep learning to train a speech recognition model. Once trained, however, that model can be used for a wide array of applications, such as speech-to-text and voice responses for smart speakers (see the sketch after this list).

4. Rethinking data center AI infrastructure: GPUs, widely known for their parallel processing power, are geared towards training, where the model learns from large volumes of input data. Training is also typically carried out in a centralized location, while inference is pushed out to the edge of the network. This requires enterprises to rethink their infrastructure strategy around training as well as inference.

5. Scale-up/scale-down approach: Organizations are increasingly leaning towards a scale-up/scale-down approach, in which CPU clusters and processors can be scaled easily, enabling efficient power consumption without sacrificing performance while minimizing the need for major redesign.
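To make the training/inference split in takeaway 3 concrete, here is a minimal Python sketch using ONNX Runtime*. It is illustrative only: speech_model.onnx is a hypothetical already-exported model, and the input shape is a placeholder.

    import numpy as np
    import onnxruntime as ort

    # Load a hypothetical, already-trained model. Inference only applies
    # the model to new inputs; no retraining is involved.
    sess = ort.InferenceSession("speech_model.onnx")
    input_name = sess.get_inputs()[0].name

    # Placeholder batch of audio features (the shape is illustrative).
    features = np.random.rand(1, 80, 300).astype(np.float32)
    logits = sess.run(None, {input_name: features})[0]
    print(logits.shape)

A session like this is created once and then called millions of times across products, which is why the training-to-inference ratio tilts so heavily towards inference in production.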

Intel Drives Innovation with Multi-purpose, Built-in AI Compute from Device to Cloud

In this article, we take a look at Intel’s AI strategy and how the chip giant has built a winning roadmap for the fast-growing AI market. Intel has also created a software strategy, with tools such as BigDL, nGraph and VNNI, that enables developers to extract maximum gains from its hardware portfolio.

Run the AI You Need on the CPU You Know, With 2nd Gen Intel® Xeon® Scalable Processors

Intel has built a new generation of hardware and software that allows enterprises to enter an era of pervasive intelligence while addressing specific customer needs. Built with a data-centric focus, 2nd Gen Intel® Xeon® Scalable processors improve inference performance by up to 277X [5] compared to the processor family’s initial launch in July 2017.

With the growing buzz around AI/ML, 2nd Gen Intel® Xeon® Scalable processors promise an AI acceleration push through Intel® DL Boost, tailored for deep learning inferencing. With 2nd Gen Intel® Xeon® Scalable processors, Intel® DL Boost provides a winning combination without relying on GPUs, and AI capabilities can be more easily integrated alongside other workloads on these multi-purpose processors. Furthermore, Vector Neural Network Instructions (VNNI) [6], which can be thought of as AI inference accelerators, are integrated into every 2nd Gen Intel Xeon Scalable processor. Performance can improve significantly for both batch inference and real-time inference, because VNNI reduces both the number and the complexity of the convolution operations required for AI inference, which in turn reduces the compute power and memory accesses these operations require.

Why is Intel® DL Boost pegged as a breakthrough?

The answer is straightforward. Most commercial deep learning applications today use 32-bit floating point precision (FP32) for training and inference workloads. However, both deep learning training and inference can be performed at lower numerical precision, using 16-bit multipliers for training and 8-bit multipliers for inference, with minimal to no loss in accuracy.
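As a rough illustration of why 8-bit arithmetic holds up, here is a minimal NumPy sketch of symmetric INT8 quantization. The tensors and the single-scale scheme are simplifying assumptions for illustration, not Intel’s implementation.

    import numpy as np

    # Hypothetical FP32 weights and activations from a trained layer.
    w_fp32 = np.random.randn(64, 64).astype(np.float32)
    x_fp32 = np.random.randn(64, 64).astype(np.float32)

    # Symmetric quantization: map each tensor onto the INT8 range [-127, 127].
    def quantize(t):
        scale = np.abs(t).max() / 127.0
        q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
        return q, scale

    w_int8, w_scale = quantize(w_fp32)
    x_int8, x_scale = quantize(x_fp32)

    # Multiply in INT8, accumulate in INT32, then rescale back to FP32.
    acc = w_int8.astype(np.int32) @ x_int8.astype(np.int32)
    y_int8_path = acc.astype(np.float32) * (w_scale * x_scale)

    y_fp32_path = w_fp32 @ x_fp32
    print(np.abs(y_fp32_path - y_int8_path).max())  # small relative to the outputs

The INT8 path moves a quarter of the data of FP32 and maps onto cheap integer multiply-accumulate hardware, which is exactly what Intel® DL Boost targets.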

With Intel® DL Boost, Intel has created a new x86 instruction that can perform an 8-bit integer (INT8) matrix multiplication and summation with fewer cycles than before. Intel® DL Boost fuses three instructions into one [7] and can speed up the dense computations characteristic of convolutional neural networks (CNNs) and deep neural networks (DNNs). The main advantage for developers and data scientists is that, for AI inferencing on trained neural networks that don’t require periodic retraining, one no longer needs to rely on special-purpose compute hardware like GPUs or TPUs.
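To see what that fusion buys, here is a small Python sketch that emulates one 32-bit lane of VPDPBUSD, the core VNNI instruction: it multiplies four unsigned 8-bit values by four signed 8-bit values and folds the products into a 32-bit accumulator. The input values are made up for illustration.

    import numpy as np

    # One 32-bit lane of VPDPBUSD: four (u8 * s8) products summed into an
    # INT32 accumulator, performed as a single fused instruction under VNNI.
    def vpdpbusd_lane(acc, a_u8, b_s8):
        products = a_u8.astype(np.int32) * b_s8.astype(np.int32)
        return np.int32(acc + products.sum())

    a = np.array([10, 200, 3, 45], dtype=np.uint8)  # e.g. quantized activations
    b = np.array([-7, 12, 90, -1], dtype=np.int8)   # e.g. quantized weights
    print(vpdpbusd_lane(np.int32(0), a, b))         # -70 + 2400 + 270 - 45 = 2555

Before VNNI, the same lane took a three-instruction sequence (VPMADDUBSW, VPMADDWD, VPADDD); collapsing it into one instruction cuts both the instruction count and the intermediate results that must be shuffled through registers.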

Here’s a benchmarking report from Dell* that shows how the latest-generation processor performs faster on parallel workloads, including inferencing. During benchmark testing, Dell engineers realised more than 3X [8] faster inferencing for image recognition with INT8 on ResNet50.

In Conclusion

What’s evident is the shift towards a general-purpose AI stack that enterprises can deploy for Deep Learning. To that end, AI computing companies need to provide a full-stack solution across silicon, tools and libraries for easier application development. Meanwhile, the developer ecosystem demands SDKs and compilers to optimise and accelerate AI algorithms.

There’s a need to bring more AI capabilities to enterprises and to empower the developer ecosystem. We believe Intel is playing a pivotal role in the emerging AI market with its end-to-end solutions. Intel has a wide array of AI hardware that includes CPUs, accelerators and purpose-built hardware, FPGAs and, in the future, neuromorphic chips. Developers look for a software environment that can function across different platforms without them having to overhaul their systems.

On the software side, Intel is winning the market with tools like the Intel® Distribution of OpenVINO™ Toolkit, which accelerates deep neural network workloads and optimizes deep learning solutions across various hardware platforms. In addition, support for popular deep learning frameworks like TensorFlow*, MXNet* and PyTorch*, and for the ONNX* model exchange format, will help the chip giant win developer mindshare.
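For a flavour of that workflow, here is a minimal, hypothetical OpenVINO sketch in Python. The model file name is a placeholder, and the method names follow recent releases of the toolkit’s Python API, which has evolved since this processor generation launched.

    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("resnet50.xml")      # placeholder IR model file
    compiled = core.compile_model(model, "CPU")  # run directly on the Xeon CPU

    # Placeholder input batch in NCHW layout.
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    result = compiled([batch])[compiled.output(0)]
    print(result.shape)

Retargeting the same script at another device is a one-string change, which is precisely the kind of hardware abstraction the takeaways above call for.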



Product & Performance Information

[1] Source: IDC, Worldwide Spending on Artificial Intelligence Systems Will Grow to Nearly $35.8 Billion in 2019, According to New IDC Spending Guide, March 11, 2019. https://www.idc.com/getdoc.jsp?containerId=prUS44911419

[2] Source: IDC, Worldwide Spending on Artificial Intelligence Systems Will Grow to Nearly $35.8 Billion in 2019, According to New IDC Spending Guide, March 11, 2019. https://www.idc.com/getdoc.jsp?containerId=prUS44911419

[3] Source: As deep learning has been adopted more broadly, there has been a clear shift in the ratio between cycles of training (producing models) and inference (applying models), from 1:1 in the early days of DL to potentially well over 1:5 by 2020. Deep Learning Is Coming Of Age, December 10, 2018. https://www.intel.ai/deep-learning-is-coming-of-age/#gs.jnunhy

[4] Source: One research study predicts that 43 percent of all AI inference (or analysis) globally will occur at the edge, meaning outside of data centers, on machines and devices, by 2023, up from just 6 percent last year. Pervasive Intelligence, November 2018. https://www2.deloitte.com/insights/us/en/focus/signals-for-strategists/pervasive-intelligence-smart-machines.html

[5] Source: Performance results are based on testing as of 06/15/2015 (v3 baseline), 05/29/2018 (241x) and 6/07/2018 (277x) and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.


[6] Source: Vector Neural Network Instructions Enable Int8 AI Inference on Intel Architecture. https://www.intel.ai/vnni-enables-inference/#gs.jnzd8e

[7] Source: Intel will deliver enhancements to Intel® Deep Learning Boost, beginning with a new set of embedded accelerators called Vector Neural Network Instructions (VNNI), which accomplish in a single instruction what formerly required three, to speed up dense computations characteristic of convolutional neural networks (CNNs) and deep neural networks (DNNs). Monetizing AI: How To Get Ready For ‘Inference At Scale’, April 2019. https://itpeernetwork.intel.com/ai-inference-at-scale/#gs.jo1x8y

[8] Source: Even when compared to the Intel® Xeon® Scalable processor (code named Skylake), the 2nd Generation Intel® Xeon® Scalable shines, which is something that we at Dell EMC have confirmed in the HPC and AI Innovation Lab. In benchmark testing, our engineers have realized more than 3x faster inferencing for image recognition with INT8, ResNet50. These tests compare the performance of a 2nd Generation Intel® Xeon® Scalable Gold Processor 6248 and an Intel® Xeon® Scalable Gold Processor 6148 (Skylake) on an inference benchmark for image classification, as summarized in this slide. For more complete information visit: https://www.intel.com/content/www/us/en/benchmarks/benchmark.html. Deep Learning Gets A New Boost With New Intel Processor, April 5, 2019. https://blog.dellemc.com/en-us/deep-learning-gets-boost-new-intel-processor/

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.





Richa Bhatia

Richa Bhatia is a seasoned journalist with six years’ experience in reportage and news coverage, and has had stints at the Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.
