Fuelling the growth of artificial intelligence-based services worldwide, NVIDIA launched an AI data centre platform that delivers the industry’s most advanced inference acceleration for voice, video, image and recommendation services. The tech giant debuted the Tesla T4 graphics processing unit (GPU), built on its Turing architecture, to speed up inference from deep learning systems in data centres.
According to the official statement released by NVIDIA, the Tesla T4 GPU provides breakthrough performance with flexible, multi-precision capabilities, from FP32 to FP16 to INT8, as well as INT4. Packaged in an energy-efficient, 75-watt, small PCIe form factor that easily fits into most servers, it offers 65 teraflops of peak performance for FP16, 130 TOPS for INT8 and 260 TOPS for INT4.
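To put those figures in perspective, the short Python sketch below works out how the quoted peak throughput scales as precision drops, and what it amounts to per watt. It uses only the numbers cited above; the function names are illustrative, and dividing by the full 75-watt envelope is a simplifying assumption rather than a measured efficiency figure.

```python
# Peak throughput quoted for the Tesla T4, in tera-operations per second
# (teraflops for FP16, TOPS for the integer formats).
PEAK_THROUGHPUT = {"FP16": 65, "INT8": 130, "INT4": 260}

# Board power quoted in the article. Assumption: the card draws the full
# 75 W at peak, which real workloads may not match.
BOARD_POWER_WATTS = 75


def throughput_per_watt(peak, watts=BOARD_POWER_WATTS):
    """Peak tera-operations per second delivered per watt of board power."""
    return peak / watts


def speedup_over_fp16(precision):
    """How many times higher the quoted peak is than the FP16 peak."""
    return PEAK_THROUGHPUT[precision] / PEAK_THROUGHPUT["FP16"]


for precision, peak in PEAK_THROUGHPUT.items():
    print(
        f"{precision}: {peak} peak, "
        f"{throughput_per_watt(peak):.2f} per watt, "
        f"{speedup_over_fp16(precision):.0f}x the FP16 peak"
    )
```

On the quoted numbers, each halving of precision doubles peak throughput, which is why lower-precision formats such as INT8 and INT4 are attractive for inference when a model tolerates them.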
To optimise the data centre for maximum throughput and server utilisation, the NVIDIA TensorRT Hyperscale Platform includes both real-time inference software and Tesla T4 GPUs, which process queries up to 40x faster than CPUs alone.
“We’re racing toward the future where every customer interaction, every product, and every service offering will be touched and improved by AI. Realising that future requires a computing platform that can accelerate the full diversity of modern AI, enabling businesses to create new customer experiences, reimagine how they meet, and exceed, customer demands, and cost-effectively scale their AI-based products and services,” said the chipmaking giant.
NVIDIA estimates that the AI inference industry is poised to grow into a $20 billion market over the next five years.
Chris Kleban, product manager at Google Cloud, said, “AI is becoming increasingly pervasive, and inference is a critical capability that customers need to successfully deploy their AI models, so we’re excited to support NVIDIA’s Turing Tesla T4 GPUs on Google Cloud Platform soon.”