Nvidia's new Tesla P4 and P40 GPU accelerators are aimed at production AI systems.
The Nvidia Tesla P4 and P40 are designed to run already-trained neural networks for technologies such as speech, image and text recognition.
Offering two models gives organisations the option to optimise for either performance or power consumption.
According to Nvidia fellow David Kirk, the P4 provides 40 times the energy efficiency of a CPU-based server of equivalent performance while drawing just 50W, and the P40 delivers 40 times the performance of a CPU-based server with the same 250W power draw.
Delivering 47 tera-operations per second (TOPS) of inference performance using INT8 instructions, a server with eight Tesla P40 accelerators can replace more than 140 CPU servers, the company claimed, saving more than US$650,000 in acquisition costs alone.
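The INT8 instructions in question are exposed in CUDA 8 as the __dp4a intrinsic on Pascal-class parts such as the P4 and P40, which computes a dot product of four packed 8-bit integers in a single operation. The kernel below is a minimal sketch of how an inference workload might exploit it; the kernel and buffer names are illustrative, not Nvidia's API.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Minimal sketch: an INT8 dot-product kernel using the Pascal __dp4a
// intrinsic (requires compute capability 6.1, e.g. Tesla P4/P40;
// compile with -arch=sm_61). Names are illustrative.
__global__ void int8_dot(const int *a, const int *b, int *out, int n4)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        // Each 32-bit word packs four signed 8-bit values; __dp4a
        // multiplies the four pairs and accumulates into a 32-bit int.
        out[i] = __dp4a(a[i], b[i], 0);
    }
}

int main()
{
    const int n4 = 1024;            // 1024 words = 4096 int8 elements
    int *a, *b, *out;
    cudaMallocManaged(&a, n4 * sizeof(int));
    cudaMallocManaged(&b, n4 * sizeof(int));
    cudaMallocManaged(&out, n4 * sizeof(int));
    for (int i = 0; i < n4; ++i) {  // pack the value 1 into every lane
        a[i] = 0x01010101;
        b[i] = 0x01010101;
    }
    int8_dot<<<(n4 + 255) / 256, 256>>>(a, b, out, n4);
    cudaDeviceSynchronize();
    printf("out[0] = %d\n", out[0]);  // 1*1 + 1*1 + 1*1 + 1*1 = 4
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Packing four multiply-accumulates into one instruction is where the quoted TOPS figure comes from: the same arithmetic units retire four times as many low-precision operations per cycle as they would at 32-bit precision.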
"Deep learning turns data into information and value," said Kirk.
Asked about the future of CMOS technology for CPUs and GPUs, he said the horizon has always been about seven years away. Changes to chemical formulations may allow further reductions in transistor size, and 3D fabrication allows greater density; even when things can't get any smaller, vendors will likely be able to make chips in greater volumes and at lower cost.
"CMOS is a great workhorse technology and we can engineer it to be better," he said.
While the physics of today's smallest fabrication processes means power consumption no longer falls as transistor sizes shrink, exploiting parallelism means two processors can do twice the work of one.
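That scaling is visible in how GPU code is written: a kernel expresses its work as many independent elements, and the hardware spreads those elements across however many processors exist, so doubling the processor count roughly doubles throughput. The grid-stride loop below is the standard CUDA idiom for this; it is a generic sketch rather than anything specific to the P4 or P40.

```cuda
#include <cuda_runtime.h>

// Sketch of a grid-stride vector add: the work is expressed as n
// independent element operations, so the runtime can spread it across
// however many multiprocessors the GPU has. Two processors finish the
// same n elements in roughly half the time of one.
__global__ void vec_add(const float *x, const float *y, float *z, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x) {
        z[i] = x[i] + y[i];
    }
}
```

Launched with enough blocks to occupy every multiprocessor, the same kernel runs unchanged on a GPU with twice the units and simply finishes sooner.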
He also suggested that Intel's practice of including out-of-order processing on chips at this scale is "a luxury", because in an application with 50,000 threads there is always something ready to execute in order. The silicon real estate and power budget can be put to better use by abandoning out-of-order logic and designing in more execution units, he said.
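This is essentially how GPU warp schedulers already work: the cores issue instructions in order, and when one group of threads stalls on a memory load, the scheduler simply issues from another group that is ready. The kernel below is an illustrative sketch of the pattern that makes this possible, not code from Nvidia.

```cuda
#include <cuda_runtime.h>

// Illustrative memory-bound kernel. Each thread issues a load and then
// immediately depends on it; an out-of-order core would hunt for
// independent instructions within one thread, but an in-order GPU core
// just issues the next ready warp instead. With tens of thousands of
// threads in flight, some warp is almost always ready, so memory
// latency is hidden without any out-of-order machinery.
__global__ void gather(const float *src, const int *idx, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = src[idx[i]];  // long-latency, data-dependent load
        dst[i] = v * 2.0f;      // in-order dependent use; the scheduler
                                // runs other warps in the meantime
    }
}
```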