We learned some details about Nvidia's next-gen Pascal GPU architecture at last year's GTC. After today's GTC keynote, we now know a little more about how Pascal will operate. We also have new info about its memory capacity and bandwidth.
Perhaps the biggest feature of Pascal revealed today is its ability to operate in a mixed-precision mode, similar to mobile GPUs like ARM's Mali. Nvidia's current-gen GPU architecture, Maxwell, is limited to fp32 operation, meaning that int8, fp16, and fp32 operations are all processed internally at the same rate. The Maxwell GPU in the Tegra X1 SoC adds the ability to operate in fp16 mode, which can effectively double its throughput.
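To make the precision tradeoff concrete, here is a minimal sketch in Python that round-trips one value through the IEEE 754 half-precision (fp16) and single-precision (fp32) formats. It says nothing about how Pascal or Maxwell handle these types internally. The helper `round_trip` and the sample value `0.1` are our own illustrative choices, not anything from Nvidia's presentation.

```python
import struct

# Standard-library format codes: "e" is IEEE 754 half precision (fp16),
# "f" is single precision (fp32). The "e" code needs Python 3.6 or newer.
FP16, FP32 = "<e", "<f"

def round_trip(value, fmt):
    """Store a Python float in the given format and read it back (hypothetical helper)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

# Storage cost per value: fp16 takes half the bytes of fp32.
bytes_fp16 = struct.calcsize(FP16)
bytes_fp32 = struct.calcsize(FP32)

# Precision cost: the same value is stored less accurately in fp16.
weight = 0.1  # arbitrary sample value, chosen only for illustration
err16 = abs(round_trip(weight, FP16) - weight)
err32 = abs(round_trip(weight, FP32) - weight)

print(f"fp16: {bytes_fp16} bytes, rounding error {err16:.2e}")
print(f"fp32: {bytes_fp32} bytes, rounding error {err32:.2e}")
```

Each fp16 value takes half the storage of an fp32 value and carries a larger rounding error. That is the trade a mixed-precision mode makes in exchange for higher throughput.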
There might be a little of what Nvidia CEO Jen-Hsun Huang calls "CEO math" involved here, but Pascal is said to provide up to four times the throughput in mixed-precision workloads. That big number might be explained by the addition of an int8 mode along with fp16 and fp32. Pascal also promises better performance simply due to the inherent architectural improvements in its design.
We also learned more about the benefits of Pascal's 3D memory, which Huang first described at last year's GTC. Huang said that in the past, tradeoffs existed between bandwidth and the size of the RAM pool. One could have bandwidth or size, but not both (at least, not easily). Pascal's 3D memory technology changes the math involved in that tradeoff. The next-gen chip is said to be able to use up to 32GB of RAM, while providing up to three times the memory bandwidth of Maxwell.
Pascal's improved memory bandwidth and its ability to operate in mixed-precision mode are both important for deep-learning applications. These programs don't need high levels of floating-point precision, but they are limited by memory bandwidth. With these improvements, along with the claimed scaling benefits offered by the NVLink GPU interconnect (and a bit of CEO math), Huang said that Pascal might offer anywhere from five to ten times the performance of Maxwell GPUs in deep-learning tasks.
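As a rough back-of-the-envelope illustration of why those two factors compound, consider an idealized workload that is limited purely by memory bandwidth. This is not Nvidia's own calculation. The normalized baseline bandwidth and the function `values_per_second` are assumptions made for the sketch, and it ignores NVLink, compute limits, and everything else that affects real performance.

```python
def values_per_second(bandwidth_bytes_per_s, bytes_per_value):
    """Values a purely bandwidth-bound workload can stream per second (idealized)."""
    return bandwidth_bytes_per_s / bytes_per_value

BASELINE_BANDWIDTH = 1.0   # normalized, hypothetical Maxwell-class bandwidth
BANDWIDTH_FACTOR = 3.0     # the "up to three times" figure quoted in the keynote

FP32_BYTES = 4  # bytes per single-precision value
FP16_BYTES = 2  # bytes per half-precision value

# Baseline: fp32 data streamed at the baseline bandwidth.
baseline = values_per_second(BASELINE_BANDWIDTH, FP32_BYTES)

# Idealized next-gen case: fp16 data streamed at three times the bandwidth.
improved = values_per_second(BASELINE_BANDWIDTH * BANDWIDTH_FACTOR, FP16_BYTES)

speedup = improved / baseline
print(f"Idealized bandwidth-bound speedup: {speedup:.1f}x")
```

Under these assumptions, tripling the bandwidth and halving the size of each value multiply together to give a sixfold increase in values streamed per second. That is a best-case ceiling for a workload that fits the assumption, not a prediction.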