The International Symposium on Computer Architecture is right around the corner. We bring you a short overview of the topics in store for this year’s edition.
First, Intel’s Mark Bohr will deliver a keynote on CMOS scaling. Bohr is a Senior Fellow at Intel and regularly speaks about the future challenges of scaling to a variety of audiences. The ISCA audience is a demanding one, so it will be interesting to see what message Intel brings to this venue. One theme that keeps popping up is an optimistic 14nm+/10nm projection, which assumes above-average performance per transistor. Naturally, there is also a lot of talk about heterogeneous integration, which is progressing on many fronts these days, from the chip level to the data center level.
The second keynote comes from none other than Google, which is in a very good position to talk about how the progression of Moore’s Law impacts real-life workloads. The speaker is Partha Ranganathan, formerly the Chief Technologist of HP Labs.
ISCA is definitely moving ahead with the times, dedicating two sessions to machine learning. On this front, EE Times has an interesting report on a few new items that will be coming up during the conference.
One particularly interesting development is Plasticine, a reconfigurable processor design from Stanford (still pre-silicon) that could deliver up to 100x the performance-per-watt of FPGAs. The design is said to reach 12.3 single-precision TFLOPS at a 1 GHz clock within a 49 W power budget. The concept behind the architecture is that higher-level patterns can be extracted from applications, leading to a more informed understanding of data and control flow, including data locality and memory access. Hence the names of Plasticine’s building blocks: “Pattern Compute Units” and “Pattern Memory Units”. The design leverages a range of optimizations, including hierarchical distributed memory management and streaming optimizations. The designers say that rather than focusing on neural convolution, they chose to support frequently changing dense and sparse algorithms more efficiently.
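Taking the headline figures above at face value, a quick back-of-the-envelope calculation shows what the efficiency claim amounts to (peak, not sustained, throughput):

```python
# Sanity check on Plasticine's reported numbers (figures from the article;
# these are peak throughput claims for a pre-silicon design).
peak_tflops = 12.3   # single-precision TFLOPS at 1 GHz
power_w = 49.0       # reported power budget in watts

gflops_per_watt = peak_tflops * 1000 / power_w
print(f"{gflops_per_watt:.0f} GFLOPS/W")  # prints "251 GFLOPS/W"
```

Around 251 single-precision GFLOPS per watt is well beyond what contemporary FPGAs sustain, which is where the 100x performance-per-watt comparison comes from.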
More on the convolutional side, NVIDIA will present its SCNN inference accelerator. It is said to deliver 2.3x the energy efficiency of a comparable dense accelerator through a more aggressive approach to trimming math operations: multiplications involving zero-valued weights or activations are skipped. A range of other machine learning optimizations are included, focusing on weight and activation delivery and reducing the overall pressure on DRAM. According to the authors, although the chip has not been fabricated, its main commercial advantage would be precisely this exploitation of sparsity.
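To see why sparsity pays off, consider a toy sparse dot product. This is only a conceptual sketch, not NVIDIA’s actual SCNN dataflow: when operands are kept in compressed (index, value) form, a multiply is only issued at positions where both the weight and the activation are nonzero.

```python
# Conceptual illustration of the benefit SCNN targets: with compressed
# operands, multiplies are only performed where BOTH values are nonzero.
# This is a hypothetical sketch, not the accelerator's real architecture.
def sparse_dot(weights, activations):
    """Dot product over compressed {index: value} representations.

    Returns the result and the number of multiplies actually issued.
    """
    w = {i: v for i, v in enumerate(weights) if v != 0.0}
    a = {i: v for i, v in enumerate(activations) if v != 0.0}
    common = w.keys() & a.keys()  # only these positions need a multiply
    result = sum(w[i] * a[i] for i in common)
    return result, len(common)

# 8-element vectors with heavy sparsity: 2 multiplies instead of 8.
r, m = sparse_dot([0, 2.0, 0, 0, 1.5, 0, 0, 3.0],
                  [1.0, 4.0, 0, 0, 2.0, 0, 0, 0])
print(r, m)  # prints "11.0 2"
```

In a typical pruned-and-quantized network, the bulk of weights and post-ReLU activations are zero, so skipping those products saves both arithmetic energy and the DRAM traffic needed to move the zeros around.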
These two approaches head in somewhat different directions, but both build on cutting-edge reconfigurable computing as a solid base for research. It is quite likely that in the not-so-distant future we will see pervasive reconfigurable logic in general-purpose chips, and then it will be up to the software to expose and exploit the wonders within.