Category Archives: News

General news

TIK exhibiting at ISC’17 – booth L-444

TIK will be exhibiting at this year’s ISC’17 conference in Frankfurt! Between June 19th and June 21st, you can find us in booth L-444 in Hall 3, where the main exhibition is.

Our live training offering ranges from HPC hardware and software topics, through extreme performance optimization, up to applications of Python for science and engineering. If you would like to know more about our training, or just have a general chat about computing technology, feel free to stop by. We’ll see you there next week!

Where to find us at ISC'17

Moore's Law according to Mark Bohr

Moore’s Law, new architectures, machine learning

The International Symposium on Computer Architecture is right around the corner. We bring you a short overview of the topics in store for this year’s edition.

First, Intel’s Mark Bohr will deliver a keynote on CMOS scaling. Bohr is a Senior Fellow at Intel and regularly speaks about future challenges to various crowds. The ISCA audience is definitely demanding, so it will be interesting to find out what message Intel has to bring in this venue. One thing that keeps popping up is an optimistic 14nm+/10nm projection, which assumes an above-average performance per transistor. Naturally, there is also a lot of talk about heterogeneous integration, which progresses these days on many fronts: from chip level to data center level.

Intel's 14nm and 10nm projections (Image: Intel)

The second keynote comes from none other than Google, which is in a very good position to talk about how Moore’s Law and its progression impact real-life workloads. The speaker is Partha Ranganathan, formerly the Chief Technologist of HP Labs.

ISCA is definitely moving ahead with the times, dedicating two sessions to machine learning. On this front, EE Times has an interesting report on a few new items that will be coming up during the conference.

One particularly interesting development is Plasticine, a pre-design reconfigurable processor from Stanford that could have 100x the performance-per-Watt of FPGAs. Currently the chip is said to reach 12.3 TFLOPS (SP) at a 1 GHz clock and 49 W TDP. The concept behind the design is that higher level abstractions can be extracted from applications, leading to a more informed understanding of data and control flow, including data locality and memory access. Hence the name of Plasticine components – “Pattern Compute Units” and “Pattern Memory Units”. The design leverages a range of optimizations, including hierarchical distributed memory management and streaming optimizations. The designers say that rather than focusing on neural convolution, they chose to support frequently changing dense and sparse algorithms more efficiently.
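As a rough sanity check on the quoted figures, peak efficiency is simply peak throughput divided by TDP. The short Python sketch below does that arithmetic; the function name is ours, and the result is a paper peak derived from the numbers above, not a measured value.

```python
def gflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Peak efficiency in GFLOPS per watt, from peak TFLOPS and TDP in watts."""
    return peak_tflops * 1000.0 / tdp_watts


# Figures quoted for Plasticine: 12.3 TFLOPS (SP) at a 49 W TDP
plasticine = gflops_per_watt(12.3, 49.0)
print(f"{plasticine:.0f} GFLOPS/W")  # prints "251 GFLOPS/W"
```

Real workloads rarely sustain peak throughput, so this should be read as an upper bound.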

Plasticine vs. a 28nm Stratix FPGA (Image: Stanford)

More on the convolutional side, NVIDIA will present their SCNN inference accelerator. It is said to deliver 2.3x the energy efficiency of a comparable accelerator, through a more aggressive approach towards optimizing math operations. A range of other machine learning optimizations are included, focusing on weight and activation parameter delivery, reducing the overall pressure on DRAM. According to the authors, although the chip hasn’t been produced, the main commercial advantage would be the fact that SCNN exploits sparsity.
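To see why exploiting sparsity matters, consider a plain dot product where many weights and activations are zero. The sketch below is a generic software illustration, not NVIDIA's dataflow: the function names and sample data are made up, and it only counts how many multiplications can be skipped when zero operands are never stored or fetched.

```python
def dense_dot(weights, activations):
    """Multiply every pair, including those where an operand is zero."""
    total = sum(w * a for w, a in zip(weights, activations))
    return total, len(weights)  # result, number of multiplications


def sparse_dot(weights, activations):
    """Keep only non-zero operands and multiply only where both are non-zero."""
    nz_w = {i: w for i, w in enumerate(weights) if w != 0}
    nz_a = {i: a for i, a in enumerate(activations) if a != 0}
    common = nz_w.keys() & nz_a.keys()  # positions where both are non-zero
    total = sum(nz_w[i] * nz_a[i] for i in common)
    return total, len(common)


# Hypothetical sample data with plenty of zeros
weights = [0, 2, 0, 0, 3, 0, 1, 0]
activations = [5, 0, 0, 4, 2, 0, 7, 0]

dense_result, dense_macs = dense_dot(weights, activations)
sparse_result, sparse_macs = sparse_dot(weights, activations)
print(dense_result, dense_macs)    # prints "13 8"
print(sparse_result, sparse_macs)  # prints "13 2"
```

Both versions return the same value, but the sparse one performs two multiplications instead of eight and never has to move the zero operands, which is where the reduced DRAM pressure comes from.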

These two approaches seem to go in slightly different directions, but both use cutting-edge reconfigurable computing as a solid base for research. It is quite likely that in the not so distant future we will see pervasive reconfigurable logic in general purpose chips, and then it will be up to the software to expose and exploit the wonders within.

What’s next for server chips?

For anyone who has followed the server processor market over the last few years, one thing should be clear as day: “it’s about to go down”. Competition between arch-rivals AMD and Intel is back on, while chip designer ARM is closing deals and opening up new spaces. Let’s take a closer look at what is in store.

Will Intel keep their crown?

Possibly the biggest news of the last months is the real-world performance of AMD’s new Ryzen CPUs. Recently launched in the client space, Ryzen chips offer approximately double the number of cores of their Intel counterparts, within roughly the same TDP and also on a 14nm process. AMD’s chips also feature large caches compared to what we are used to seeing in this segment. Such features are probably of less interest to the current target of AMD’s marketing efforts – the hardcore gamers – but could make many data center managers very happy. Higher core count means higher integration, more performance per system and lower TCO. Ryzen chips also feature the goodies that Intel got us used to – AVX, Turbo mode and hardware threading (also called SMT or Hyper-Threading on Intel x86). How do we know Intel is feeling threatened? Money. They’ve significantly dropped the prices of their desktop processors, in some cases by as much as 25%. If (or “once”) Ryzen threatens Intel’s Xeon cash-cow, we can be sure that Intel will defend it very vigorously and will work aggressively to deliver competitive 10nm parts in 2018.

AMD Naples overview (Image: AMD/Anandtech)

AMD indeed has a 32-core part in the pipeline, codenamed “Naples”, with strong support for DDR4. New dual-socket systems based on this chip are said to support up to 2 TB of memory, or 512 GB in practical configurations, compared with the 384 GB most Intel platforms offer today. We should keep in mind that AMD has a history of undercutting Intel’s high performance enterprise offerings – for instance, by offering 4-socket platforms at attractive price points, making Intel customers ditch two 2-socket servers in favor of one.

ARM creeping in

Another interesting development is Microsoft’s interest in using ARM chips in production in its cloud business. At the Open Compute Project Summit, Microsoft declared that ARM would be the base of a future server design plugging into the Project Olympus form factor. Such work triggers the signing of new partnerships around the idea, as well as the development of a bunch of related components.

Microsoft has already had some stake in ARM development – for instance, with past versions of Windows. Now, it looks like the majority of Windows server-focused functionality will have to run smoothly on ARM, which is a big deal – especially for a company making one of the most used operating systems in the world.

The future of platforms, according to Microsoft

We all know that Intel’s round of layoffs and accelerated retirements released a solid number of talented engineers and executives into the marketplace. Interestingly, it seems that Intel might be facing some competition from the very people who used to fuel the company. The Bloomberg article on the matter quotes both Anand Chandrasekher, now in charge of Qualcomm’s server chips, and Kushagra Vaid, now responsible for Azure hardware infrastructure at Microsoft.

Intel, Nervana and AI

EETimes is reporting on Intel’s Lake Crest accelerator, acquired from Nervana Systems. Its fundamental mission is to make AI primitives process faster – much faster than GPUs, in fact.

In Intel’s broader push for the support of AI and machine/deep learning in 2016, the chip could be considered as direct competition in an area so far dominated by GPUs. Lake Crest is designed from scratch, and currently scheduled to be produced on a 28nm TSMC process. It makes an interesting tradeoff, which is quite similar to the twist GPUs put on CPUs in their time – mathematical operations are simplified, leading to performance boosts of up to 10x. To make things better, nodes can be interconnected with high bandwidth links to enable the construction of custom topologies for an optimized hardware-to-model fit. Finally, the chip is meant to use “High Bandwidth Memory 2”, which promises up to 8 Tbit/s transfers.

Nervana, like many other AI vendors, supports TensorFlow (championed by Google). With the recent collaboration agreement between Google and Intel, things can only get more interesting.

Image: EETimes

Intel’s Python distribution is now officially published

Intel is talking about their own Python distribution getting released as a product. Early access has been available for a while and the suite, compatible with the popular Anaconda environment, is now being released in full.

The prevailing challenge in Python programming is to take full advantage of a modern CPU: vector units, multiple cores, etc. This is also a reason why TIK teaches these topics in dedicated courses, notably High Performance Python, Python and Parallelism, Python for Science and Engineering or Python for Finance. Now, Intel is hoping to put more compute power in the hands of high-level language programmers.

Concretely, the Intel distribution contains optimized versions of numerical packages. Under the hood, Intel used technologies such as its Math Kernel Library, Threading Building Blocks or the Intel Compiler to maximize computational performance. Intel says that with these improvements, their numerical Python code runs close to native performance. While it is an important stepping stone, we are probably all hoping for a situation where many applications other than linear algebra can run fast when written in high-level languages such as Python. Intel recognizes this need in packages such as numpy, scipy and scikit-learn.

Intel Python performance vs. C MKL (Image: Intel)

Intel’s Python packages can be obtained directly from the Anaconda Cloud.

See also: the press release by Continuum Analytics, Anaconda’s creators.


Hot Chips ’16: AMD publishes Zen x86 core details

Hot Chips 2016 just finished and one of the interesting parts was the AMD publication focused on their new x86 “Zen” core.

AMD claims Zen will deliver a 40% IPC improvement over Excavator – an impressive achievement, but not guaranteed to beat Intel’s Atom. Zen is also supposed to follow a philosophy similar to the one Atom follows today: power is kept roughly constant across generations while performance increases. That is something AMD refers to as “constant energy per cycle”.

In terms of concrete architectural changes, improvements were made across the board (see figure), with slightly more focus on FPU latency and bandwidth. It would seem reasonable to feed vectorization units; however, we still need to see some performance numbers. Zen is compliant with major floating point extensions used today, such as AVX1 and AVX2.

Finally, “Zen” features round-robin SMT, and AMD is clearer than Intel on which parts of the CPU are being shared and in what way – see figure below.

AMD SMT usage in Zen


Intel launches SiPh transceivers capable of 100Gbps over 2km

At this year’s IDF in San Francisco Intel announced volume shipments of new discrete Silicon Photonics modules, apparently capable of maintaining a 100Gbps data rate over a 2km long link. This is a step up from previous capabilities available in “shops” and a personal promise from Diane Bryant, Intel’s Data Center Group chief.

Silicon Photonics is a promising technology which aims to develop standard optical components such as light sources, detectors, mirrors, waveguides, etc., using standard silicon technology. A major potential benefit is cost saving, since the elements would be produced in a standard process. Secondly, Silicon Photonics can significantly increase integration, improving communication capabilities of other on-chip components. Think direct, on-chip optical integration.

The SiPh building blocks

Initial deployments are targeted at datacenters. One example application is Intel’s Rack Scale Architecture (RSA). It provides a vision for a highly integrated datacenter rack, where storage, compute and memory are all pooled together.

Rack Scale concept

Other manufacturers are also actively working on the technology.


Intel’s IDF press note

HP acquires SGI

VB is reporting that HP Enterprise acquired SGI for $275 million. This is an important milestone for the embattled manufacturer, which used to sell high-quality workstations like hot cakes.

In its tumultuous history, SGI has gone through a bankruptcy and two rebrandings. SGI’s legacy includes the Indy, Onyx and O2 workstations, as well as a host of other famous devices, prized for their compute and graphics proficiency. NB: This author’s first UNIX, for instance, was IRIX.

VB also reports: “SGI stock was up more than 27 percent in after-hours trading, while HPE stock was down slightly.” — are we surprised?



ARM sold to Japanese Softbank

Numerous media outlets are reporting the sale of chip designer ARM for £24bn to Japan’s Softbank. ARM being a very successful company in both economic and technical terms, this is huge news for the industry.

ARM stock quote

The company’s NASDAQ price went up by 40% in a step function

Sales and profits both look healthy for the Cambridge chip maker, and a deal with Softbank, a company also involved with Alibaba, will allow ARM to take an even stronger position in the IoT world. Some analysts claim ARM will now have to allay fears of market saturation, while others ask whether it was sold too cheaply. Interestingly enough, CEO Simon Segars claims that as much as 60% of ARM chips go into non-smartphone markets.

ARM itself does not make chips and makes money off of royalties paid by other tech giants, including Apple and Samsung.


MIT produces Swarm chip for “easy parallel programming”

MIT has a long-standing history of producing computing innovation. Their latest research is focused on what they call a “Swarm” chip, which is aimed at simplifying parallel coding by removing explicit synchronization and minimizing interference between processors. This is in contrast to classic Intel-like chips with coherent caches (which, by the way, will have to change, but that is a subject for another discussion).

Applications tested on the processor ran 3-18x faster, while needing on the order of 1/10th of the code usually required. MIT says that the processor time-stamps tasks internally and then automatically determines which should be worked on first by the whole chip.
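The idea of time-stamped tasks can be sketched in software. The example below is only a sequential model of the ordering semantics, assuming tasks are processed strictly by timestamp and may spawn later tasks; it says nothing about how the hardware runs them in parallel. All names are hypothetical, and shortest-path search is used simply because it is a classic ordered-task problem.

```python
import heapq


def run_ordered_tasks(graph, source):
    """Process (timestamp, node) tasks earliest-first; tasks may spawn children."""
    dist = {}
    tasks = [(0, source)]  # priority queue of (timestamp, node)
    while tasks:
        ts, node = heapq.heappop(tasks)  # earliest timestamp first
        if node in dist:
            continue  # an earlier task already settled this node
        dist[node] = ts
        for neighbour, weight in graph.get(node, []):
            if neighbour not in dist:
                # Child task, time-stamped with the path length so far
                heapq.heappush(tasks, (ts + weight, neighbour))
    return dist


# Hypothetical weighted graph: node -> [(neighbour, edge weight), ...]
graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
distances = run_ordered_tasks(graph, "A")
print(distances)  # prints "{'A': 0, 'C': 1, 'B': 3, 'D': 4}"
```

Note that the programmer only states the order through timestamps and never writes a lock or barrier; in this software model the priority queue enforces the order, which is the part Swarm is said to handle in hardware.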

Task-based processing systems are gaining more and more popularity as programmers are reaching the limits of data parallelism for many common problems. This is evident in the rising popularity of programming environments such as Intel’s Threading Building Blocks, Cilk+ and the task additions to the OpenMP standard. Swarm does some of this in hardware, promising new avenues for acceleration with minimal overhead in code.

Involved in the design was Joel Emer, who was also one of the key people behind the Triggered Instructions concept, in the domain of spatial programming.