Typical duration: 2 days
Course code: PR04
This course is an advanced class covering vectorization challenges and solutions.
We cover the main trends in vectorizing code, from assembly-level programming to automated solutions such as autovectorization. We discuss the practical impact of data-parallel design on code structure, runtime, and memory management. Other topics include source management, platform independence, and single-source vectorized code. Finally, we look at platform-oriented aspects of data parallelism such as bandwidth measurements, memory configuration and topology, allocations, and future directions in memory management.
The course is composed of lectures (50%) and hands-on labs (50%).
After the course, the students will:
- Expertly vectorize existing and new code to obtain speedups
- Master the hardware and software infrastructures for efficient vector computing
- Master technologies for vector computing and deeply understand their benefits
- Understand how data parallelism affects design
- Manage a single vectorized source using a variety of techniques
Runtime and memory systems for vectorization
Vectorization impact on program layout, programming techniques and technologies; hiding away unpleasant code
Vector instruction throughput and efficiency on CPUs and accelerators
Cache topology, non-uniform memory, memory programming and allocation, future directions in memory management