|
The current goal is to scale up to 100,000 or even 1 million cores, using accelerators, GPUs and the like. The conclusion is that scalability is a software challenge, while the hardware challenges call for new solutions. Scalability has to be addressed in all respects, not just in the number of cores. As for the issue of "many-many-many" cores, the speaker expects auto-vectorisation to play a role. This calls for data-parallel coding that is cache-friendly and lends itself to automatic parallelisation. Dr. Fischer also assumes a single-instruction, multiple-data approach: one instruction operating on a vector register that holds many data elements.
Apart from that, hybrid parallelisation can help. By hybrid, we mean MPI combined with OpenMP.
With accelerators, Linpack can be offloaded quite well. Dr. Fischer wants to work on an enhanced vector architecture with data locality, high memory bandwidth and low latency. The vision for the future vector product promises a 7 to 10 times improvement over the SX-9 in Tflops per unit of floor space, Tflops per kW, and Tflops per euro, dollar or yen. |
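
To illustrate what data-parallel, cache-friendly coding can look like, here is a minimal sketch in C. It is not taken from the talk: the function name `axpy` and the choice of OpenMP's `parallel for simd` directive are assumptions made for illustration. The sketch covers only the OpenMP half of a hybrid scheme; the MPI layer that would distribute the arrays across nodes is left out.

```c
#include <stddef.h>

/*
 * Hypothetical example: y[i] = a * x[i] + y[i].
 *
 * The loop walks both arrays with unit stride and has no dependency
 * between iterations, so a compiler can map it onto vector registers
 * (one instruction, many data elements) and split it across threads.
 * `restrict` tells the compiler that x and y do not overlap.
 * Without OpenMP enabled, the pragma is ignored and the loop runs
 * serially with the same result.
 */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. Whether the loop is actually vectorised depends on the compiler and the target hardware.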