A Strategy for Developing a Performance Portable Highly Scalable Application
Stefan Andersson, Cray Inc.
Over the past several years, Cray has gained extensive experience scaling real-world applications to hundreds of thousands of cores, for example in the Titan project at ORNL and the XXL project at HLRS, where not only academic but also industry partners participated. Based on this experience, we look at what is needed to scale to the next generations of HPC computers with millions of cores, where increased performance comes not only from more nodes but, even more importantly, from more powerful nodes. New strategies must be developed to improve MPI performance when the performance of a single node has increased by a factor of ten while the injection rate has stayed roughly constant. Can MPI parallelism be shifted to threading on the node? Vectorization is becoming more important: does the application vectorize as well as it should? IO is also playing an increasing role. A single highly scaling job today produces data on the order of hundreds of TBytes, and this will very likely grow to PBytes in the near future. This not only creates a need for very high performance IO APIs, but also requires thinking about the whole workflow of IO data. These are all questions that need to be addressed for the next generation of systems. In this talk we will show Cray's vision of what a system has to offer application developers to address these problems.