Application Extreme-scaling Experience on Stampede
Carlos Rosales-Fernandez, Texas Advanced Computing Center (TACC)
The Intel Xeon Phi co-processor, also known as MIC (Many Integrated Core), is becoming increasingly popular in HPC. Current HPC clusters such as Tianhe-2, Stampede, and Cascade use this technology, and upcoming systems such as Cori and the Stampede upgrade will be built around the next generation of the MIC architecture, known as Knights Landing (KNL). Running codes in symmetric mode, across both the host CPUs and the Phi co-processors, is very attractive in terms of resource utilization, but several issues can limit the effectiveness of this approach at scale. We describe these limitations, as well as the solutions software developers have provided in the form of “proxy” enhancements to the MPI runtime. We present microbenchmark results that highlight MPI performance characteristics for communication paths involving MIC co-processors, and then analyze their impact on a Lattice Boltzmann-based code run at scale on Stampede.