Performance Analysis and Optimizations of Argobots
Speaker: Sangmin Seo (ANL)
Date: Friday, 4 December 2015, 13:30-15:00
Session: Programming Models III
Talk type: Project talk (30 min)
Abstract: Argobots is a lightweight, low-level threading and tasking model that aims to provide high-level programming model runtimes with efficient threading and tasking mechanisms rather than enforcing predefined policies. It supports two types of work units, user-level threads (ULTs) and tasklets, and exposes an explicit mapping between OS threads and work units. Although Argobots is designed to provide lightweight work units, their basic operations, such as creation, destruction, join, and scheduling, may still incur non-negligible overhead depending on how they are implemented. In this work, we perform an in-depth performance characterization of the baseline Argobots implementation and present performance optimizations, such as cache-friendly data structure organization, feature selection, scheduler bypass for multiple joins, memory pool-based allocation, and the use of huge pages, to reduce the overhead of basic operations. Experimental results using microbenchmarks and applications show that our optimization techniques reduce cache misses and TLB misses in many cases and significantly improve the performance of all basic operations.
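To make one of the listed optimizations concrete, the sketch below shows the general idea behind memory pool-based allocation of work-unit descriptors. It preallocates a contiguous array once and then serves creation and destruction from a free list in O(1) time, instead of calling `malloc`/`free` for every work unit. This is a generic illustration, not Argobots's actual implementation: the type `unit_desc`, its fields, and the functions `pool_init`, `pool_alloc`, `pool_free`, and `pool_destroy` are all hypothetical. The pool is also single-threaded, so a real runtime would need per-execution-stream pools or synchronization. Huge pages and the other optimizations are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work-unit descriptor (illustrative fields only,
 * not Argobots's internal layout). */
typedef struct unit_desc {
    void (*func)(void *);   /* entry point of the work unit */
    void *arg;              /* argument passed to func */
    int state;              /* e.g., ready / running / terminated */
    struct unit_desc *next; /* free-list link while the descriptor is unused */
} unit_desc;

/* Fixed-capacity pool: one up-front malloc, then O(1) alloc/free. */
typedef struct {
    unit_desc *storage;   /* contiguous backing array */
    unit_desc *free_list; /* head of the singly linked list of free slots */
    size_t capacity;
    size_t in_use;
} unit_pool;

/* Returns 0 on success, -1 if the backing array cannot be allocated. */
int pool_init(unit_pool *p, size_t capacity) {
    assert(capacity > 0);
    p->storage = malloc(capacity * sizeof(unit_desc));
    if (p->storage == NULL) return -1;
    p->capacity = capacity;
    p->in_use = 0;
    p->free_list = NULL;
    /* Push slots from last to first so that allocation hands them out
     * in ascending address order. */
    for (size_t i = capacity; i > 0; i--) {
        p->storage[i - 1].next = p->free_list;
        p->free_list = &p->storage[i - 1];
    }
    return 0;
}

/* Pops a descriptor off the free list; returns NULL if the pool is empty. */
unit_desc *pool_alloc(unit_pool *p) {
    unit_desc *d = p->free_list;
    if (d == NULL) return NULL; /* exhausted; a real runtime might grow the pool */
    p->free_list = d->next;
    d->next = NULL;
    p->in_use++;
    return d;
}

/* Pushes a descriptor back onto the free list for reuse. */
void pool_free(unit_pool *p, unit_desc *d) {
    d->next = p->free_list;
    p->free_list = d;
    p->in_use--;
}

/* Releases the backing array; all descriptors become invalid. */
void pool_destroy(unit_pool *p) {
    free(p->storage);
    p->storage = NULL;
    p->free_list = NULL;
    p->capacity = 0;
    p->in_use = 0;
}
```

Because a freed descriptor is pushed to the head of the free list, the next allocation reuses the most recently released slot, which is likely still in cache. This is one reason pool-based allocation can reduce cache misses relative to a general-purpose allocator.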