LinkTest is a communication API benchmarking tool that tests point-to-point connections and is designed to scale to a large number of processes (tested using 1 800 000 MPI tasks on JUQUEEN, Blue Gene/Q). Processes can be on the same node or on different nodes, provided that a physical link of the tested link layer exists. It supports benchmarking the following APIs: MPI, TCP, UCP, IB Verbs, PSM2, and NVLink bridges through CUDA and CUDA-aware MPI. The program outputs a full communication matrix of the message-transmission times for all pairs of processes, written in parallel to a SION file. A log on standard output that summarizes the results is also provided. Additional Python tools read the generated SION file and generate PDF reports.
LinkTest performs serial and parallel tests. In serial mode, all (N-1)*(N/2) pairs of the N processes are tested sequentially. In parallel mode, N/2 pairs can be tested at once, resulting in N-1 steps. However, which pairs are tested together in one step influences the parallel performance. Therefore, LinkTest can be run multiple times with randomized partitions of pairs into steps.
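To illustrate the step counts above, here is a minimal Python sketch that partitions all pairs of N processes into N-1 steps of N/2 disjoint pairs using the classic round-robin ("circle") method. This is not LinkTest's own code: the function name `round_robin_steps` and its `shuffle` and `seed` parameters are hypothetical, and the way LinkTest actually builds or randomizes its partitions may differ. The sketch assumes N is even.

```python
import random


def round_robin_steps(n, shuffle=False, seed=None):
    """Partition all pairs of n ranks (n even) into n-1 steps of n/2 disjoint pairs.

    Illustrative only; not taken from LinkTest.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be an even number >= 2")
    ranks = list(range(n))
    if shuffle:
        # Relabeling the ranks yields a different partition of pairs into steps.
        random.Random(seed).shuffle(ranks)
    steps = []
    for _ in range(n - 1):
        # Pair the i-th rank from the front with the i-th rank from the back.
        step = [tuple(sorted((ranks[i], ranks[n - 1 - i]))) for i in range(n // 2)]
        steps.append(step)
        # Keep the first rank fixed and rotate the remaining ranks by one position.
        ranks = [ranks[0], ranks[-1]] + ranks[1:-1]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(round_robin_steps(4), start=1):
        print(number, step)
    # 1 [(0, 3), (1, 2)]
    # 2 [(0, 2), (1, 3)]
    # 3 [(0, 1), (2, 3)]
```

For N = 4 this gives 3 steps of 2 pairs each, covering all (N-1)*(N/2) = 6 pairs exactly once. Passing `shuffle=True` produces another valid partition, which is the kind of variation the randomized runs are meant to explore.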
Pairs are tested by one of three kernels. By default, one process sends a message to its partner, which sends the same message right back (Semi-directional). Alternatively, both processes can transmit the message at the same time (Bidirectional). Thirdly, one process can send a number of unacknowledged messages and receive only one confirmation message at the very end (Unidirectional). The selected kernel is repeated a user-specified number of times and the average time is measured. To warm up a connection before testing, the same kernel can first be run without timing a user-specified number of times.
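The warm-up and averaging scheme described above can be sketched generically in Python. This is an illustration under stated assumptions, not LinkTest's implementation: `time_kernel` and its parameters are hypothetical names, the `kernel` argument stands in for one of the three communication kernels, and the sketch times the whole loop and divides by the iteration count, which may differ from how LinkTest takes its measurements.

```python
import time


def time_kernel(kernel, iterations, warmup=0):
    """Return the mean wall-clock time in seconds of one call to kernel().

    Illustrative only; not taken from LinkTest.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Untimed warm-up repetitions, e.g. to let a connection get established.
    for _ in range(warmup):
        kernel()
    # Timed repetitions; the average over all of them is returned.
    start = time.perf_counter()
    for _ in range(iterations):
        kernel()
    elapsed = time.perf_counter() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Stand-in workload in place of a real message exchange.
    def fake_kernel():
        sum(range(1000))

    mean_seconds = time_kernel(fake_kernel, iterations=100, warmup=10)
    print(f"mean time per repetition: {mean_seconds:.3e} s")
```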
Below is an example of an inter-node MPI benchmark from our AMD EPYC 7742 Case Study.