Batch Job Processing on JUGENE
IBM Tivoli Workload Scheduler LoadLeveler is used as the batch system on Blue Gene/P.
Jobs are submitted to LoadLeveler using a job command file. The job command file is a shell script containing keywords embedded in comments beginning with #@. These keywords tell LoadLeveler which resources the job requires, which program to execute, where to write output files, and how to set up the job environment.
Sample job scripts can also be found on the system in the directory:
Jobs are submitted with
llsubmit <jobfile name>
Blue Gene specific keywords:
You have to mark your job as a Blue Gene job with #@ job_type = bluegene. Otherwise the job is executed as a serial job on the login node without allocating a Blue Gene partition.
The size of a job has to be specified by using #@ bg_size or #@ bg_shape.
- The bg_size keyword specifies the number of compute nodes the job should use. Blue Gene/P only allows partitions of 32, 64, 128 or 256 compute nodes, or multiples of 512 compute nodes, so other values are rounded up to the next allowed size. For example, a bg_size of 1 requests a partition of 32 compute nodes and a bg_size of 129 requests a partition of 256 compute nodes.
- The bg_shape keyword specifies the shape of the partition at the base partition (midplane) level, not at the compute node level. A bg_shape value of 1x2x1 means 1 base partition in the x direction, 2 in the y direction and 1 in the z direction, which is two midplanes = 1024 compute nodes. bg_shape defines the logical dimensions of your partition. For efficient scheduling, LoadLeveler may physically allocate any one of the three permutations (1x2x1, 2x1x1, 1x1x2) and ensures the correct mapping of the MPI-tasks.
If, and only if, you are using your own mapfile (-mapfile option in the mpirun command) or your application relies on a particular physical shape of the partition, you have to use the #@ bg_rotate = FALSE keyword together with bg_shape. This tells LoadLeveler that only the requested shape satisfies the job requirement.
- The topology of the partition can be specified with the bg_connection keyword, which takes one of three values: MESH (default), TORUS and PREFER_TORUS. This choice can have a big influence on the performance of your application. In case of doubt, always add #@ bg_connection = TORUS to your job script.
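The size rules above can be checked with a small helper. The following is a minimal sketch, not part of LoadLeveler: the function names partition_size and shape_nodes are hypothetical, and the rounding for requests above 256 nodes (up to the next multiple of 512) is an assumption derived from the list of allowed partition sizes.

```shell
#!/bin/sh
# Sketch: partition size that would result from a given bg_size, following
# the allowed sizes stated above (32, 64, 128, 256, then multiples of 512).
partition_size() {
    req=$1
    for size in 32 64 128 256; do
        if [ "$req" -le "$size" ]; then
            echo "$size"
            return
        fi
    done
    # Assumption: larger requests round up to the next multiple of 512 (one midplane).
    echo $(( (req + 511) / 512 * 512 ))
}

# Sketch: compute nodes covered by a bg_shape given in midplanes, e.g. 1x2x1.
shape_nodes() {
    old_ifs=$IFS
    IFS=x
    set -- $1            # split "1x2x1" into three positional parameters
    IFS=$old_ifs
    echo $(( $1 * $2 * $3 * 512 ))
}

partition_size 1      # prints 32
partition_size 129    # prints 256
shape_nodes 1x2x1     # prints 1024
```

Both functions only reproduce the arithmetic described in the text; the partition actually allocated is decided by LoadLeveler.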
For a detailed description of the Blue Gene specific keywords and a table of the core general keywords, see LoadLeveler Keywords.
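As an illustration of how the Blue Gene specific keywords combine, the fragment below requests two midplanes in a fixed shape with a torus topology. It is a sketch of the keyword section only, using bg_shape in place of bg_size; the chosen values are examples, and bg_rotate = FALSE is only needed in the mapfile or fixed-shape cases described above.

```shell
# Fragment of a job command file (keyword section only, values are examples).
# Requests 1x2x1 midplanes = 1024 compute nodes, in exactly this orientation.
#@ job_type = bluegene
#@ bg_shape = 1x2x1
#@ bg_rotate = FALSE
#@ bg_connection = TORUS
```

Without the bg_rotate line, LoadLeveler would remain free to choose any of the three permutations of the shape.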
On a Blue Gene/P system, applications are always launched with the mpirun command. Preparing a job for submission therefore means creating a job command file that calls mpirun with the appropriate arguments after the #@ queue keyword.
Since LoadLeveler automatically selects the appropriate partition to run the job on, the -partition option should not be specified in the mpirun command.
The number of MPI-tasks can still be controlled with the -np option. The execution mode (Virtual Node mode, DUAL mode or SMP mode) is specified with -mode VN, -mode DUAL or -mode SMP in the mpirun argument list.
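Putting the pieces together, a complete job command file might look like the one generated by the sketch below. The file name sample_bgp_job.cmd, the job name, the wall clock limit and the program path ./my_app are hypothetical placeholders. The task count assumes four MPI tasks per compute node in VN mode, matching the four cores of a Blue Gene/P node.

```shell
#!/bin/sh
# Sketch: write a Blue Gene/P job command file to disk.
# File name, job name, limits and program path are hypothetical placeholders.
jobfile=sample_bgp_job.cmd

# The quoted 'EOF' keeps $(job_name) and $(jobid) literal for LoadLeveler.
cat > "$jobfile" <<'EOF'
#@ job_name = bgp_sample
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).err
#@ wall_clock_limit = 00:30:00
#@ job_type = bluegene
#@ bg_size = 512
#@ bg_connection = TORUS
#@ queue
# The mpirun call follows the queue keyword.
# No partition is named here: LoadLeveler selects it.
# Assumed: 512 compute nodes * 4 tasks per node in VN mode = 2048 MPI tasks.
mpirun -mode VN -np 2048 -exe ./my_app
EOF

echo "wrote $jobfile"
# On the login node the job would then be submitted with:
#   llsubmit sample_bgp_job.cmd
```

Generating the file from a script is only a convenience for this example; the same content can be written by hand and passed to llsubmit directly.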