Institute for Advanced Simulation (IAS)

Debugging Parallel Applications

If an application aborts unexpectedly, it helps to monitor its execution in more detail: which branches of the code are actually executed, what values the variables hold, which parts of the memory are used, and so on.

The simplest way to debug is to add print statements to the code in order to get the desired information. However, this is tedious: each time a print or write statement is added, the source has to be recompiled and the application rerun. Furthermore, because the code is modified, the runtime conditions change, which may influence the behavior of the application. This way of debugging is therefore not recommended.

Instead, start with the compiler, which can check for certain errors while compiling the code. This requires special compiler flags, which are described in more detail in the next section. It is recommended to try this first when debugging is necessary, because it is easy to use and does not require any additional software.

However, not all errors can be detected this way, since some occur only at run time. In this case a debugger is needed. Debuggers are powerful tools for analysing the execution of an application on the fly, i.e. while it is running. In general, the application needs to be recompiled once with appropriate compiler flags and is then executed under the control of the debugger.

Compiler flags

Debugging options of the compilers

Useful debugging options for the XL compilers are listed and explained below. Simply add them to the compile command you usually use for your application. The information is taken from the man pages of the XL compilers; for further information about compiler flags, type man bgxlf or man bgxlc.

-O0: With this option all optimizations performed by the compiler are switched off. Errors can sometimes occur due to overly aggressive compiler optimizations (rounding of floating-point numbers, rearrangement of loops and/or operations, etc.). If you encounter problems that might be connected to such issues (for example, wrong or inaccurate numeric results), try this option and check whether the problem persists. If it does not, increase the optimization level moderately.
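The procedure described above (start at -O0, then raise the optimization level step by step) can be sketched as a small shell helper. This is only an illustration under stated assumptions: first_failing_level and fake_check are hypothetical names, and the check command is a stand-in that you would replace with your own build-and-verify step.

```shell
#!/bin/sh
# first_failing_level CHECK LEVEL...
# Runs CHECK once per optimization level, in the order given, and prints the
# first level for which CHECK fails, or "none" if every level passes.
# CHECK is a caller-supplied command (hypothetical here) that is expected to
# build and run the application with the given flag and to return non-zero
# when the results are wrong.
first_failing_level() {
    check=$1
    shift
    for level in "$@"; do
        if ! "$check" "$level"; then
            echo "$level"
            return 0
        fi
    done
    echo "none"
}

# Stand-in check for illustration only: pretends the results break at -O3.
fake_check() {
    [ "$1" != "-O3" ]
}

first_failing_level fake_check -O0 -O2 -O3    # prints: -O3
```

In real use, the check would compile the application with the flag passed as its first argument, run it, and compare the output against a trusted reference.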

For Fortran this option is identical to the -C option (see list of flags for Fortran codes below). For C/C++ codes this option enables different runtime checks, depending on the suboptions_list (colon-separated list, see below) specified, and raises a runtime exception (SIGTRAP signal) if a violation is encountered.

  • all: Enables all suboptions.
  • bounds: Performs runtime checking of addresses when subscripting within an object of known size.
  • divzero: Performs runtime checking of integer division. A trap will occur if an attempt is made to divide by zero.
  • nullptr: Performs runtime checking of addresses contained in pointer variables used to reference storage.

Generates instructions to detect and trap runtime floating-point exceptions.

<suboptions_list> is a colon-separated list of one or more of the following suboptions:

  • enable: Enables trapping of the specified exceptions.
  • imprecise: Only checks for the specified exceptions on subprogram entry and exit.
  • inexact: Detects floating-point inexact exceptions.
  • invalid: Detects floating-point invalid operation exceptions.
  • nanq: Generates code to detect and trap NaNQ (Quiet Not-a-Number) exceptions handled or generated by floating-point operations.
  • overflow: Detects floating-point overflow.
  • underflow: Detects floating-point underflow.
  • zerodivide: Detects floating-point division by zero.

Stops the compiler after the first phase if the severity level of errors detected equals or exceeds the specified level <sev>. The severity levels in increasing order of severity are:

  • i: informational messages
  • l: language-level messages (Fortran only)
  • w: warning messages
  • e: error messages
  • s: severe error messages
  • u: unrecoverable error messages (Fortran only)

-qinitauto[=<hex_value>]: Initializes each byte or word of storage for automatic variables to the specified hexadecimal value <hex_value>. This generates extra code and should only be used for error determination. If you specify -qinitauto without a <hex_value>, the compiler initializes each byte of automatic storage to zero.

The following flags can be used only with Fortran codes:

-C: Checks each reference to an array element, array section, or character substring for correctness. This way some array-bound violations can be detected.


Makes the initial association status of pointers disassociated instead of undefined. This option applies to Fortran 90 and above. The default association status of pointers is undefined.

-qsigtrap[=<trap_handler>]: Sets up the specified trap handler to catch SIGTRAP exceptions when compiling a file that contains a main program. This option enables you to install a handler for SIGTRAP signals without calling the SIGNAL subprogram in the program.

The following flags apply only to C/C++ codes:


Warns of possible problems with string input and output format specifications. Functions diagnosed are printf, scanf, strftime, strfmon family functions and functions marked with format attributes.
<options_list> is a comma-separated list of one or more of the following suboptions:

  • all: Turns on all format diagnostic messages.
  • exarg: Warns if excess arguments appear in printf and scanf style function calls.
  • nlt: Warns if a format string is not a string literal, unless the format function takes its format arguments as a va_list.
  • sec: Warns of possible security problems in use of format functions.
  • y2k: Warns of strftime formats that produce a 2-digit year.
  • zln: Warns of zero-length formats.

Produces or suppresses additional informational messages. <groups_list> is a colon-separated list. If a <groups_list> is specified along with a <suboption>, a colon must separate them.

The suboptions are:

  • all: Enables all diagnostic messages for all groups.
  • private: Lists shared variables that are made private to a parallel loop.
  • reduction: Lists variables that are recognized as reduction variables inside a parallel loop.

The list of groups that can be specified is extensive. Here only a few are given.

For a complete list please refer to the manual page of the bgxlc compiler.

  • c99: C code that might behave differently between C89 and C99 language levels
  • cls: C++ classes
  • cmp: Possible redundancies in unsigned comparisons
  • cnd: Possible redundancies or problems in conditional expressions
  • gen: General diagnostic messages
  • ord: Unspecified order of evaluation
  • ppt: Trace of preprocessor actions
  • uni: Uninitialized variables

Compiler flags for using debuggers

In order to run your code under the control of a debugger, you need to recompile your application including the following compiler flags (XL compilers):

-g -qfullpath

Additionally, the flag


may be useful. When specified, it ensures that function parameters are stored on the stack even if the application is optimized. As a result, parameters remain in the expected memory location, providing access to the values of these incoming parameters to debuggers.
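As a minimal sketch, the two flags -g -qfullpath can be added to an ordinary compile line as follows. The file names app.c and app.debug and the XL_COMPILER variable are hypothetical. The default bgxlc is taken from the man page reference earlier on this page, and your usual compile command may use a different compiler driver. The function only prints the command so that it can be checked before use.

```shell
#!/bin/sh
# debug_compile_cmd SOURCE OUTPUT
# Prints (does not execute) a compile command with the debugger flags added.
# XL_COMPILER is a hypothetical variable for overriding the compiler driver.
debug_compile_cmd() {
    src=$1
    out=$2
    compiler=${XL_COMPILER:-bgxlc}
    # -g adds debugging symbols; -qfullpath records the full path names of
    # the source files so the debugger can locate them.
    echo "$compiler -g -qfullpath -o $out $src"
}

# app.c and app.debug are placeholder file names.
debug_compile_cmd app.c app.debug
```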

Available debuggers

Once you have compiled your application with the correct compiler flags you can run your application under the control of a debugger and monitor the behavior on the fly in detail.

Debugging tool STAT

STAT (Stack Trace Analysis Tool) is a tool developed by Lawrence Livermore National Laboratory to quickly show groups of processes in a parallel application that exhibit similar behavior. This tool scales to millions of processes.

This is very useful, for example, to quickly identify which processes hang or exhibit deadlock behavior, and where. Once the offending processes have been identified, only these need to be debugged with a full-featured debugger such as Totalview.

STAT works by gathering, merging and showing the stack traces of all processes in a color-coded tree. The nodes of this tree are function names for the given number of processes. The connections between the nodes are the groups of processes that followed that call-path.

STAT example

STAT's full documentation can be found at

Quick how-to for running STAT on JUQUEEN

  • Your program must be compiled with debugging symbols (-g)
  • Load the STAT module with the command module load UNITE stat
  • Run the stat-gui command. It will open a window asking to attach to a process. As your program is not running yet, nothing will appear
  • Submit your application normally with llsubmit. It is a good idea to have it send you an email when execution begins (by setting the line # @ notification = always in your submission script). Hint: you can quickly check the status of your job with the command llq -u [USERNAME]
  • When your application is running, go to the GUI and press the Refresh Process list button. It should show the pid of the runjob process belonging to your submission. It is important that the STAT GUI is run from the same login node the application was submitted from (that is, either juqueen1 or juqueen2). This can be verified with the aforementioned llq -u [USERNAME] command
  • Attach to the runjob process. STAT will briefly connect to all processes and show the stack trace of the execution at that moment.

STAT showing rank 0 blocking a barrier on sleep()

In the example, one of the processes is holding up the execution of the whole program: rank 0 is in a simple sleep() call, while the other ranks are waiting for it in a barrier.

When STAT captures a snapshot of a program's execution, it also pauses the program. Clicking the resume button lets execution continue. The application can then be sampled again to get another snapshot of the execution.


Totalview is a very powerful debugger that supports C, C++, Fortran 77, Fortran 90, PGI HPF and assembler programs and offers, among others, the following features:

  • Multi-process and multi-threaded
  • C++ support (templates, inheritance, inline functions)
  • F90 support (user types, pointers, modules)
  • 1D + 2D Array Data visualization
  • Support for parallel debugging (MPI: automatic attach, message queues, OpenMP, pthreads)
  • Scripting and batch debugging
  • Memory Debugging
  • Reverse Debugging with ReplayEngine

Using Totalview interactively

Important: In order to be able to use the graphical user interface, please make sure you are logged in with ssh -X. If you are not directly connected to JUQUEEN, make sure you use the -X option for all ssh connections and that your local system (laptop, PC) has a running X server!

In order to debug your program with Totalview, load the UNITE and Totalview modules first:
module load UNITE totalview
The most common way to use Totalview (like any other debugger) is interactively, with a graphical user interface. To do so, start the Totalview launch script lltv.

For example:

lltv -n <nodes> : -default_parallel_attach_subset=<rank-range> runjob -a --exe <program> -p <num>

This will start the program <program> on <nodes> nodes with <num> processes per node, attaching Totalview to the ranks in <rank-range>. The subset specification <rank-range> can take one of these forms:

  • rank: that rank only
  • rank1-rank2: all ranks between rank1 and rank2 inclusive.
  • rank1-rank2:stride: every stride-th rank between rank1 and rank2

A rank specification can be either a number or "max" (the last rank in the MPI job). Totalview will launch three windows: the root window, the startup-parameter window and the process window.
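To make the three subset forms concrete, the following sketch expands a <rank-range> specification into the list of ranks it would select. It is an illustration only, not part of Totalview or lltv: expand_ranks is a hypothetical helper, the last rank has to be passed in by hand so that "max" can be resolved, and the stride is assumed to be counted from rank1.

```shell
#!/bin/sh
# expand_ranks SPEC LAST_RANK
# Prints the ranks selected by SPEC, where SPEC is one of
#   rank, rank1-rank2, rank1-rank2:stride
# LAST_RANK is the value that "max" resolves to (supplied by the caller).
# Assumption: the stride is counted starting at rank1.
expand_ranks() {
    spec=$1
    last=$2
    stride=1
    # Split off an optional ":stride" suffix.
    case $spec in
        *:*) stride=${spec##*:}; spec=${spec%%:*} ;;
    esac
    # Split "rank1-rank2"; a single rank is a range of length one.
    case $spec in
        *-*) first=${spec%%-*}; final=${spec##*-} ;;
        *)   first=$spec; final=$spec ;;
    esac
    if [ "$first" = max ]; then first=$last; fi
    if [ "$final" = max ]; then final=$last; fi
    # Reject a non-positive stride, which would never terminate.
    if [ "$stride" -lt 1 ]; then return 1; fi
    out=
    r=$first
    while [ "$r" -le "$final" ]; do
        out="$out${out:+ }$r"
        r=$((r + stride))
    done
    echo "$out"
}

expand_ranks 0-8:4 15     # prints: 0 4 8
expand_ranks 12-max 15    # prints: 12 13 14 15
expand_ranks max 15       # prints: 15
```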

Totalview root window

Totalview startup parameters window

The startup-parameter window has four tabs: Debugging Options, Arguments, Standard I/O and Parallel. If you wish to activate memory debugging, check the corresponding box in the Debugging Options tab. If you would like to change or add arguments that are passed to your application or to runjob, you can do so under Arguments. Please do not change anything in Parallel. Once you have made all the changes needed, click on OK.

In order to support Memory Debugging in Totalview using the MEMORYSCAPE tool, your code needs to be linked with Memoryscape’s Heap Agent. To do so for statically compiled codes, the following parameters must be added to the compiler command line:

-L/usr/local/UNITE/packages/totalview/8.15.4-6/linux-power/lib/ -Wl,@/usr/local/UNITE/packages/totalview/8.15.4-6/linux-power/lib/tvheap_bgqs.ld

Here, 8.15.4-6 is Totalview’s current version as of February 23rd, 2016.
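Because the version number appears twice in these parameters, it can be convenient to generate them from a single value. The following is a minimal sketch, assuming that other Totalview versions are installed under the same directory layout (this is not confirmed here). tvheap_link_flags is a hypothetical helper name.

```shell
#!/bin/sh
# tvheap_link_flags VERSION
# Prints the Heap Agent link parameters quoted above with VERSION substituted
# into the installation path.
# Assumption: every version uses the same directory layout as 8.15.4-6.
tvheap_link_flags() {
    version=$1
    libdir=/usr/local/UNITE/packages/totalview/$version/linux-power/lib
    echo "-L$libdir/ -Wl,@$libdir/tvheap_bgqs.ld"
}

tvheap_link_flags 8.15.4-6
```

Check that the directory actually exists for the version you pass in before adding the output to your link command.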

Totalview process window

Click on GO in the process window of Totalview. Totalview will proceed to execute the runjob command and launch your application. This may take several minutes, depending on the size of the partition you have requested (i.e. the number of tasks you would like to run).
A dialog window appears after clicking on GO.

Totalview dialog

Click on YES. After a few seconds the source code of the main program of your application appears in the process window, and you can start debugging your code.

For a user's guide and a detailed description of the usage of Totalview, please refer to the Totalview Documentation (Rogue Wave Software).