Blue Gene/P V1R2M0 Memo to Users


Document Description
The following is a list of the major new features found in Version 1.0 Release 2.0 (V1R2) of IBM® Blue Gene®/P™.

Kernel Checkpoint/Restart Compression

Compression support for checkpoint/restart was added in V1R2. Compression is performed in memory, which has two benefits: it decreases the amount of IO performed when checkpointing, and it reduces the overall size of the checkpoint files. Two environment variables control the feature: specify BG_CHKPTCOMPRESS=YES on the checkpoint and BG_CHKPTDECOMPRESS=YES on the restart.

Debug Memory Watchpoint Support

Limited watchpoint support was added to the gdbserver debug implementation. It allows the user to issue the watch and rwatch gdb commands from the gdb client command line. The watch MY_VARIABLE command causes the program to stop with a SIGINT reported when a write is done to the location MY_VARIABLE. The rwatch MY_VARIABLE command causes an interrupt when a read is done. The implementation has some limitations. Once a watchpoint is hit, a continue, step, or next does not allow the code to progress past the watchpoint. To continue, you must first delete or disable the watchpoint and then continue. If you want the watchpoint to be active again for the rest of the run, delete or disable it, perform a step, re-enable or re-create it, and then continue.

Example watchpoint set/hit:

(gdb) set remotetimeout 60
(gdb) target remote 172.16.103.14:10015
Remote debugging using 172.16.103.14:10015
0x01001124 in _start ()
(gdb) watch touchthis
Hardware watchpoint 1: touchthis
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x01001424 in main () at threadcreate.c:36
36 touchthis = 0;
(gdb)

Debugging of Dynamically Linked Applications

Base debugger support is available for dynamically linked applications. The gdb client understands the base support provided by the dynamic linker, but dynamically linked applications are handled somewhat differently. At application start, the dynamic linker loads all of the dynamic libraries that it knows the application needs, using mmap to load each one. Each dynamic library is loaded from the IO node. Typically, the shared libraries found on the IO node after reboot are either stored in the ramdisk from the Blue Gene/P toolchain or reached through a symlink that is created at boot time on the IO node. Symbols should be loaded from these libraries rather than from copies on the Front End Node or Service Node. For example, a /lib/libc.so.6 shown by info shared on the gdb client really refers to the libc.so.6 that was provided by the Blue Gene toolchain, and it should be resolved to the appropriate location to get the correct symbols. The load addresses of the libraries can be determined with strace facilities or by setting the environment variable LD_DEBUG=file so that the dynamic linker displays the values.

Totalview Debugger Enablement

Totalview Technologies now provides Blue Gene/P support in the Totalview debugger. Refer to the Totalview Technologies site for more information about its parallel debugger.

Shared Memory Segment Increase

The shared memory size was increased to allow applications to utilize 8 distinct /dev/shm/<file> handles for shared memory allocation. The previous release of Blue Gene allowed only 2 distinct file names for shared memory handles.

Note: The V1R2M0 MPI communications stack uses three of these shared memory file handles.

Process Memory Windows

The Process Memory Windows facility was added to allow processes within a physical node to access the memory of other processes within that same physical node. The following new kernel SPI functions were added to support this facility:
Kernel_GetProcessWindowSlotRange() - Return the TLB slots that are available for use by the Process Memory Window facility.
Kernel_SetProcessWindow() - Create a Process Memory Window using the specified TLB slot.
Kernel_GetProcessWindow() - Return information regarding a Process Memory Window for the specified TLB slot.

A new environment variable was introduced to support this facility: BG_PROCESSWINDOWS. The value provided for this environment variable will set the number of slots available for creating process memory windows.

Persistent Memory

Persistent Memory is process memory that retains its contents from job to job. To allocate persistent memory, the environment variable BG_PERSISTMEMSIZE=X must be specified, where X is the amount to allocate in units of 1024*1024 bytes (megabytes). For the persistent memory to be maintained across jobs, all job submissions must specify the same value for BG_PERSISTMEMSIZE. The contents of persistent memory can be re-initialized during job startup either by changing the value of BG_PERSISTMEMSIZE or by specifying the environment variable BG_PERSISTMEMRESET=1. The following new kernel SPI function was added to support persistent memory:

persist_open()
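
The sizing and retention rules for persistent memory can be modeled in a few lines. This is an illustrative sketch only, not part of the Blue Gene/P software: the helper names persist_bytes and persists_across are hypothetical, and treating an unset BG_PERSISTMEMSIZE as zero bytes is an assumption.

```python
MIB = 1024 * 1024  # BG_PERSISTMEMSIZE is given in units of 1024*1024 bytes


def persist_bytes(env):
    """Bytes requested through BG_PERSISTMEMSIZE (assumed 0 when unset)."""
    return int(env.get("BG_PERSISTMEMSIZE", "0")) * MIB


def persists_across(prev_env, next_env):
    """Model whether persistent memory contents survive into the next job.

    Contents are kept only if both jobs request the same non-zero size
    and the next job does not set BG_PERSISTMEMRESET=1.
    """
    size_prev = persist_bytes(prev_env)
    size_next = persist_bytes(next_env)
    same_size = size_prev != 0 and size_prev == size_next
    reset_requested = next_env.get("BG_PERSISTMEMRESET") == "1"
    return same_size and not reset_requested


job1 = {"BG_PERSISTMEMSIZE": "16"}
job2 = {"BG_PERSISTMEMSIZE": "16"}
job3 = {"BG_PERSISTMEMSIZE": "16", "BG_PERSISTMEMRESET": "1"}

print(persist_bytes(job1))          # 16777216
print(persists_across(job1, job2))  # True
print(persists_across(job2, job3))  # False
```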

Added Stack Guard Protection When Debugging

During normal operation, a stack guard is active to protect against a program growing its stack too large or accessing memory outside its allocation. The stack guard facility uses some of the same hardware resources used by the debugger. Before this release, the stack guard facility was disabled whenever the debugger was attached. Because actual hardware conflicts between the debugger and the stack guard facility are rare, stack guarding now stays enabled while the debugger is attached, as long as no direct hardware conflict exists. Currently, the only conflict between the stack guard facility and the debugger occurs when a hardware watchpoint is set. Previously, the environment variable BG_STACKGUARDENABLE had two supported values: 0 for disabled and 1 for enabled. The new values and their definitions are:
BG_STACKGUARDENABLE=0 - Stack guarding is disabled.
BG_STACKGUARDENABLE=1 - Stack guarding is enabled. If the debugger attempts to use a conflicting hardware resource, stack guarding will be disabled and the debugger action will succeed. (DEFAULT)
BG_STACKGUARDENABLE=2 - Stack guarding is enabled. If the debugger attempts to use a conflicting hardware resource, the debugger request will be denied.
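
The three settings amount to a small decision table for what happens when the debugger requests a conflicting hardware resource, such as a hardware watchpoint. The sketch below models that described behavior and is not kernel code: the function name on_debugger_conflict and the tuple layout are hypothetical, and the outcome for value 0 (the request succeeds because nothing is guarding) is an inference rather than something the text states.

```python
DEFAULT_MODE = 1  # BG_STACKGUARDENABLE=1 is the documented default


def on_debugger_conflict(mode=DEFAULT_MODE):
    """Return (stack_guard_active, debugger_request_succeeds).

    Describes the state after the debugger asks for a hardware resource
    that conflicts with the stack guard.
    """
    if mode == 0:
        # Guarding is already off, so nothing conflicts (assumed outcome).
        return (False, True)
    if mode == 1:
        # Guarding gives way: it is disabled and the debugger proceeds.
        return (False, True)
    if mode == 2:
        # Guarding wins: the debugger request is denied.
        return (True, False)
    raise ValueError(f"unsupported BG_STACKGUARDENABLE value: {mode}")


for mode in (0, 1, 2):
    print(mode, on_debugger_conflict(mode))
```

Under this model, value 2 is the choice when stack protection matters more than the ability to set a hardware watchpoint during the debug session.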

CIOD <-> CNK Efficiency Improvements

The internal protocol used to exchange messages between CNK and CIOD has been updated to improve performance by eliminating the use of a software header on every packet.

File Descriptor Limit Increased

The maximum number of open descriptors was increased from 256 to 2112. If there are many compute node processes opening a large number of descriptors, the system limit on the I/O nodes might need to be increased. The default system limit on the I/O node is 393,216 and can be increased by setting fs.file-max with the sysctl command.
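
As a rough sizing aid, the worst-case descriptor demand can be compared against fs.file-max. This sketch assumes that each descriptor opened by a compute node process consumes one descriptor on the I/O node that serves it, which the text above implies but does not state. The function names are hypothetical.

```python
MAX_FDS_PER_PROCESS = 2112  # per-process limit in V1R2 (previously 256)
DEFAULT_FILE_MAX = 393216   # default fs.file-max on the I/O node


def worst_case_fds(processes, fds_per_process=MAX_FDS_PER_PROCESS):
    """Descriptors needed if every process opens fds_per_process files."""
    return processes * fds_per_process


def needs_higher_file_max(processes,
                          fds_per_process=MAX_FDS_PER_PROCESS,
                          file_max=DEFAULT_FILE_MAX):
    """True if the worst-case demand exceeds the I/O node system limit."""
    return worst_case_fds(processes, fds_per_process) > file_max


# Number of processes that can each reach the per-process limit
# before the default system limit is exceeded.
print(DEFAULT_FILE_MAX // MAX_FDS_PER_PROCESS)  # 186
print(needs_higher_file_max(186))               # False
print(needs_higher_file_max(187))               # True
```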

New CNS Services

o Introduced non-blocking forms of various CNS services. These can be used to eliminate long and/or indeterminate latency service calls in CNS by kernels that are intolerant of such situations (for example, Linux®).
o Introduced the mapDevice service to CNS, which can be used to relax assumptions about the virtual addresses of various Blue Gene devices. Kernels can now choose their own virtual addresses for devices, provided that they communicate this to CNS and manage the TLBs.
o Introduced new CNS services for managing global barriers. The new services give individual nodes the means to opt in or out of certain global barrier channels. There is also a Linux-friendly non-blocking barrier.

Improved RAS Messages

Many of the DMA unit RAS messages have been simplified and clarified.

Ethernet Flow Control Enabled (Included in V1R1M2 efix014)

The Ethernet hardware for the IO node has been enabled to process incoming pause frames in accordance with IEEE 802.3-2000, Annex 31. A network interface that is being overrun by a Blue Gene/P IO node can send pause frames to force the IO node to suspend transmission of additional frames for the specified interval.

Upgrade of Linux on the ION to 2.6.16.46

The IO nodes now run version 2.6.16.46 of the Linux kernel. The IO node runtime environment contains various security patches and other fixes.

New Control for Core File Generation

A new environment variable was added that, when set, causes a core file to be written if the exit status of a terminating process is non-zero. For applications that might call exit(1) on only some nodes, this makes it possible to identify the node that issued the non-zero exit status and helps point to the root cause of the non-zero exit.

To specify the environment variable, add the following to the environment variables passed to mpirun:

BG_COREDUMPONERROR=1

The environment variable applies only to the job for which it is specified. Subsequent jobs are not affected.
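
One way to pass the variable is sketched below. It assumes that mpirun accepts the options -env "NAME=VALUE" and -exe <path>, and the helper name mpirun_with_env and the executable path are hypothetical. A real invocation would also need whatever partition and process-count options the site requires.

```python
import shlex


def mpirun_with_env(exe, name, value):
    """Return an mpirun argument list that passes NAME=VALUE to the job.

    Assumes mpirun supports -env "NAME=VALUE" and -exe <path>.
    """
    return ["mpirun", "-env", f"{name}={value}", "-exe", exe]


# Hypothetical executable path, for illustration only.
argv = mpirun_with_env("/path/to/app", "BG_COREDUMPONERROR", "1")
print(shlex.join(argv))
```

Because the setting has job scope, it has to be supplied again on every submission for which core files on a non-zero exit are wanted.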

CNK Resilience to Application Errors

In V1R1, when the kernel took an interrupt, it did not switch to a private kernel-only stack. Instead, it continued to use the storage of the application stack. As long as the application followed the PowerPC ABI conventions, this was not a problem. But if the application deviated from the ABI, or if there was stack corruption, stack misalignment, or some other application error, this could lead to fatal kernel RAS events and block deallocation.

In V1R2, when the kernel takes an interrupt, it immediately switches to a private kernel-only stack. This should protect the kernel from any faulty modifications to the stack pointer (GPR1) made by the applications.

Application Memory Layout Improvements

Improvements have been made to the static memory layout mapper, which allows for more efficient utilization of available memory. The virtual addresses for devices have been moved near the kernel addresses, which will make it easier to avoid virtual address collisions.

Among the improvements was support for process windows and larger shared memory regions.

CNK and CIOD Reference Source Code Available under CPL License

The source code for CNK and CIOD has been released as reference implementations. The source code is currently located at the following Web site:

wiki.bg.anl-external.org/index.php/Main_Page



    IBM disclaims all warranties, whether express or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. By furnishing this document, IBM grants no licenses to any related patents or copyrights. Copyright © 2005, 2006, 2007, 2008 IBM Corporation. Any trademarks and product or brand names referenced in this document are the property of their respective owners. Consult the Terms of use link for trademark information.



last change 03.06.2008 | Jutta Docter