Institute for Advanced Simulation (IAS)

JUQUEEN - Jülich Blue Gene/Q

Production on JUQUEEN started on May 16th!

User IDs from authorized projects that are already active on JUGENE have been activated for JUQUEEN. New users may be added if they have applied.

Please be aware that the system is at an early stage of its installation and may undergo changes or show problems. Front-End node juqueen2 is not yet available, use: juqueen2

For known problems and information about the current situation, please refer to the highmessage shown when you log in, which will be kept up to date.

Documentation

Juqueen - 8 Racks

Production start of JUQUEEN on May 14th is delayed!

Due to problems with the GPFS software on the BG/Q IO-Nodes, which might generate the wrong group for the files that are written (under some circumstances), the start of user production on JUQUEEN will again have to be delayed for a couple of days. We are working with IBM to fix this software issue as soon as possible.

Production start of JUQUEEN on May 7th is delayed!

Due to problems with the technical infrastructure (a water valve for the cooling equipment), the JUQUEEN racks had to be shut down, and the start of user production on JUQUEEN will be delayed for a couple of days. After the defective parts have been replaced, the JSC internal tests with LoadLeveler (available since May 3rd) and the installation of different libraries can take place, followed by the publication of the BG/Q access information on the web pages.
We will work to provide access for the users as soon as possible. Sorry for the inconvenience.

Blue Gene/Q at JSC

JUQUEEN is an IBM Blue Gene/Q system which will be installed and start operation at JSC in 2012. The first four racks will be installed in April 2012 followed by four further racks in June and a larger extension in October 2012.

Each rack contains 32 node boards with 32 compute nodes each. A node consists of a single processor with 16 IBM PowerPC A2 cores (64-bit, 1.6 GHz) for the execution of user applications; one additional core is used for the operating system. Every core can execute four processes/threads (fourfold Simultaneous MultiThreading, SMT) and has a quad floating point unit (FPU) which can execute four double-precision Single Instruction Multiple Data (SIMD) Fused Multiply-Add (FMA) operations or two complex SIMD FMA operations per cycle. The maximal performance of the processor (node) is 204.8 GFlop/s.
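The 204.8 GFlop/s figure follows directly from the numbers quoted above. The short Python sketch below reproduces the arithmetic; the constant and function names are illustrative only, and it assumes the usual convention that one fused multiply-add counts as two floating-point operations.

```python
# Peak double-precision performance of one Blue Gene/Q node,
# derived only from the figures quoted in the text.
CORES_PER_NODE = 16   # cores available to user applications
CLOCK_GHZ = 1.6       # core clock frequency in GHz
SIMD_WIDTH = 4        # double-precision lanes of the quad FPU
FLOPS_PER_FMA = 2     # assumption: one FMA = one multiply + one add


def core_peak_gflops():
    """Peak GFlop/s of a single core: clock x SIMD lanes x flops per FMA."""
    return CLOCK_GHZ * SIMD_WIDTH * FLOPS_PER_FMA


def node_peak_gflops():
    """Peak GFlop/s of a node: all 16 application cores combined."""
    return CORES_PER_NODE * core_peak_gflops()


print(f"per core: {core_peak_gflops():.1f} GFlop/s")
print(f"per node: {node_peak_gflops():.1f} GFlop/s")
```

The additional core reserved for the operating system is deliberately left out, since it does not run user code.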

In order to use this architecture efficiently a hybrid parallelization strategy is necessary in general (e.g. MPI/OpenMP or MPI/Pthreads), especially since the main memory is limited to 16 GB per node (or 256 MB per process/thread with 4 processes/threads per core).
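To illustrate the memory argument, the following Python sketch shows how the memory available to each MPI rank changes with the number of ranks placed on a node, and how many hardware threads each rank can then use. The function names and the chosen rank counts are illustrative assumptions; the sketch ignores memory reserved by the operating system.

```python
# Memory per MPI rank and threads per rank on one node,
# based on 16 GB, 16 cores and 4 hardware threads per core.
NODE_MEMORY_MB = 16 * 1024
CORES_PER_NODE = 16
THREADS_PER_CORE = 4
HW_THREADS_PER_NODE = CORES_PER_NODE * THREADS_PER_CORE  # 64


def memory_per_rank_mb(ranks_per_node):
    """Equal share of node memory per MPI rank (OS overhead ignored)."""
    return NODE_MEMORY_MB // ranks_per_node


def threads_per_rank(ranks_per_node):
    """Hardware threads each rank can fill with OpenMP or Pthreads."""
    return HW_THREADS_PER_NODE // ranks_per_node


# Illustrative configurations: pure MPI (64 ranks) down to one rank per node.
for ranks in (64, 16, 4, 1):
    print(f"{ranks:2d} ranks/node: {memory_per_rank_mb(ranks):6d} MB per rank, "
          f"{threads_per_rank(ranks):2d} threads per rank")
```

With 64 ranks per node each rank is limited to 256 MB, whereas fewer ranks with more threads each share a larger address space while still occupying all hardware threads.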

IBM Blue Gene/Q Characteristics

  • Scales to 512 racks, achieving up to 100 PF at peak performance.
  • Integrated 5D torus provides tremendous bisection bandwidth.
  • Quad floating point unit (FPU) for 4-wide double precision FPU SIMD and 2-wide complex SIMD allows for higher single thread performance for some applications.
  • “Perfect” prefetching for repeated memory reference patterns in arbitrarily long code segments achieves higher single thread performance for some applications.
  • Multiversioning cache with transactional memory eliminates the need for locks, and speculative execution allows OpenMP threading with data dependencies.
  • Atomic operations, pipelined at L2 with low latency even under high contention, provide a faster handoff for OpenMP work.
  • A wake-up unit allows SMT threads to sleep while waiting for an event and avoids register-saving overhead.
  • A 17th core manages OS related tasks thus reducing OS related noise.
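The scaling figure in the first bullet can be roughly cross-checked against the rack layout described earlier on this page (32 node boards of 32 nodes each, 204.8 GFlop/s per node). The Python sketch below is only such a back-of-the-envelope check; the function name is illustrative, and the 8-rack case refers to the configuration named in the image caption.

```python
# Aggregate peak performance as a function of the number of racks.
NODE_BOARDS_PER_RACK = 32
NODES_PER_BOARD = 32
NODE_PEAK_GFLOPS = 204.8  # per-node peak quoted in the text


def system_peak_pflops(racks):
    """Peak PFlop/s for a system with the given number of racks."""
    nodes = racks * NODE_BOARDS_PER_RACK * NODES_PER_BOARD
    return nodes * NODE_PEAK_GFLOPS / 1.0e6  # GFlop/s -> PFlop/s


for racks in (1, 8, 512):
    print(f"{racks:3d} racks: {system_peak_pflops(racks):8.3f} PFlop/s")
```

For 512 racks this gives roughly 107 PFlop/s, which is in line with the "up to 100 PF" stated above.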


    Processor: IBM PowerPC® A2 1.6 GHz, 16 cores per node
    Memory: 16 GB SDRAM-DDR3 per node (1333 MTps)
    Networks:
      • 5D Torus: 40 GBps; 2.5 μsec latency (worst case)
      • Collective network: part of the 5D Torus; collective logic operations supported
      • Global Barrier/Interrupt: part of 5D Torus, PCIe x8 Gen2 based I/O
      • 1 GB Control Network: System Boot, Debug, Monitoring
    I/O Nodes (10 GbE or InfiniBand): 16-way SMP processor; configurable in 8, 16 or 32 I/O nodes per rack
    Operating systems: Compute nodes run a lightweight proprietary kernel
    Power: Typical 80 kW per rack (estimated); maximum 100 kW per rack
    Cooling: 90% water cooling (18-25°C, demineralized); 10% air cooling
