FAQs about JUROPA/HPC-FF

Error Messages on JUROPA/HPC-FF

General FAQs

Performance Breakdowns on higher Core Numbers

In case you would like to optimize the performance of your application, or if you observe performance breakdowns at higher core numbers, please test the following scenario:

The variable PSP_ONDEMAND influences the creation of MPI connections. If you set PSP_ONDEMAND=1 within your batch script, the connections will be created dynamically when they are first used. We have observed a performance increase for several applications with these dynamic connections.
Our recommendation is to perform a test run with PSP_ONDEMAND=1 and to compare the results to runs without this setting.

Attention: If you have all-to-all communication in your application, PSP_ONDEMAND=1 might not be possible, see Using dynamic memory allocation for MPI connections.
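
A minimal batch script sketch showing where PSP_ONDEMAND can be set (the resource requests and the application name my_app are placeholders; adapt them to your job):

#!/bin/bash
#MSUB -l nodes=2:ppn=8
#MSUB -l walltime=00:30:00

# create MPI connections dynamically when they are first used
export PSP_ONDEMAND=1

mpiexec -np 16 ./my_app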

Regular and -mt Variants of the ParaStation Modules on Juropa

Question:

Can someone help clarify the expected use of the regular and -mt variants of the ParaStation modules on Juropa:

module whatis parastation

parastation/gcc:      Parastation library for parallel computing (GCC).
parastation/intel:    Parastation library for parallel computing (Intel Compiler).
parastation/gcc-mt:   Parastation library for multi-threaded parallel computing (GCC).
parastation/intel-mt: Parastation library for multi-threaded parallel computing (Intel Compiler).

Is the expectation that the "-mt" variants should be exclusively/preferentially used for hybrid/mixed-mode applications combining OpenMP+MPI (as seems to be suggested by the module descriptions), or also for pure MPI applications using non-blocking communication (where internal MPI threads are used to improve performance)?

Answer:

The -mt versions have nothing to do with internal threads of the MPI library. Instead, your first assumption is closer to the truth: the actual purpose of the -mt versions is to support higher levels of thread support within the MPI library. If you look at MPI_Init_thread(3), the version without -mt supports up to MPI_THREAD_SERIALIZED, while the version with -mt supports even MPI_THREAD_MULTIPLE. In principle you can use the -mt version for every application, but it might show lower performance (there are additional locks to be handled in this version) and it is not as well tested. Thus, it is suggested to use the version without -mt for applications that use at most the MPI_THREAD_SERIALIZED level of thread support and the version with -mt only for applications that require MPI_THREAD_MULTIPLE.

How can I activate Turbo Mode on the Nehalem processors of Juropa/HPC-FF?

Turbo Mode makes it possible to automatically overclock the cores under certain conditions (see: Intel Turbo Boost Technology). The standard frequency of 2.933 GHz can be increased to a maximum value of roughly 3.2 GHz.
The following command enables Turbo Mode on the cores of the reserved compute node:

msub -l nodes=1:ppn=8:turbomode

Information about the clock frequency that is actually applied can be obtained from the following file on the corresponding node:

/sys/devices/system/cpu/cpu?/cpufreq/cpuinfo_max_freq
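
For example, the value could be inspected in an interactive job on a node reserved with Turbo Mode (the -I flag for an interactive msub job and the choice of cpu0 are illustrative assumptions):

msub -I -l nodes=1:ppn=8:turbomode
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq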

Does ParaStation-MPI support MPI_THREAD_MULTIPLE?

The default version of ParaStation-MPI installed on JUROPA/HPC-FF does not support MPI_THREAD_MULTIPLE, i.e. it is not possible for multiple threads to call MPI without restrictions. This functionality is provided by a special version of ParaStation-MPI that can be used by loading the corresponding module:

  1. module load parastation/intel-mt (supports MPI_THREAD_MULTIPLE together with the Intel compiler)
  2. module load parastation/gcc-mt (supports MPI_THREAD_MULTIPLE together with GCC)

In order to avoid conflicts, ensure that only one MPI module is loaded at a time.
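
For example, to switch from the default Intel variant to the multi-threaded one (assuming parastation/intel is currently loaded):

module unload parastation/intel
module load parastation/intel-mt
module list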

My scp file transfer crashes with time limit. What can I do?

File transfer with scp (ssh secure copy) may consume significant amounts of CPU time due to the inherent data encryption/decryption. In order to allow for the transfer of big files to and from JUROPA/HPC-FF the CPU time limit has been specifically increased on the GPFS nodes.

CPU limits
Login nodes     1800 sec
GPFS nodes     21600 sec

For this reason, it is strongly recommended to use the GPFS nodes for file transfer instead of the Login nodes.

Example:

scp <userID>@juropagpfs.fz-juelich.de:<source file> <destination file>

For what do I need the module tool?

On JUROPA/HPC-FF, general purpose applications and libraries are made available to users through the use of the module command. The user's environment in the current shell will be updated so that the software under consideration can be used. To get an overview of the modules available on JUROPA/HPC-FF type module avail on the command line. Further useful commands are:

Command                   Description
module load <module>      Enables the use of the corresponding software package
module list               Prints out a list of the loaded modules
module help <module>      Gives some information about the package under consideration
module unload <module>    Opposite of module load <module>. Some software packages provoke conflicts if several versions are loaded at the same time, so it might make sense to unload versions that are not needed at the moment
module show <module>      Shows information about the location of the software and the variables that will be set by loading this <module>
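
A typical session might look like this (using the parastation/intel module mentioned above as an example):

module avail
module load parastation/intel
module list
module show parastation/intel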


I have deleted some critical files by mistake, is it possible to recover them?

Files in your home directory are backed up on tape. To recover the deleted files, please do the following:

  1. Log on to one of the Juropa-GPFS-Nodes:
    ssh -X <userid>@juropagpfs.fz-juelich.de
  2. Start the backup recovery routine:
    adsmback
  3. On the prompt, select home as the desired target:
    home
  4. A graphical panel will pop up. Select the function restore from the Backup panel.
  5. A new window will pop up, showing the Juropa file hierarchy. Open the tree File Level and, within it, your home directory:
    e.g.: jhome12 and then hmz29 and hmz298.
  6. You can now further refine the choice of data you want to restore and finally checkmark all files and/or subtrees you need to restore.
  7. Finally select the button Restore and decide whether to restore into the original place or to a new location.

Please note that the restore may take a couple of minutes, since the data has to be retrieved from magnetic tape.

How can I include Fortran subroutines in C programs?

In order to call general Fortran subroutines from C programs you have to link the corresponding Fortran runtime libraries.

Just add

-lifcore -lifport

to your link command.
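
A minimal sketch of the build steps, assuming the Intel compilers and hypothetical source files main.c and fsub.f90:

icc -c main.c
ifort -c fsub.f90
icc main.o fsub.o -lifcore -lifport -o myprog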

PSIlogger: Timeout: Not all clients joined the first pmi barrier ...

mpiexec together with the option -x can be used to export all environment variables to the processes spawned by mpiexec. Unfortunately, this strategy might provoke the error given in the headline, depending on the number of variables that are exported. Instead, it is recommended to export only the needed environment variables with the option --exports. An example can be found here: Quick Introduction
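
A minimal sketch of the recommended form (the variable names and process count are placeholders):

mpiexec --exports=OMP_NUM_THREADS,PSP_ONDEMAND -np 256 ./my_app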

ipo: warning #11010: file format not recognized ...

Object files generated by the Intel compilers with the option -ipo contain additional information for the compiler/linker in order to perform code optimizations. The message ipo: warning #11010: file format not recognized for ..., possible linker script occurs if the GNU command ar is used to build static libraries from such object files. To avoid this warning, please use the Intel tool

xiar

instead of ar.
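
xiar can be invoked in the same way as ar; a minimal sketch with placeholder file names:

xiar rcs libmylib.a foo.o bar.o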

Important: If you ignore this warning, the linker cannot make use of the affected object files and will eventually abort with an unresolved symbol error.

Intel Compiler 12.0.3: ld: cannot find -lmkl_lapack

Starting with Intel Compilers 12.0.3 the LAPACK routines are no longer in a separate library mkl_lapack but in mkl_intel_lp64. If your Makefile contains -lmkl_lapack you will get the error message ld: cannot find -lmkl_lapack. You can just omit the -lmkl_lapack and linking will work as expected.
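
For example, a link line might change as follows (the remaining MKL libraries shown are only illustrative; keep whatever your Makefile already contains):

# before (Intel Compiler < 12.0.3)
LIBS = -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
# after (Intel Compiler >= 12.0.3): simply drop -lmkl_lapack
LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core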

Error:Connecting ... failed : Invalid exchange - Protocol driver not attached

The following error messages during batch job execution may hint to an out-of-memory condition:

Error:Connecting 10.1.22.23:51388 to 10.1.17.42:50020 (rank 1476 to 4058) failed : Invalid exchange
Error:Connecting 10.1.21.49:59181 to 10.1.16.50:56717 (rank 1940 to 4520) failed : Protocol driver not attached

The given IP addresses, port and rank numbers may vary from case to case.
In order to solve the problem, try one of the solutions given in the chapter Memory Optimisation.

How to generate and upload ssh keys?

In order to access the JSC computer systems you need to generate an SSH key pair. This pair consists of a public and a private part.

Please follow the links below to the system-specific support pages for information on how to generate your SSH key pair and how to manage your SSH connections in general:
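
A minimal sketch of generating a key pair on your local machine (key type, size, and file name are assumptions; the linked pages describe the currently required settings):

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_juropa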