Institute for Advanced Simulation (IAS)
FAQs about JUROPA/HPC-FF

Error Messages on JUROPA/HPC-FF

General FAQs

I have deleted some critical files by mistake, is it possible to recover them?

Files in your home directory are backed up on tape. To recover the deleted files, please do the following:

  1. Log on to one of the Juropa-GPFS-Nodes:
    ssh -X <userid>@juropagpfs.fz-juelich.de
  2. Start the backup recovery routine:
    adsmback
  3. At the prompt, select home as the desired target:
    home
  4. A graphical panel will pop up. Select the function restore from the Backup panel.
  5. A new window will pop up, showing the Juropa file hierarchy. Open the tree File Level and, within it, your home directory:
    e.g.: jhome12 and then hmz29 and hmz298.
  6. You can now further refine the selection of data to restore and finally check all files and/or subtrees you need.
  7. Finally, select the button Restore and decide whether to restore to the original place or to a new location.

Please note that the restore may take a couple of minutes, since the data has to be retrieved from magnetic tape.

Regular and -mt Variants of the ParaStation Modules on Juropa

Question:

Can someone help clarify the expected use of the regular and -mt variants of the parastation modules on Juropa:

module whatis parastation
parastation/gcc: Parastation library for parallel computing (GCC).
parastation/intel: Parastation library for parallel computing (Intel Compiler).
parastation/gcc-mt: Parastation library for multi-threaded parallel computing (GCC).
parastation/intel-mt: Parastation library for multi-threaded parallel computing (Intel Compiler).

Is the expectation that the "-mt" variants should be exclusively/preferentially used for hybrid/mixed-mode applications combining OpenMP+MPI (as seems to be suggested by the module descriptions) and/or also for pure MPI applications using non-blocking communication (where internal MPI threads are used to improve performance)?

Answer:

The -mt versions have nothing to do with internal threads of the MPI library. Instead, your first assumption is closer: the actual purpose of the -mt versions is to support higher levels of thread support within the MPI library. In terms of MPI_Init_thread(3), the version without -mt supports up to MPI_THREAD_SERIALIZED, while the version with -mt supports even MPI_THREAD_MULTIPLE. In principle you can use the -mt version for every application, but it may show lower performance (additional locks have to be handled in this version) and it is less well tested. We therefore suggest using the version without -mt for applications that need at most the MPI_THREAD_SERIALIZED level of thread support and the version with -mt only for applications that require MPI_THREAD_MULTIPLE.
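The distinction can be made explicit in code. The following generic sketch (not specific to ParaStation) requests MPI_THREAD_MULTIPLE at startup and checks which level the library actually grants; under the regular (non -mt) modules the reported level will stop at MPI_THREAD_SERIALIZED:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int required = MPI_THREAD_MULTIPLE;  /* needs the -mt module variants */
    int provided;

    MPI_Init_thread(&argc, &argv, required, &provided);

    if (provided < required) {
        /* With the regular modules, at most MPI_THREAD_SERIALIZED
           will be reported here. */
        fprintf(stderr, "MPI provides thread level %d, need %d\n",
                provided, required);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... multiple threads may now call MPI concurrently ... */

    MPI_Finalize();
    return 0;
}
```

Note that MPI_Init_thread never fails just because the requested level is unavailable; it reports the granted level in provided, so the application must check it itself.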

Performance Breakdowns on higher Core Numbers

In case you would like to optimize the performance of your application or you observe performance breakdowns on higher core numbers please test the following scenario:

The variable PSP_ONDEMAND influences the creation of MPI connections. If you set PSP_ONDEMAND=1 within your batch script, the connections will be created dynamically when they are first used. We have observed a performance increase for several applications with these dynamic connections.
Our recommendation is to perform a test run with PSP_ONDEMAND=1 and to compare the results to the runs without this specification.

Attention: If you have all-to-all communication in your application, PSP_ONDEMAND=1 might not be possible, see Using dynamic memory allocation for MPI connections.
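A minimal batch-script sketch for such a test run (the job geometry and application name are placeholders):

```shell
#!/bin/bash
#MSUB -l nodes=32:ppn=8        # example job geometry
export PSP_ONDEMAND=1          # create MPI connections on first use
mpiexec -np 256 ./myapp        # compare runtime with a run without the export
```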

My scp file transfer crashes with time limit. What can I do?

File transfer with scp (ssh secure copy) may consume significant amounts of CPU time due to the inherent data encryption/decryption. In order to allow for the transfer of big files to and from JUROPA/HPC-FF the CPU time limit has been specifically increased on the GPFS nodes.

CPU limits
Login nodes: 1800 sec
GPFS nodes: 21600 sec

For this reason, it is strongly recommended to use the GPFS nodes for file transfer instead of the Login nodes.

Example:

scp <userID>@juropagpfs.fz-juelich.de:<source file> <destination file>

Does ParaStation-MPI support MPI_THREAD_MULTIPLE?

The default version of ParaStation-MPI installed on JUROPA/HPC-FF does not support MPI_THREAD_MULTIPLE, i.e. multiple threads cannot call MPI without restriction. This functionality is provided by a special version of ParaStation-MPI that can be used by loading the corresponding module:

  1. module load parastation/intel-mt (supports MPI_THREAD_MULTIPLE together with the Intel compiler)
  2. module load parastation/gcc-mt (supports MPI_THREAD_MULTIPLE together with GCC)

In order to avoid conflicts, ensure that at the relevant time only one MPI version is loaded.

For what do I need the module tool?

On JUROPA/HPC-FF, general purpose applications and libraries are made available to users through the use of the module command. The user's environment in the current shell will be updated so that the software under consideration can be used. To get an overview of the modules available on JUROPA/HPC-FF type module avail on the command line. Further useful commands are:

Command: Description
module load <module>: Enables the use of the corresponding software package
module list: Prints a list of the currently loaded modules
module help <module>: Gives some information about the package under consideration
module unload <module>: Opposite of module load <module>. Some software packages provoke conflicts if several versions are loaded at the same time, so it may make sense to unload versions that are not needed at the moment
module show <module>: Shows the location of the software and the variables that will be set by loading this <module>
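A typical session might look like this (a sketch; the module names are those installed on the system):

```shell
module avail                     # list all available packages
module load parastation/intel    # make ParaStation MPI (Intel) usable
module list                      # check which modules are loaded
module show parastation/intel    # inspect paths and variables it sets
module unload parastation/intel  # remove it before loading another MPI version
```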


How can I include Fortran subroutines in C programs?

In order to use general Fortran subroutines in C programs you have to link against the corresponding Fortran runtime libraries.

Just add

-lifcore -lifport

to your link command.

How can I activate Turbo Mode on the Nehalem processors of Juropa/HPC-FF?

Turbo Mode makes it possible to automatically overclock the cores under certain conditions (see: Intel Turbo Boost Technology). The standard frequency of 2.933 GHz can be increased to a maximum value of roughly 3.2 GHz.
The following command enables Turbo Mode on the cores of the reserved compute node:

msub -l nodes=1:ppn=8:turbomode

The maximum clock frequency available on the cores of the corresponding node can be obtained from the following file:

/sys/devices/system/cpu/cpu?/cpufreq/cpuinfo_max_freq

PSIlogger: Timeout: Not all clients joined the first pmi barrier ...

mpiexec together with the option -x exports all environment variables to the processes spawned by mpiexec. Unfortunately, depending on the number of exported variables, this strategy may provoke the error given in the headline. It is therefore recommended to export only the needed environment variables with the option --exports. An example can be found here: Quick Introduction
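For example (the variable names and process count are placeholders; please check the mpiexec man page for the exact option syntax):

```shell
# export only the variables the application really needs
mpiexec --exports=OMP_NUM_THREADS,MYAPP_INPUT -np 64 ./myapp
```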

Intel Compiler 12.0.3: ld: cannot find -lmkl_lapack

Starting with Intel Compilers 12.0.3, the LAPACK routines are no longer in a separate library mkl_lapack but in mkl_intel_lp64. If your Makefile contains -lmkl_lapack, you will get the error message ld: cannot find -lmkl_lapack. You can simply omit -lmkl_lapack and linking will work as expected.

ipo: warning #11010: file format not recognized ...

Object files generated by the Intel compilers with the option -ipo contain additional information that allows the compiler/linker to perform code optimizations. The message ipo: warning #11010: file format not recognized for ..., possible linker script occurs if the GNU command ar is used to build static libraries from such object files. To avoid this warning, please use the Intel tool

xiar

instead of ar.

Important: If you ignore this warning, the linker will not be able to read the corresponding object files from the archive and will finally abort with an unresolved symbol error.

Error:Connecting ... failed : Invalid exchange - Protocol driver not attached

The following error messages during batch job execution may hint to an out-of-memory condition:

Error:Connecting 10.1.22.23:51388 to 10.1.17.42:50020 (rank 1476 to 4058) failed : Invalid exchange
Error:Connecting 10.1.21.49:59181 to 10.1.16.50:56717 (rank 1940 to 4520) failed : Protocol driver not attached

The given IP addresses, port and rank numbers may vary from case to case.
In order to solve the problem, try one of the solutions given in the chapter Memory Optimisation.

How to generate and upload ssh keys?

In order to access the JSC computer systems you need to generate an ssh key pair. This pair consists of a public and a private part. Here we briefly describe how to generate and upload such a pair.

On Linux/UNIX

In order to create a new ssh key pair, log in to the local machine from which you want to connect to the JSC computer systems. Open a shell and use the following command:

ssh-keygen -b 2048 -t rsa

You are asked for a file name and location where the key should be saved. Unless you really know what you are doing, please simply take the default by hitting the enter key. This will generate the ssh key in the .ssh directory of your home directory ($HOME/.ssh).
Next, you are asked for a passphrase. Please choose a secure passphrase. It should be at least 8 characters long and should contain numbers, letters and special characters like !@#$%^&*().

Important: You are NOT allowed to leave the passphrase empty!

You will be asked to upload the public part of your key ($HOME/.ssh/id_rsa.pub) on the JSC web site when you apply for an account. You must keep the private part ($HOME/.ssh/id_rsa) confidential.

Important: Do NOT remove it from this location and do NOT rename it!

You will be notified by email once your account is created and your public key is installed. To login, please use

ssh <yourid>@<machine>.fz-juelich.de

where 'yourid' is your user id on the JSC system 'machine' (i.e. you have to replace 'machine' by the corresponding JSC system). You will be prompted for your passphrase of the ssh key which is the one you entered when you generated the key (see above).

On Windows

You can generate the key pair using, for example, the PuTTYgen tool, which is provided by the PuTTY project. Start PuTTYgen, choose SSH-2 RSA at the bottom of the window, set the 'Number of bits in a generated key' to 2048 and press the 'Generate' button.

PuTTYgen will prompt you to generate some randomness by moving the mouse over the blank area. Once this is done, a new public key will be displayed at the top of the window.

Enter a secure passphrase. It should be at least 8 characters long and should contain numbers, letters and special characters like !@#$%^&*().

Important: You are NOT allowed to leave the passphrase empty!

Save the public and the private key. We recommend using 'id_rsa.pub' for the public and 'id_rsa' for the private part.

You will be asked to upload the public part of your key (id_rsa.pub) on a JSC web site when you apply for an account. You must keep the private part (id_rsa) confidential.

You will be notified by email once your account is created and your public key is installed. To log in, use an ssh client for Windows with the authentication method 'public-key', import the key pair you have generated above and log in to the corresponding JSC system with your user id. If you are using the PuTTY client, you can import the key in the configuration category 'Connection', subcategory 'SSH' -> 'Auth'. Once this is done, you will be prompted for the passphrase of the ssh key, which is the one you entered when you generated the key (see above).

Adding additional keys

If you would like to connect to your account from more than one computer, you can create and use additional pairs of public and private keys:

After creating a pair of public/private keys there are two ways of installing the public key on the target machine:

Method 1 (Linux/Mac):

Use the ssh-copy-id command to simultaneously upload and add the public key file 'public_key.pub' to the account 'user' on the target machine 'targetmachine':

ssh-copy-id -i public_key.pub user@targetmachine

Please refer to the man-page of ssh-copy-id for further information.

Method 2 (all operating systems):

i) Upload the public key file to your account on the target HPC system.

In case the public key was created under Windows (e.g. with PuTTYgen), it has to be converted first. This is done on the target HPC system with the command

ssh-keygen -i -f original_public_key_file.pub > new_public_key_file.pub

ii) Open the (new) key file and copy the whole line.

iii) Append the line as a new line to the file ~/.ssh/authorized_keys.

iv) Make sure the private key sits in the correct place on your local computer.

Replace SSH Key

In case the ssh key has to be replaced, use the following link: Upload of ssh-key

Note: This will replace ALL public keys by the new public key. If you use more than one key pair you will have to add your additional public keys as described above.
