Institute for Advanced Simulation (IAS)


General FAQs about JUQUEEN

What are the conditions for temporary data?

For temporary user data it is recommended to use $WORK instead of /tmp, because $WORK is much larger. (Data on $WORK is kept for 90 days and is not backed up.)

BlueGene applications are not able to access /tmp. Jobs trying this will be terminated.
If an XL Fortran application creates a temporary file at run time with STATUS='SCRATCH', these files are placed in /tmp by default. To avoid this, redefine $TMPDIR:
runjob --exe <myprog> --envs TMPDIR=$WORK/<dir>

Also, do not use /tmp on the front-end nodes!
/tmp is very small, and data there is kept for only 7 days.

How can I link Fortran subroutines into my C program?

To link Fortran subroutines into a C main program, whether from libraries such as ESSL or LAPACK or from your own code, you have to add the following libraries after the Fortran routines and all Fortran libraries in your link statement:

-L${XLFLIB_FZJ} -lxl -lxlopt -lxlf90_r -lxlfmath \
-L${XLSMPLIB_FZJ} -lxlomp_ser -lpthread

FZJ has introduced the environment variables XLFLIB_FZJ and XLSMPLIB_FZJ as pointers to the current compiler version, so that makefiles can be kept independent of compiler changes.

How can a soft limit for the wall clock time be used?

At the moment there is no way to use the soft limit: the signal sent by LoadLeveler on the front-end node is not routed to the BlueGene application.

The only alternative, although not fully adequate, is to check the elapsed time from within the application and estimate the remaining time manually.
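A minimal shell sketch of this manual bookkeeping (WALL_LIMIT is a hypothetical value that you would set to match the wall-clock time requested in your job script):

```shell
#!/bin/sh
# Hypothetical wall-clock budget of the job, in seconds.
WALL_LIMIT=3600
START=$(date +%s)

# ... application work would happen here ...

ELAPSED=$(( $(date +%s) - START ))
REMAINING=$(( WALL_LIMIT - ELAPSED ))
echo "elapsed: ${ELAPSED}s, remaining: ${REMAINING}s"

# Write a final checkpoint when, say, less than 5 minutes remain.
if [ "$REMAINING" -lt 300 ]; then
    echo "approaching wall-clock limit, writing final checkpoint"
fi
```

The same comparison can be done inside the application itself, e.g. against MPI's wall-clock timer.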

Why should jobs write regular checkpoints?

The enhanced complexity of the new-generation supercomputers at JSC increases the probability that a job might be affected by a failure. Therefore, we strongly encourage all users of these systems to write regular checkpoints from their applications to avoid losses of CPU time when a job is aborted. There will be no refund of CPU time in the case of a failed job!

Tip: Besides checkpointing, jobs with a time limit below the allowed maximum may achieve better turnaround times on JUQUEEN, because they can be used to fill the machine optimally while it is being drained for regular maintenance slots or full-machine runs.

How to generate and upload ssh keys?

In order to access the JSC computer systems you need to generate an ssh key pair. This pair consists of a public and a private part. Here we briefly describe how to generate and upload such a pair.

On Linux/UNIX

In order to create a new ssh key pair login to your local machine from where you want to connect to the JSC computer systems. Open a shell and use the following command

ssh-keygen -b 2048 -t rsa

You are asked for a file name and location where the key should be saved. Unless you really know what you are doing, please simply take the default by hitting the enter key. This will generate the ssh key in the .ssh directory of your home directory ($HOME/.ssh).
Next, you are asked for a passphrase. Please, choose a secure passphrase. It should be at least 8 characters long and should contain numbers, letters and special characters like !@#$%^&*().

Important: You are NOT allowed to leave the passphrase empty!

You need to upload the public part of your key ($HOME/.ssh/id_rsa.pub) via the JSC portal JuDoor. You must keep the private part ($HOME/.ssh/id_rsa) confidential.

Important: Do NOT remove it from this location and do NOT rename it!

You will be notified by email once your account is created. You can then upload ssh keys in JuDoor which will become active after a short amount of time. To login, please use

ssh <yourid>@<machine>.fz-juelich.de

where 'yourid' is your user id on the JSC system 'machine' (i.e. you have to replace 'machine' by the corresponding JSC system). You will be prompted for your passphrase of the ssh key which is the one you entered when you generated the key (see above).

On Windows

You can generate the key pair using, for example, the PuTTYgen tool, which is provided by the PuTTY project. Start PuTTYgen, choose SSH-2 RSA at the bottom of the window, set the 'Number of bits in a generated key' to 2048 and press the 'Generate' button.

PuTTYgen will prompt you to generate some randomness by moving the mouse over the blank area. Once this is done, a new public key will be displayed at the top of the window.

Enter a secure passphrase. It should be at least 8 characters long and should contain numbers, letters and special characters like !@#$%^&*().

Important: You are NOT allowed to leave the passphrase empty!

Save the public and the private key. We recommend using 'id_rsa.pub' for the public and 'id_rsa' for the private part.

The correct public key for the upload can be copied directly from the PuTTYgen window; the .pub file that PuTTYgen saves uses a different format.

You need to upload the public part of your key ($HOME/.ssh/id_rsa.pub) via the JSC portal JuDoor. You must keep the private part (id_rsa) confidential.

You will be notified by email once your account is created. You can then upload ssh keys in JuDoor, which become active after a short amount of time. To log in, use an SSH client for Windows with authentication method 'public-key', import the key pair you generated above and log in to the corresponding JSC system with your user id. If you are using the PuTTY client, you can import the key under the configuration category 'Connection', subcategory 'SSH' -> 'Auth'. You will then be prompted for the passphrase of the ssh key, which is the one you entered when you generated the key (see above).

Adding additional keys

If you would like to connect to your account from more than one computer, you can create and use additional pairs of public and private keys:

After creating a pair of public/private keys, please upload it again via JuDoor and don't select the checkbox "Remove all other existing public keys.".

Replace ssh keys

If you would like to put new keys on the system to replace the existing keys, please upload the new key via JuDoor and select the checkbox "Remove all other existing public keys.".

Connection problem after creating a new key

It can happen that the new key is not loaded automatically by your local SSH agent (you will receive a permission denied error after you try to connect to the JSC computer system). To update your SSH agent manually you can use the command:

ssh-add <your private key-file>

How do I check how much memory my application is using?

Integrating the routine below will allow you to track the memory usage of your application:

#include <stdio.h>
#include <stdlib.h>

#include <spi/include/kernel/memory.h>

void print_memusage()
{
    uint64_t shared, persist, heapavail, stackavail, stack, heap, guard, mmap;

    Kernel_GetMemorySize(KERNEL_MEMSIZE_GUARD, &guard);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_SHARED, &shared);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_PERSIST, &persist);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_HEAPAVAIL, &heapavail);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_STACKAVAIL, &stackavail);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_STACK, &stack);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_HEAP, &heap);
    Kernel_GetMemorySize(KERNEL_MEMSIZE_MMAP, &mmap);
#if 0
    /* Verbose variant. */
    printf("Allocated heap: %.2f MB, avail. heap: %.2f MB\n", (double)heap/(1024*1024), (double)heapavail/(1024*1024));
    printf("Allocated stack: %.2f MB, avail. stack: %.2f MB\n", (double)stack/(1024*1024), (double)stackavail/(1024*1024));
    printf("Memory: shared: %.2f MB, persist: %.2f MB, guard: %.2f MB, mmap: %.2f MB\n", (double)shared/(1024*1024), (double)persist/(1024*1024), (double)guard/(1024*1024), (double)mmap/(1024*1024));
#else
    /* Compact variant: allocated/available sizes in MB. */
    printf("MEMSIZE heap: %.2f/%.2f stack: %.2f/%.2f mmap: %.2f MB\n", (double)heap/(1024*1024), (double)heapavail/(1024*1024), (double)stack/(1024*1024), (double)stackavail/(1024*1024), (double)mmap/(1024*1024));
    printf("MEMSIZE shared: %.2f persist: %.2f guard: %.2f MB\n", (double)shared/(1024*1024), (double)persist/(1024*1024), (double)guard/(1024*1024));
#endif
}

How can core dumps be disabled or limited?

Core dumps are enabled on JUQUEEN by default.

Because writing core files from thousands of nodes takes (too) much time, the generation of core files can be suppressed or limited.

How to disable core files?

The BG_COREDUMPDISABLED environment variable must be set to 1 and exported to the runjob environment:

export BG_COREDUMPDISABLED=1

runjob --exe <filename> --exp-env BG_COREDUMPDISABLED

How to limit the number of core files?

The BG_COREDUMPMAXNODES environment variable must be set to the maximum number of core files and exported to the runjob environment:

export BG_COREDUMPMAXNODES=<n>

runjob --exe <filename> --exp-env BG_COREDUMPMAXNODES

How to read core files?

Core files are plain text files that include traceback information in hexadecimal.

To read and convert the hexadecimal addresses, the tool addr2line may help.
For more information use

addr2line -h

(The application should have been compiled with the option -g.)

SSH access problem after SSH client update

In OpenSSH 7.0, support for ssh-dss host and user keys was disabled by default. If you are using an ssh-dss key (the public key starts with "ssh-dss"), you will not be able to log in to the JSC systems with the default settings after updating your local SSH installation.
In this case a verbose SSH-run

ssh -v <user>@<system>

will display the following message:

debug1: Skipping ssh-dss key /.../.ssh/id_dsa for not in PubkeyAcceptedKeyTypes

To fix this problem, please upload a new key (using the "ssh-rsa" key format) via JuDoor.


Error messages and performance hints on JUQUEEN

Why is MPI_Allgatherv (using non-contiguous memory) so slow?

Using non-contiguous memory for MPI_Allgatherv can result in a significant performance loss. The easiest way to fix this is the following:

export PAMID_COLLECTIVE_ALLGATHERV=GLUE_BCAST

runjob --exp-env PAMID_COLLECTIVE_ALLGATHERV ...

This handles non-contiguous memory for MPI_Allgatherv faster than the
algorithm which is used by default. For larger scales the performance is
still slower than for contiguous memory so you might want to avoid using
non-contiguous memory.

How can I omit the emergency warning when starting emacs?

When starting emacs on JUQUEEN the following warning message may appear:

Emergency (alloc): Warning: past 95% of memory limit

This warning can be ignored. To avoid this notification, you need to include the following line in the file $HOME/.emacs (you need to create this file if it does not exist):

(setq warning-suppress-types '((alloc)))

Please include all parentheses in this line.

What does the error message "Load failed on Rxx-xx-xxx: Generating static TLB map for application failed, errno 0 " mean?

An application was loaded, and the CNK (compute node kernel) was not able to generate a physical memory map for it. The usual reason is that the application is too big for the 16 GB of memory per node.

What do you get when you run size <executable name>?

If you take the data segment and multiply it by the number of processes per node, does it exceed (or come close to) 16 GB?

If this is the case, then you will need to reduce the executable size. Perhaps some modules in your code are not required and do not need to be linked into the executable? Another possibility is to decrease the number of ranks per node so that more memory is available per rank.

Data questions

What file system to use for different data?

JSC provides multiple GPFS file systems for different types of user data. Each file system has its own data policies.

  • $HOME
    Acts as the repository for the user's personal data like SSH keys. There is a separate HOME folder for each HPC system, plus a shared folder that points to the same directory on all systems. Data within $HOME is backed up by TSM.

  • $SCRATCH
    Is bound to a compute project and acts as a temporary storage location with high I/O bandwidth. If your application can handle large files and high I/O demands, $SCRATCH is the right file system for them. Data within $SCRATCH is not backed up, and a daily cleanup is performed.

    • Normal files older than 90 days are purged automatically. Both the modification and the access date are taken into account, but for performance reasons the access date is not set automatically by the system; it can be set explicitly by the user with
      touch -a <filename>.
      Time stamps that are recorded with files can be easily listed by
      stat <filename>.
    • Empty directories, which arise among other things from the deletion of old files, are deleted after 3 days. This also applies to trees of empty directories, which are deleted recursively from bottom to top in one step.
  • $PROJECT
    Data repository for a compute project. Its lifetime is bound to the project's lifetime. Data is backed up by TSM.

  • $FASTDATA
    Belongs to a data project. This file system is bandwidth optimized (similar to $SCRATCH), but data are persistent and internally backed up via snapshots.

  • $DATA
    Belongs to a data project. This file system is designed to store a huge amount of data on disk-based storage. The bandwidth is moderate. The file-system-internal backup is realized with the GPFS snapshot feature.

  • $ARCHIVE
    Is bound to a data project and acts as storage for all files not in use for a longer time. Data is migrated to tape storage by TSM-HSM. It is recommended to use tar files with a minimum size of multiple gigabytes and a maximum of 8 TB; recalling/restoring files from tape is much more efficient with a few large data streams than with thousands of small ones.

All GPFS file systems are managed by quotas for disk space and/or number of files.

What data quotas do exist and how to list usage?

For all data repositories, disk quota management is enabled. The values are set to defaults (defined by JSC) or depend on special requirements of the projects.


Default data quota per user/project within GPFS file systems

File System   Disk Space (soft limit / hard limit)             Number of Files (soft limit / hard limit)
$HOME         10 GB / 11 GB                                    40,000 / 44,000
$SCRATCH      90 TB / 95 TB                                    4 million / 4.4 million
$PROJECT      16 TB / 17 TB                                    3 million / 3.1 million
$FASTDATA     as granted to project / soft limit + up to 10%   as granted to project / soft limit + up to 10%
$DATA         as granted to project / soft limit + up to 10%   as granted to project / soft limit + up to 10%
$ARCHIVE      as granted to project / soft limit + up to 10%   as granted to project / soft limit + up to 10%


File size limit

Although the file size limit on the operating system level (e.g. on JUWELS or JURECA) is set to unlimited (ulimit -f), the maximum file size is bounded by the GPFS group quota limit of the corresponding file system. The actual limits can be listed with jutil.

List data quota and usage by project or user

Members of a group/project can display the hard limits, quotas (soft limit) and usage by each user of the project using the jutil command.

jutil project dataquota -p <project name>

The quota information for the users is updated every 8 hours.

Recommendation for users with a lot of small files

Users whose applications create a lot of relatively small files should reorganize the data by collecting these files into tar archives using the

tar -cvf archive-filename ...

command. The real problem is the number of files (inodes) that have to be managed by the underlying system, not the space they occupy in total. On the other hand, please keep in mind the recommendations under 'File size limit'.
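For example (directory and file names are illustrative):

```shell
# A directory full of small result files ...
mkdir -p results
for i in 1 2 3 4 5; do
    echo "data $i" > "results/run_$i.txt"
done

# ... becomes a single archive file:
tar -cf results.tar results/

# Unpacking later restores the original tree:
mkdir -p restored
tar -xf results.tar -C restored
```

The file system then manages one inode for the archive instead of one per small file.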

How to modify the user's environment?

When a user logs in on a front-end node via secure shell, a shell is started and a set of environment variables is exported. These are defined in system profiles. Each user can extend or modify this environment using their own profiles in their $HOME directory.

In the Jülich setup there is a separate $HOME directory for each HPC system. This means that the environment differs between JUWELS, JURECA, JUDAC, ..., and that users can modify their own profiles for each system separately. Therefore, skeleton .bash_profile and .bashrc files are placed in each $HOME directory when a user is added to an HPC system.

.bash_profile:
# **************************************************
# bash environment file in $HOME
# Please see:
# http://www.fz-juelich.de/ias/jsc/EN/Expertise/D...
# for more information and possible modifications...
# **************************************************
# Get the aliases and functions: Copied from Cent...
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
export PS1="[\u@\h \W]\$ "

.bashrc:
# **************************************************
# bash environment file in $HOME
# Please see:
# http://www.fz-juelich.de/ias/jsc/EN/Expertise/D...
# for more information and possible modifications...
# **************************************************
# Source global definitions: Copied from CentOS 7...
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi

User $HOME directory structure: a separate $HOME directory for each HPC system.

E.g. on JUDAC, user graf1 will see $HOME="/p/home/jusers/graf1/JUDAC". The profiles located there are used for login. Only the shared folder (a link) always points to the same directory /p/home/jusers/graf1/shared.

Most site-dependent variables are set automatically by the jutil env init command (system profile). The user can set the right core variables ($PROJECT, $ARCHIVE, ...) by using

jutil env activate -p <project>

For more information look at the jutil command usage.

How to make the currently enabled budget visible:

If a user has to change his budget account during a login session, it can be helpful to see the currently active budget account in the prompt, to be sure of working on the correct budget. To achieve this, replace the current "export PS1=..." line in .bash_profile with:

prompt() {
    PS1="[${BUDGET_ACCOUNTS:-\u}@\h \W]\$ "
}
PROMPT_COMMAND=prompt

This results in the following behaviour:

[user1@juwels07 ~]$ jutil env activate -p chpsadm
[hpsadm@juwels07 ~]$ jutil env activate -p cslfse
[slfse@juwels07 ~]$

How can I recall migrated data?

Normally, migrated files are automatically recalled from TSM-HSM tape storage when they are accessed on the login nodes of the HPC systems (e.g. JUWELS, JURECA, ...) or on the Data Access System (JUDAC).

For an explicit recall the native TSM-HSM command dsmrecall is not available. Please use

tail <filename>
or:
head <filename>

to start the recall process. These commands do not change any file attributes, and both the migrated version of the file and the backup version stay valid.

It is strongly recommended NOT to use

touch <filename>

because this changes the timestamp of the file, so a new backup copy must be created and the file has to be migrated again. These are two additional processes that waste resources if the file is only read by further processing.
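To trigger recalls for a whole directory tree, head can be applied file by file; a sketch using an illustrative demo directory:

```shell
# Demo tree standing in for a directory of migrated files.
mkdir -p archive_demo
echo "payload" > archive_demo/result.dat

# Reading the first byte of each regular file starts the tape recall
# without changing file attributes or invalidating the backup copy.
find archive_demo -type f -exec head -c 1 {} \; > /dev/null
```

The output is discarded; only the read access matters.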

How can I see which data is migrated?

There are two file systems which hold migrated data: /arch and /arch2

  • These are so called archive file systems.
  • In principle all data in the file systems will be migrated to TSM-HSM tape storage in tape libraries.
  • Data is copied to TSM backup storage prior to migration.
  • Data is not quota-limited by storage volume but by the number of files per group/project. This is done because UNIX file systems are still unable to handle millions of files with acceptable performance.

The TSM-HSM native command dsmls, which shows whether a file is migrated, is not available on any HPC system (e.g. JUWELS, JURECA, ...) nor on the Data Access System (JUDAC). This command is only supported on the TSM-HSM node of the JUST storage cluster, which hosts the file systems for the HPC systems. However, JUST is not open for user access.

Please use

ls -ls [mask | filename]

to list the files. Migrated files can be identified by a block count of 0 in the first column (-s option) and an arbitrary number of bytes in the sixth column (-l option).

0 -rw-r----- 1 user group 513307 Jan 22 2008 log1
0 -rw-r----- 1 user group 114 Jan 22 2008 log2
0 -rw-r----- 1 user group 273 Jan 22 2008 log3
0 -rw-r----- 1 user group 22893504 Jan 23 2008 log4
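This test can be scripted; the following sketch lists candidate migrated files below the current directory (on an ordinary Linux file system it prints nothing, since resident files have a non-zero block count):

```shell
# Print regular files whose size is non-zero but whose allocated block
# count is 0 -- on /arch and /arch2 this indicates the data is on tape.
find . -type f -size +0c | while read -r f; do
    if [ "$(stat -c %b "$f")" -eq 0 ]; then
        echo "$f"
    fi
done
```

stat -c %b reports the number of allocated 512-byte blocks, the same quantity ls -s shows in the first column.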

How to restore files?

How to restore user or project data

All file systems except $SCRATCH provide data protection mechanisms based either on IBM Spectrum Protect (TSM) or on the Spectrum Scale (GPFS) snapshot technology.

For TSM, only the JUDAC system is capable of retrieving lost data from the backup, using the command line tool adsmback:

adsmback -type=<target repository>

Do not use the native dsmj command, which will not show any home data.

$HOME - Users personal data

All files within the users' home directories ($HOME) are automatically backed up by TSM (Tivoli Storage Manager). To restore a file, use

adsmback -type=home &

on JUDAC.

This command grants access to the correct backup data of the user's assigned home directory.

Follow the GUI by selecting:

Restore -> View -> Display active/inactive files
File level -> p -> home -> jusers -> userid -> ...
Select files or directories to restore
Press [Restore] button

If the data should be restored to original location then choose within the Restore Destination window

  • Original location

Otherwise select:

  • Following location + <path> + Restore complete path

$PROJECT - Compute project repository

All files within the compute project directories ($PROJECT) are automatically backed up by TSM (Tivoli Storage Manager). To restore a file, use

adsmback -type=project &

on JUDAC.

This command grants access to the correct backup data of the project repository.

Follow the GUI by selecting:

Restore -> View -> Display active/inactive files
File level -> p -> project -> group -> ...
Select files or directories to restore
Press [Restore] button

If the data should be restored to original location then choose within the Restore Destination window

  • Original location

Otherwise select:

  • Following location + <path> + Restore complete path

$FASTDATA - Data project repository (bandwidth optimized)

The files within the data project directories ($FASTDATA) are not backed up externally to tape. Instead, an internal backup based on the snapshot feature of the file system (GPFS) is offered. The difference between the TSM backup and the snapshot-based backup is that TSM acts on file changes, while snapshots save the state at a certain point in time. Currently, the following snapshots are configured:

Snapshot         Retained     Created
daily backup     last day     today, just after midnight
weekly backup    last week    every Sunday, just after midnight
monthly backup   last three   every 1st day of the month, just after midnight

The snapshots can be found in a special subdirectory of the project repository. Go to

cd $FASTDATA/.snapshots

and list contents

/p/fastdata/jsc/.snapshots> ls
daily-20181129
daily-20181130
daily-20181203
weekly-20181118
weekly-20181125
weekly-20181202
monthly-20181001
monthly-20181101
monthly-20181201

In the subdirectory <type>-<YYYYMMDD>, the file version that was valid on the date DD.MM.YYYY can be retrieved under the same path as the actual file in the $FASTDATA repository.

Due to the fact that the snapshot is part of the file system, the data restore can be performed on any system where it is mounted.

$DATA - Data project repository (large capacity)

The files within the data project directories ($DATA) are not backed up externally to tape. Instead, an internal backup based on the snapshot feature of the file system (GPFS) is offered. The difference between the TSM backup and the snapshot-based backup is that TSM acts on file changes, while snapshots save the state at a certain point in time. Currently, the following snapshots are configured:

Snapshot         Retained     Created
daily backup     last three   today, just after midnight
weekly backup    last three   every Sunday, just after midnight
monthly backup   last three   every 1st day of the month, just after midnight

The snapshots can be found in a special subdirectory of the project repository. Go to

cd $DATA/.snapshots

and list contents

/p/largedata/jsc/.snapshots> ls
daily-20181129
daily-20181130
daily-20181203
weekly-20181118
weekly-20181125
weekly-20181202
monthly-20181001
monthly-20181101
monthly-20181201

In the subdirectory <type>-<YYYYMMDD>, the file version that was valid on the date DD.MM.YYYY can be retrieved under the same path as the actual file in the $DATA repository.

Due to the fact that the snapshot is part of the file system, the data restore can be performed on any system where it is mounted.

$ARCHIVE - The Archive data repository

All files within the archive directory ($ARCHIVE) for long-term storage are automatically backed up by TSM (Tivoli Storage Manager). To restore a file, use

adsmback [-type=archive] &

on JUDAC.

This command grants access to the correct backup data of the project's assigned archive directory.

Follow the GUI by selecting:

Restore -> View -> Display active/inactive files
File level -> archX -> group -> ...
Select files or directories to restore
Press [Restore] button

If the data should be restored to original location then choose within the Restore Destination window:

  • Original location

Otherwise select

  • Following location + <path> + Restore complete path

How to share files by using ACLs?

Linux file permissions define the access rights to read, write or execute (rwx) files and directories, but are limited to one user, one group and all others. ACLs (Access Control Lists) allow a more fine-grained assignment of access rights: the owner of a file/directory can define specific rights for additional users and groups.

Linux commands to manage ACLs

- List the ACLs of a file/directory:

getfacl <file/directory>

- Give user john1 read and write access to file example.txt, and give user lucy1 read access:

setfacl -m u:john1:rw example.txt
setfacl -m u:lucy1:r example.txt

# file: example.txt
# owner: smith1
# group: cjsc
user::rw-
user:john1:rw-
user:lucy1:r--
group::---
mask::rw-
other::---

- Remove the ACL entry for user john1 on example.txt:

setfacl -x u:john1 example.txt

# file: example.txt
# owner: smith1
# group: cjsc
user::rw-
user:lucy1:r--
group::---
mask::rw-
other::---

- Allow users from group zam to change into the directory share:

setfacl -m g:zam:x share/

# file: share
# owner: smith1
# group: cjsc
user::rwx
group::---
group:zam:--x
mask::--x
other::---

- Remove all ACLs from the directory share:

setfacl -b share

# file: share
# owner: smith1
# group: cjsc
user::rwx
group::---
other::---

Further information (e.g. set ACLs recursively, setting default ACLs, inherit ACLs, ...) can be found in the manual pages.

Which files have an access control list?

The command

ls -l

will show a "+" for every file that has an ACL set, e.g.

drwx------+ 2 john1 cjsc 32768 Feb 21 09:25 share

How to avoid multiple SSH connections on data transfer?

When transferring multiple files, it can be problematic to open a separate SSH connection for each transfer operation: the network firewall can block a large number of independent simultaneous SSH connections. There are several options to avoid this:

Use rsync or use scp with multiple files:

rsync -avhzP local_folder/ username@host:remote_folder

rsync only copies new or changed files, which conserves transfer bandwidth.

scp -r local_folder/ username@host:remote_folder

will copy local_folder recursively.

Use a tar container to transfer fewer files

Creating a tar file and transferring it can be much faster than transferring all files separately:

tar -cf tar.file local_folder

The tar file creation, transmission and extraction can also be done on the fly:

tar -c local_folder/ | ssh username@host \
'cd remote_folder; tar -x'

Use shared SSH connection

Shared SSH connections allow reusing the same connection multiple times:

Open master connection:

ssh -M -S /tmp/ssh_mux_%h_%p_%r username@host

Reuse connection:

ssh -S /tmp/ssh_mux_%h_%p_%r username@host

A shared connection can also be used when using scp:

scp -o 'ControlPath /tmp/ssh_mux_%h_%p_%r' \
local_folder username@host:remote_folder
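Connection sharing can also be configured persistently in ~/.ssh/config, so that every ssh and scp call to matching hosts is multiplexed automatically (a sketch; the host pattern and the timeout are illustrative):

```
Host *.fz-juelich.de
    ControlMaster auto
    ControlPath /tmp/ssh_mux_%h_%p_%r
    ControlPersist 10m
```

With ControlMaster auto, the first connection becomes the master and later ones reuse it; ControlPersist keeps the master open in the background for the given time after the last session ends.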

How to ensure the correct group ID for your project data?

In our usage model, all compute and data projects get a dedicated data repository in our global parallel file systems. The files stored in this directory belong to the project, so all files and sub-directories must belong to the project's UNIX group. To ensure that all data automatically belongs to this group, the project directory has the setGID bit set: new files inherit the project UNIX group by default, and sub-directories get the setGID bit, too. However, users can override this default behavior (willingly or by accident).

To fix wrong group ownership on your files use

chown :<group> <target_file>
chown :zam /p/arch/zam/calculation/output.txt

If you have a complete directory to fix use the recursive option:

chown -R -h :zam /p/arch/zam/calculation

On $ARCHIVE, quota usage is calculated per UNIX group. Therefore, a recursive chown is performed nightly on each project directory to apply the corresponding project group.

If the setGID bit is missing on a directory use

chmod g+s <target directory>
chmod g+s /p/arch/zam/calculation

If the setGID bit is missing in a complete directory tree, use find to fix it for all sub-directories:

find /p/arch/zam/calculation -type d -exec chmod g+s {} \;
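The setGID repair can be tried out on a small demo tree (directory names are illustrative; the chown step from above is omitted here because it requires membership in the target group):

```shell
# Demo tree standing in for a project directory.
mkdir -p project/calc/out

# Re-apply the setGID bit to every sub-directory.
find project -type d -exec chmod g+s {} \;

# The group permission column of each directory now carries the
# setGID flag, shown as 's' (or 'S') by ls -ld.
ls -ld project/calc
```

New sub-directories created below these directories will again inherit the setGID bit.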

