File Systems

Cluster File Systems

On JUROPA/HPC-FF two cluster file system types are available:

• Lustre, available on all JUROPA/HPC-FF nodes
• GPFS, available on the GPFS login nodes only

The purpose of the first is to provide a cluster-wide file system for computational work on JUROPA/HPC-FF, while the latter acts as an interface between JUROPA/HPC-FF and other FZJ supercomputers. Thus, GPFS allows data to be copied to or from the JUGENE and JUDGE systems or stored in the GPFS-based tape archive.

The cluster file system paths assigned to each user are defined by shell environment variables. The following table summarizes the information:

FS/Variable  Type    Description                                                              Access
$HOME        Lustre  Full path to the user's home directory inside the Lustre file system     all nodes
$WORK        Lustre  Full path to the user's scratch directory inside the Lustre file system  all nodes
/usr/local   Lustre  Symbolic link to the JuRoPA software repository                          all nodes
$GPFSHOME    GPFS    Full path to the user's home directory inside GPFS (e.g. on JUGENE)      GPFS nodes only
$GPFSWORK    GPFS    Full path to the GPFS scratch file system (JUGENE)                       GPFS nodes only
$GPFSARCH    GPFS    Full path to the user's GPFS archive directory                           GPFS nodes only

All variables are automatically set during the login process. It is highly recommended to always access files through these variables.

The following usage model should be kept in mind for the cluster file systems:

• $HOME
Repository for small files with limited I/O bandwidth demands like source code, binaries, libraries and applications. All files are backed up on a daily basis.
• $WORK
File system for large temporary files with high I/O bandwidth demands (scratch file system). No backup of files residing here. Files not used for more than 28 days will be automatically deleted!
• $GPFSARCH
Storage for large files that are used infrequently. Long-term storage is done on magnetic tape. Thus, the retrieval of data may incur a significant time delay. Backup of all files located in this file system is performed on a daily basis.
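
Because the GPFS variables only resolve on the GPFS nodes, a script that may run on different node types can check a variable before relying on it. The following is a minimal sketch; the helper name fs_path_ok is hypothetical and is not provided by the JUROPA/HPC-FF environment.

```shell
# fs_path_ok NAME: succeed only if the variable NAME is set and names an
# existing directory. (Hypothetical helper, not provided by the system.)
fs_path_ok() {
    eval "fs_path=\${$1:-}"
    [ -n "$fs_path" ] && [ -d "$fs_path" ]
}

# Report which of the documented variables are usable on the current node.
for fs_var in HOME WORK GPFSHOME GPFSWORK GPFSARCH; do
    if fs_path_ok "$fs_var"; then
        echo "$fs_var is available"
    else
        echo "$fs_var is not set or not a directory on this node"
    fi
done
```

On a node without GPFS access, the three GPFS variables would be expected to fall into the second branch.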

Details on naming conventions and access right rules for FZJ file systems are given in HPC Data Rules for GPFS and Lustre File Systems.

Limits

Quota Limits

The resources of JuRoPA's Lustre file systems are controlled by a quota policy per group/project, so all members of a group share the same quota limits. Measured resources are allocated disk space and inodes (number of files). The following limits apply to each group/project:

• $HOME
 disk space quota 3.0 TB (soft) 3.2 TB (hard)
 inode quota 2 000 000 (soft) 2 200 000 (hard)
 grace period 14 days
• $WORK
 disk space quota 3.0 TB (soft) 3.2 TB (hard)
 inode quota 2 000 000 (soft) 2 200 000 (hard)
 grace period 14 days
• $GPFSARCH
 disk space quota 100 TB (soft)
 inode quota 2 000 000 (soft) 2 200 000 (hard)
 grace period 14 days

Please note: No hard disk space limits exist for $GPFSARCH, but if more than 100 TB are to be requested, please contact the supercomputing support at JSC (sc@fz-juelich.de) to discuss optimal data processing, particularly with regard to the end of the project. Furthermore, for some projects there may exist special guidelines.

For all other Lustre file systems that do not belong to the user but may have write access enabled, very restrictive quota limits apply:

 disk space quota 20 GB (soft) 30 GB (hard)
 inode quota 100 (soft) 200 (hard)
 grace period 14 days

Please be aware of this constraint when sharing files with other projects and avoid uploads of files using your login account into directories owned by other projects/groups.

In order to display the current quota usage of a project use the commands described below (see: User Commands).

Note: Users can write files into each file system up to the hard limit, or exceed the soft limit for the duration of the grace period. Once a limit is exceeded, all applications trying to write data will fail with the following error message:

 : writing ': Disk quota exceeded
 : closing ': Input/output error
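
The interplay of soft and hard limits described in the note can be sketched as a small classification function. This is an illustration only: the name quota_state is hypothetical, and the sketch does not model an expired grace period, after which writes also fail.

```shell
# quota_state USED SOFT HARD: classify an allocation against its limits.
# All three arguments are integers in the same unit (for example GB or files).
# (Hypothetical helper; an expired grace period is not modelled.)
quota_state() {
    if [ "$1" -ge "$3" ]; then
        echo "hard limit reached: writes fail"
    elif [ "$1" -gt "$2" ]; then
        echo "soft limit exceeded: grace period running"
    else
        echo "within quota"
    fi
}

# Inode limits for $HOME as listed under Quota Limits:
# 2 000 000 (soft) and 2 200 000 (hard).
quota_state 1500000 2000000 2200000
quota_state 2100000 2000000 2200000
quota_state 2200000 2000000 2200000
```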

Please note that $WORK is considered to be a scratch file system. For this reason, all files in $WORK that haven't been accessed or modified for more than 28 days will be removed by an automatic clean-up procedure.
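
To get an early warning, one could list the files that are approaching this age. The sketch below is an approximation under stated assumptions: the function name list_stale and the variable FIND_CMD are hypothetical, lfs find is assumed to accept -type, -atime and -mtime tests in the same way as find, and the exact selection rule of the clean-up procedure itself is not documented here.

```shell
# list_stale DIR [DAYS]: print regular files below DIR whose access time and
# modification time are both older than DAYS days (default 28).
# FIND_CMD defaults to "lfs find"; setting FIND_CMD=find allows a dry run on
# a directory outside Lustre. (Hypothetical helper and variable names.)
list_stale() {
    # FIND_CMD is left unquoted on purpose so that "lfs find" splits in two words.
    ${FIND_CMD:-lfs find} "$1" -type f -atime +"${2:-28}" -mtime +"${2:-28}" -print
}
```

A call such as

list_stale $WORK 21

would then show files that have been untouched for more than three weeks.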

Internal Constraints

Lustre internal limitations for the version currently installed (1.8.4) are listed below:

Feature                    Limit                Description
Maximum File Size ($HOME)  8 TB                 Setup dependent (number of OSTs times 2 TB)
Maximum File Size ($WORK)  240 TB               Setup dependent (number of OSTs times 2 TB)
Maximum File/Path name     255/4096 characters
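
The relation "number of OSTs times 2 TB" can be checked with a little shell arithmetic. The function names below are hypothetical; the OST counts they print are simply what the documented limits imply, not independently confirmed configuration values.

```shell
# Maximum object size on a single OST in TB, as stated in the table above.
OST_OBJECT_LIMIT_TB=2

# max_file_size_tb OSTS: largest possible file when striped over OSTS targets.
max_file_size_tb() {
    echo $(( $1 * OST_OBJECT_LIMIT_TB ))
}

# osts_for_limit_tb LIMIT: number of OSTs implied by a documented limit in TB.
osts_for_limit_tb() {
    echo $(( $1 / OST_OBJECT_LIMIT_TB ))
}

osts_for_limit_tb 8      # OST count implied for $HOME
osts_for_limit_tb 240    # OST count implied for $WORK
```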

Activation of Quota System

As of July 14, 2011 the Lustre quota mechanism on all Lustre file systems of the JUROPA/HPC-FF cluster is active in order to permanently check whether the current allocation for each project/group is in line with the official quota policy (see Quota Limits).

In case the disk quota limits are exceeded, applications trying to perform a write operation will fail with the following error message:

 : writing ': Disk quota exceeded
 : closing ': Input/output error

Please use the checklist below to ensure that the disk resource allocation of your group or project and the disk allocations related to your userID are compliant with the quota limits:

1. Check the description of the disk quotas and the quota limits for the JUROPA/HPC-FF Lustre file systems (see: Quota Limits above)
2. Check your project/group and personal disk resource allocation (see: Quota Commands below)
3. If the quota limit is exceeded, please clean up or archive your data until your disk usage falls below the limits. Please use the recommendations below on how to archive files (see: Archiving)

In case you have further questions, please don't hesitate to contact sc@fz-juelich.de.

Features

The additional features listed below have been enabled for all Lustre file systems.

• POSIX ACL
The command line interfaces setfacl and getfacl can be used to set or view ACLs.
• FILE LOCKS
All clients can activate local file locks, i.e. locks are valid only for the client which owns the lock.

User Commands

The command lfs is the user interface for the following areas: finding files, file striping, and global file system resources for all Lustre file systems. The sections below give an overview of the most important use cases. More detailed information can be found in the manual page of the lfs command.

Quota Commands

• Please use one of the three commands below to display your own disk quota allocation, the allocation of your group/project or those of the group/project members.

1. Show the content of the file usage.quota in the project/group directory of $HOME:

cat $HOME/../usage.quota

The file contains the storage allocation of the group/project and its members in all Lustre file systems of JUROPA. The usage.quota files are updated at every even hour.
2. Execute the q_lustrequota command to display the usage of the project/group and all its members in all Lustre file systems simultaneously. Note that the command is a convenient way to display the contents of the file $HOME/../usage.quota.
3. Use the Lustre user command interface to print the current project/group allocation:

lfs quota -g <group_name> $HOME
lfs quota -g <group_name> $WORK

or, to print the user allocation:

lfs quota -u <login_name> $HOME
lfs quota -u <login_name> $WORK

Both commands will show information of the form:

 Disk quotas for user dummy (uid 1704):
 Filesystem kbytes quota limit grace files quota limit grace
 /lustre/jwork 24580 100000 0 273 3000 0 0 -

The first column names the file system the quota command was applied to. Columns 2 to 4 refer to the block allocation and columns 6 to 9 to the i-node quota. For the block quota the allocation is measured in 1 KB chunks. In the example above the user allocated 24580 KB of the allowed disk quota of 100000 KB and 273 of 3000 allowed i-nodes (files).

The command allows access only to information on the user's primary group and login account. Disk usage of other project/group members can't be displayed. To retrieve this information please use the q_lustrequota command (see previous item).

Find Command

• The normal Linux find command can produce considerable performance degradation on Lustre file systems. Always use the lfs find command to search in a directory hierarchy of a Lustre file system. For example, to retrieve all files changed in the last 24 hours execute:

lfs find . --maxdepth 3 -mtime -1 -print

Stripe Information

• Note: Please avoid changing any striping parameters if you do not fully understand the concept of striping for Lustre!

The performance of I/O operations of the Lustre file system depends on the striping configuration of the files. Striping is determined by three parameters: stripe_size, stripe_count and stripe_offset. The meaning of each parameter is:

 stripe_size    block size used for I/O operations
 stripe_count   number of OSTs used for parallel I/O
 stripe_offset  ID of the OST to start striping

The setting stripe_size=0 configures all I/O operations to be performed in 1 MB data blocks. If stripe_count=-1, the file will be striped over all available OSTs of the file system. Assigning stripe_offset=-1 will pick the starting OST at random. Settings can be changed on the file or directory level.
The stripe parameter settings of a directory will be inherited by all files created inside the directory by default. The default settings for all home directories ($HOME) and the scratch file system ($WORK) are:

 File system  stripe_size  stripe_count  stripe_offset
 $HOME        0            1             -1
 $WORK        0            4             -1

Statistics of $HOME have shown that nearly all files are of size 1 MB or smaller. Any additional striping would slow down the performance of Lustre and waste space.
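
To illustrate what stripe_size and stripe_count mean for a single file, the following sketch computes which of the file's stripes holds a given byte. It assumes plain round-robin placement of consecutive blocks over the file's OSTs; the function name stripe_index is hypothetical, and the result is an index into the file's own list of OSTs, not an OST ID of the file system.

```shell
# stripe_index OFFSET_BYTES STRIPE_COUNT [STRIPE_SIZE_BYTES]
# Print which of the file's stripes (0 .. STRIPE_COUNT-1) holds the byte at
# OFFSET_BYTES, assuming round-robin striping. The stripe size defaults to
# 1 MB, the block size that stripe_size=0 selects according to the text above.
# (Hypothetical helper for illustration only.)
stripe_index() {
    stripe_size=${3:-1048576}
    echo $(( ($1 / stripe_size) % $2 ))
}

# With the $WORK default (stripe_count=4): which stripe holds the byte at 5 MB?
stripe_index $((5 * 1048576)) 4
# With the $HOME default (stripe_count=1) every byte lands in the same stripe.
stripe_index $((5 * 1048576)) 1
```

For a file of 1 MB or smaller, all data falls into the first block, so a stripe_count above 1 cannot spread any I/O over further OSTs.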

In order to retrieve the settings for a directory, run:

lfs getstripe test-dir

 OBDS:
 0: jhome5-OST0000_UUID ACTIVE
 1: jhome5-OST0001_UUID ACTIVE
 2: jhome5-OST0002_UUID ACTIVE
 3: jhome5-OST0003_UUID ACTIVE
 test-dir/
 stripe_count: 1 stripe_size: 0 stripe_offset: -1

A file created in the directory test-dir will inherit the striping settings of its directory:

lfs getstripe test-dir/file-1

 OBDS:
 0: jhome5-OST0000_UUID ACTIVE
 1: jhome5-OST0001_UUID ACTIVE
 2: jhome5-OST0002_UUID ACTIVE
 3: jhome5-OST0003_UUID ACTIVE
 test-dir/file-1
 obdidx objid objid group
 3 333492 0x516b4 0

Global File System Resources

• Users can display a list of available Lustre file systems and their space allocation with the help of the command:

df -t lustre

Although the file system resources indicate enough space, one or more underlying OSTs might be full. To check the allocation status of the OSTs of, e.g., $HOME use:

lfs df -h $HOME

 UUID                 bytes   Used    Available  Use%  Mounted on
 jhome5-MDT0000_UUID  821.9G  598.2M  774.3G     0%    /lustre/jhome5[MDT:0]
 jhome5-OST0000_UUID  7.2T    84.7G   6.7T       1%    /lustre/jhome5[OST:0]
 jhome5-OST0001_UUID  7.2T    90.0G   6.7T       1%    /lustre/jhome5[OST:1]
 jhome5-OST0002_UUID  7.2T    96.7G   6.7T       1%    /lustre/jhome5[OST:2]
 jhome5-OST0003_UUID  7.2T    92.7G   6.7T       1%    /lustre/jhome5[OST:3]
 filesystem summary:  28.7T   364.2G  26.8T      1%    /lustre/jhome5

All OSTs should have roughly the same utilisation to achieve good I/O performance.
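
Spotting an unusually full OST in a long listing can be automated with a small awk filter. This is a sketch under the assumption that the output has the column layout of the sample above, with Use% in the fifth column; the function name osts_at_or_above is hypothetical.

```shell
# osts_at_or_above PERCENT: read "lfs df" output on standard input and print
# the UUID and Use% of every OST whose usage is at least PERCENT.
# (Hypothetical helper; assumes Use% is the fifth column, as in the sample.)
osts_at_or_above() {
    awk -v limit="$1" '
        $1 ~ /OST/ {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $1, use "%"
        }'
}
```

It could be used as follows; no output means that no OST has reached the threshold:

lfs df -h $HOME | osts_at_or_above 90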

All files in the user's home directory ($HOME) are automatically backed up by TSM (Tivoli Storage Manager) daily. Scratch file systems and local directories ($WORK, $GPFSWORK, /tmp) are not backed up!

In order to restore a file in the Lustre home directory, use

adsmback [-type=home] &

on one of the GPFS nodes (see: Access/Environment). If the option -type is not specified, the user will be prompted for the type of file system:

 Which type of filesystem should be restored? Enter: {home|arch|gpfshome}

The adsmback command grants access to the backup data of the user's Lustre home directory for JUROPA/HPC-FF and to the GPFS home directory, if the user has a JUGENE account, too.

Follow the GUI by selecting:

 for Lustre: File level -> /lustre/homeX -> group -> userid -> ...
 for GPFS: File level -> /homeX -> group -> userid -> ...
 Select files or directories to restore
 Press [Restore] button

If the data should be restored to the original location, then choose within the Restore Destination window:

 - for Lustre: Original location
 - for GPFS: Following location + /gpfs/homeX + Restore complete path

Important note: Don't use the native dsmj command, which will not show any home data.

Note: Since the backup tool only stores the content of the files, any striping information other than the defaults won't be restored.

Archiving

In case you have files which you do not need at the moment but would like to keep during the lifetime of your project, we recommend using the $GPFSARCH file system to store these files. This file system is available on the GPFS nodes of JUROPA/HPC-FF only. In order to log in to these nodes please use

ssh <your_account>@juropagpfs

Files stored on this file system will be migrated to tape.

When storing files on $GPFSARCH we strongly recommend keeping the number of files on that file system as small as possible, because retrieving one file from tape will take at least about 2 minutes regardless of the size of the file. Therefore, you should merge your files into tar archives before moving them to $GPFSARCH. Additionally, the tar archives can be compressed using gzip or, even better, bzip2. Files moved to $GPFSARCH should not be larger than 1 TB.

Example: Assume you have files in $HOME/my_directory, $WORK/my_work1 and $WORK/my_work2 and you would like to store them on $GPFSARCH.

1. Log in to one of the JUROPA GPFS nodes:

ssh <your_account>@juropagpfs

2. Tar and compress the files, for example into two files, and store them in $GPFSARCH:

tar -cjvf $GPFSARCH/my_directory.tar.bz2 $HOME/my_directory

tar -cjvf $GPFSARCH/my_work_files.tar.bz2 $WORK/my_work1 $WORK/my_work2

This creates two files, my_directory.tar.bz2 and my_work_files.tar.bz2, in $GPFSARCH, the first one containing all files in $HOME/my_directory, the second one containing all files in $WORK/my_work1 and $WORK/my_work2. The original files can now be deleted.

If you want to retrieve the files from the archive, you can do the following:

1. Log in to one of the JUROPA GPFS nodes:

ssh <your_account>@juropagpfs

2. Copy or move the files from the archive to the location where you need them, for example:

mv $GPFSARCH/my_work_files.tar.bz2 $WORK

(this might take some time if the file was already migrated to tape)
3. Unpack the files:

tar -xjvf $WORK/my_work_files.tar.bz2

You can also extract a limited number of files instead of all of them. Please see the manual page of the tar command for further details.
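
Before deleting the original files after archiving, it may be worth confirming that the new archive can be read back. The function below is a minimal sketch of that idea, not a site-provided tool: the name archive_dir and the variable TAR_FLAG are hypothetical, and GNU tar is assumed.

```shell
# archive_dir ARCHIVE DIR: pack DIR into ARCHIVE and verify that the archive
# can be read back. TAR_FLAG selects the compression: j = bzip2 (default),
# z = gzip. (Hypothetical helper; assumes GNU tar.)
archive_dir() {
    archive=$1
    dir=$2
    flag=${TAR_FLAG:-j}
    # -C changes to the parent directory so the archive stores relative paths.
    tar -c"$flag"f "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # Listing reads the whole archive; a failure here means it is damaged.
    tar -t"$flag"f "$archive" > /dev/null
}
```

A possible call, with the originals removed only if both steps succeed:

archive_dir $GPFSARCH/my_directory.tar.bz2 $HOME/my_directory && rm -r $HOME/my_directory

Unlike the tar commands in the example above, this variant stores paths relative to the parent directory, so the archive unpacks into my_directory below the current working directory.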