Filesystem /project

This is a shared parallel filesystem based on the IBM GPFS software, available on all CSCS HPC platforms. Each system can access /project using the native GPFS or NFS client over InfiniBand or Ethernet.

This filesystem provides intermediate storage space for frequently reused datasets, shared code or configuration scripts, and datasets that need to be accessed from different HPC and data analysis platforms. It is also intended for large files: performance can improve when working with large files, so it may be useful to bundle small files into tar archives.
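
The bundling suggested above can be sketched as a small shell helper. This is only a minimal sketch: the function names bundle_dir and list_bundle and the paths in the usage note are hypothetical, and gzip compression is an assumption that you may drop for data that is already compressed.

```shell
#!/bin/sh
# bundle_dir SRC_DIR ARCHIVE   (hypothetical helper name)
# Pack SRC_DIR into a single gzip-compressed tar archive, so that one
# large file is stored instead of many small ones.
bundle_dir() {
    src_dir=$1
    archive=$2
    # -C switches to the parent directory first, so the archive
    # contains relative paths only.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# list_bundle ARCHIVE   (hypothetical helper name)
# Show the contents of an archive without extracting it.
list_bundle() {
    tar -tzf "$1"
}
```

For example, bundle_dir "$HOME/results" /project/sXYZ/results.tar.gz would store the whole results directory as one file, which also consumes a single inode of the group quota.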

Access to /project is granted to all users with a production or large development project. Data and inode quotas are group-based. The quota allocated to each group is 10 TB of space and 500 000 inodes (both can be modified upon request).
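
Since both space and inodes are limited, it can help to estimate how much a directory will consume before copying it to /project. The following is a rough sketch using standard tools only; the function names are hypothetical, and the authoritative figures are always those reported by the quota command on the system.

```shell
#!/bin/sh
# count_entries DIR   (hypothetical helper name)
# Approximate inode usage: every file, directory and symbolic link
# under DIR (including DIR itself) is counted as one entry.
# Hard links and file names containing newlines make this an estimate.
count_entries() {
    find "$1" -xdev | wc -l | tr -d ' '
}

# usage_summary DIR   (hypothetical helper name)
# Print the entry count together with the disk usage in kilobytes.
usage_summary() {
    entries=$(count_entries "$1")
    kbytes=$(du -sk "$1" | cut -f1)
    echo "$entries entries, $kbytes KB"
}
```

A directory with a very high entry count relative to its size is a good candidate for archiving with tar before it is stored.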

*** All data will be deleted 6 months after the end of the project without further warning ***

Users should NOT run jobs from this filesystem because of its low performance.

/project Policy

**
Note that data on the /project filesystem is NOT backed up.
You are responsible for making your own copies of essential data.
** 

No Backup

The /project filesystem is too large for regular backups to be carried out; a copy of any essential data should therefore also be kept elsewhere.
The underlying filesystem and disk infrastructure are designed for redundancy and resilience, but data loss can never be ruled out.

**
For the /project filesystem, it is vital that you make a copy of any important data.
**

Hardware description

Six Transtec servers serve the GPFS filesystem. Each server has two dual-core sockets and 8 GB of memory.
The servers are connected to two IBM DS4800 SAN storage systems through 4 Gb/s Fibre Channel. A total of 384 disks are connected to both controllers; each disk has a capacity of 1 TB.

/project How-To

Please find below a quick how-to guide for the most common procedures on the /project filesystem. You might also have a look at the FAQ list on Data Management for more information.

How to check your quota on /project 

To check your disk space usage, type quota, which points to this command:

/usr/local/bin/quota.ksh

If this command is not available on the platform you are working on, please send an email to help@cscs.ch.

How to write on /project if you belong to multiple projects

If you belong to more than one project, you are always logged in under your primary project. If you wish to write on /project for one of your secondary project IDs (sXYZ), you need to change the project ID using the following command:

newgrp sXYZ

You will then be able to write on /project/sXYZ and to check the quota of the secondary project with the usual commands sbucheck and monthly_usage. For further details on the newgrp command, please have a look at man newgrp.
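
Because newgrp starts a new shell, it is convenient to confirm first that your account really belongs to the secondary group. The sketch below uses only the standard id command; the helper name in_group is hypothetical, and sXYZ stands for your own secondary project ID.

```shell
#!/bin/sh
# in_group GROUP   (hypothetical helper name)
# Succeed if the current user is a member of GROUP.
# "id -Gn" prints all group names of the user, separated by spaces.
in_group() {
    for g in $(id -Gn); do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Typical interactive use (not executed here, since newgrp opens a
# new shell):
#   in_group sXYZ && newgrp sXYZ
#   id -gn        # prints the current group; sXYZ after newgrp
```

Files created after newgrp belong to the group sXYZ, which is why they are accounted against that project's quota.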

How to transfer your data efficiently to /project

If you need to transfer a large amount of data from your local platform to the /project folder at CSCS or vice versa, there is an alternative that might run faster than sftp or scp. The program mpscp can send data stripes over multiple non-encrypted TCP streams and thereby achieve greater transfer rates. To start a data transfer to /project, use the following command:

$ mpscp -m /apps/ela/mpscp/bin/mpscp -a -w 1 yourfile user@ela.cscs.ch:/project/yourgroup

The flag -w controls the number of TCP streams used in parallel: in the example above only 1 stream is used, but you can increase this number. You can also transfer directories recursively with the flag -r. Please be aware that you will need to install the mpscp software on your local machine if it is not yet available. For further information and for downloading the source, please have a look here.
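
To illustrate how the flags combine, the sketch below only assembles and prints an mpscp command line instead of running it. It uses just the flags mentioned in the text (-m, -a, -w, -r); the helper name mpscp_cmd is hypothetical, and the position of -r before the source is an assumption that has not been checked against an mpscp installation.

```shell
#!/bin/sh
# mpscp_cmd STREAMS SOURCE DEST [recursive]   (hypothetical helper name)
# Print an mpscp command line with STREAMS parallel TCP streams.
# Pass the word "recursive" as fourth argument to add the -r flag.
mpscp_cmd() {
    streams=$1
    src=$2
    dest=$3
    recursive=${4:-}
    opts="-m /apps/ela/mpscp/bin/mpscp -a -w $streams"
    if [ "$recursive" = "recursive" ]; then
        # Assumption: -r is accepted alongside the other options.
        opts="$opts -r"
    fi
    echo "mpscp $opts $src $dest"
}
```

For instance, mpscp_cmd 4 yourdir user@ela.cscs.ch:/project/yourgroup recursive prints a command that would copy the directory yourdir with four streams; inspect the printed line before running it yourself.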

GPFS Snapshots

 

Snapshots provide an online backup that allows easy recovery from common problems, such as the accidental deletion of a file or the need to compare older versions of the same file. Snapshots of the entire GPFS filesystem /project are taken every night at 05:00.

Snapshots of a filesystem are read-only; changes can only be made to the active (that is, normal, non-snapshot) files and directories. Only the snapshots of the last three days are available. Older snapshots are deleted automatically.

How to restore a file from GPFS snapshots

GPFS creates a .snapshots subdirectory in ALL directories of the GPFS filesystem.

The .snapshots directories are invisible in the sense that the ls command and the readdir() function do not return .snapshots. This prevents recursive filesystem utilities such as find or tar from entering the snapshot tree of each directory they process.

For example, ls -a /project/<username> does not show .snapshots, but ls /project/<username>/.snapshots and cd /project/<username>/.snapshots both work.

# ls /project/<username>/.snapshots
 . .. snap-20100808 snap-20100809 snap-20100810
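
When a file changed or disappeared at an unknown time, it helps to see which of the available snapshots still contain it. The following is a minimal sketch that relies only on the .snapshots layout described above; the helper name snapshots_with is hypothetical.

```shell
#!/bin/sh
# snapshots_with DIR FILE   (hypothetical helper name)
# Print the name of every snapshot of DIR that contains FILE.
# The .snapshots directory is hidden from listings of DIR, but it can
# be addressed explicitly, so the glob below still expands.
snapshots_with() {
    dir=$1
    file=$2
    for snap in "$dir"/.snapshots/*; do
        if [ -e "$snap/$file" ]; then
            basename "$snap"
        fi
    done
}
```

Since the snapshot names carry the date, the output is sorted from the oldest to the most recent snapshot.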


Restoring a file or a directory from GPFS snapshots is easy.

Type cd /project/<username>/.snapshots, choose the most suitable of the three available snapshots, and copy the old file back to your original directory.

Example:
fileA in the directory /project/<username>/dirA/ has been deleted or modified and you want to retrieve one of the older versions.

# cd /project/<username>/dirA/.snapshots
# ls
 . .. snap-20100808 snap-20100809 snap-20100810
# cd snap-20100809
# ls
 fileA
# cp fileA /project/<username>/dirA/
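
The manual steps above can also be wrapped in a small helper that keeps the current version of the file before overwriting it. This is only a sketch under the directory layout shown in the example; the function name restore_from_snapshot and the .before-restore suffix are hypothetical choices.

```shell
#!/bin/sh
# restore_from_snapshot DIR SNAPSHOT NAME   (hypothetical helper name)
# Copy NAME from DIR/.snapshots/SNAPSHOT back into DIR.
restore_from_snapshot() {
    dir=$1
    snap=$2
    name=$3
    src="$dir/.snapshots/$snap/$name"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # Keep the current version, if any, so the restore can be undone.
    if [ -e "$dir/$name" ]; then
        mv "$dir/$name" "$dir/$name.before-restore"
    fi
    # -p preserves timestamps and permissions; -R also handles directories.
    cp -pR "$src" "$dir/$name"
}
```

With the example above, restore_from_snapshot /project/yourdir/dirA snap-20100809 fileA would bring back the version of fileA from 9 August, where yourdir stands for your own directory. Remember that the restored copy counts against the group quota like any other file.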