This is a shared parallel filesystem based on the IBM GPFS software, available on all CSCS HPC platforms. Each system can access /project using the native GPFS or NFS client over InfiniBand or Ethernet.
This filesystem provides intermediate storage space for frequently reused datasets, shared code or configuration scripts, and datasets that need to be accessed from different HPC and data analysis platforms. It is also intended for large files: performance can be higher with large files, so it may be useful to bundle small files into tar archives.
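As a minimal sketch of bundling many small files before placing them on /project, the function below packs a directory into a single compressed tar archive. The function name pack_dir and all paths in the usage comments are illustrative placeholders, not CSCS-provided tools.

```shell
# pack_dir SRC_DIR ARCHIVE
# Bundle the directory SRC_DIR into one gzip-compressed tar archive, so that
# a single large file is stored instead of many small ones.
pack_dir() {
    src_dir=$1
    archive=$2
    # -C switches to the parent directory so the archive holds relative paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Hypothetical usage (paths are placeholders):
#   pack_dir /path/to/results/run42 /project/yourgroup/run42.tar.gz
#   tar -tzf /project/yourgroup/run42.tar.gz                    # list contents
#   tar -xzf /project/yourgroup/run42.tar.gz -C /path/to/workdir  # unpack
```

A single archive also uses one inode instead of one per file, which helps with the inode quota.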
Access to /project is granted to all users with a production or large development project. Data and inode quotas are group based. The quota space allocated to each group is 10 TB and 500 000 inodes (both can be modified upon request).
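Because the quota counts inodes as well as bytes, it can help to see roughly how many files and directories a tree contains. The sketch below uses only standard tools; the function names count_inodes and within_inode_limit are hypothetical, and the result is an approximation (for example, hard links are counted once per name).

```shell
# count_inodes DIR: print the number of files, directories and links under
# DIR (including DIR itself), staying on the same filesystem.
count_inodes() {
    find "$1" -xdev | wc -l | tr -d ' '
}

# within_inode_limit DIR LIMIT: report the count and succeed only if it does
# not exceed LIMIT.
within_inode_limit() {
    count=$(count_inodes "$1")
    echo "$count inodes used under $1 (limit $2)"
    [ "$count" -le "$2" ]
}

# Hypothetical usage with the default limit quoted above:
#   within_inode_limit /project/yourgroup 500000
#   du -sh /project/yourgroup      # total size, to compare with the 10 TB quota
```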
*** All data will be deleted 6 months after the end of the project without further warning ***
Users are NOT supposed to run jobs from this filesystem because of the low performance.
Note that data on the /project filesystem is NOT backed up.
You are responsible for making your own copies of essential data.
The /project file system is too large to carry out regular backups and therefore a copy of any essential data should also be placed elsewhere.
The underlying filesystem and disk infrastructure are designed for redundancy and resilience, but data loss can never be ruled out.
For the /project filesystem, it is vital that you make a copy of any important data.
Six Transtec servers serve the GPFS filesystem. Each server has two dual-core sockets and 8 GB of memory.
The servers are connected to two IBM DS4800 SAN storage systems through 4 Gb/s Fibre Channel. A total of 384 disks are connected to both controllers; each disk has a capacity of 1 TB.
Please find below a quick how-to guide for the most common procedures on the /project filesystem. You might also have a look at the list of FAQ on Data Management for more information.
How to check your quota on /project
To check your disk space usage, type quota, which points to this command:
If this command is not available on the platform you are working on, please send an email to email@example.com.
How to write on /project if you belong to multiple projects
If you belong to more than one project, you are always logged in with your primary project. If you wish to write on /project of one of your secondary IDs (sXYZ), you need to change the project ID using the following command
Then you will be able to write on /project/sXYZ and to check the quota of the secondary project with the usual commands sbucheck and monthly_usage. For further details on the command newgrp, please have a look at man newgrp.
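A minimal sketch of the group switch, assuming that the secondary project ID is also the name of a Unix group (sXYZ is the placeholder used above). The helper in_group is a hypothetical name that only checks membership; the newgrp call itself is shown as a comment because it starts a new shell.

```shell
# in_group GROUP: succeed if the current user is a member of GROUP.
in_group() {
    # id -Gn prints all group names of the current user, space-separated.
    id -Gn | tr ' ' '\n' | grep -qxF -- "$1"
}

# Hypothetical interactive session:
#   in_group sXYZ && newgrp sXYZ   # new shell with sXYZ as the current group
#   id -gn                         # prints the current group
#   exit                           # leave that shell, back to the primary group
```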
How to transfer your data efficiently to /project
If you need to transfer a large amount of data from your local platform to the /project folder at CSCS or vice versa, there is an alternative that might run faster than sftp or scp. The program mpscp can send data stripes over multiple non-encrypted TCP streams, which enables higher transfer rates. To start a data transfer to /project, use the following command:
$ mpscp -m /apps/ela/mpscp/bin/mpscp -a -w 1 yourfile firstname.lastname@example.org:/project/yourgroup
The flag -w controls the number of TCP streams used in parallel: the example above uses only 1 stream, but you can increase this number. You can also transfer directories recursively with the flag -r. Please be aware that you will need to install the mpscp software on your local machine if it is not yet available. For further information and for downloading the source, please have a look here.
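The sketch below only assembles a command line from the flags shown above (-m, -a, -w, -r); it prints the command instead of running it, so it can be checked before use. The function name mpscp_cmd is hypothetical, and how many streams actually help depends on your network.

```shell
# mpscp_cmd STREAMS SOURCE DEST
# Print (do not run) an mpscp command line built from the flags shown above.
mpscp_cmd() {
    streams=$1
    src=$2
    dest=$3
    recursive=""
    # Add -r when the source is a directory, for a recursive transfer.
    if [ -d "$src" ]; then
        recursive=" -r"
    fi
    printf 'mpscp -m /apps/ela/mpscp/bin/mpscp -a -w %s%s %s %s\n' \
        "$streams" "$recursive" "$src" "$dest"
}

# Hypothetical usage (user, host and paths are placeholders):
#   mpscp_cmd 4 yourdir user@host:/project/yourgroup
```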
Snapshots provide an online backup that allows easy recovery from common problems such as the accidental deletion of a file or the need to compare older versions of the same file. Snapshots of the entire GPFS file system /project are taken every night at 05:00 AM.
Snapshots of a file system are read-only; changes can only be made to the active (that is, normal, non-snapshot) files and directories. Only the snapshots of the last three days are available. Older snapshots are deleted automatically.
How to restore a file from GPFS snapshots
GPFS creates a .snapshots subdirectory in ALL directories of the GPFS file system.
The .snapshots directories are invisible in the sense that the ls command or readdir() function does not return .snapshots. This is to prevent recursive file system utilities such as find or tar from entering into the snapshot tree for each directory they process.
For example, ls -a /project/<username> does not show .snapshots, but ls /project/<username>/.snapshots and cd /project/<username>/.snapshots both work.
# ls /project/<username>/.snapshots
. .. snap-20100808 snap-20100809 snap-20100810
Restoring a file or a directory from GPFS snapshots is easy.
Type cd /project/<username>/.snapshots, choose the most convenient snapshot among the three available, and copy the old file to your original directory.
Example: fileA in the directory /project/<username>/dirA/ has been deleted or modified and you want to retrieve one of the older versions.
# cd /project/<username>/dirA/.snapshots
# ls -a
. .. snap-20100808 snap-20100809 snap-20100810
# cd snap-20100809
# cp fileA /project/<username>/dirA/
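The manual steps above can be wrapped in a small helper. This is a sketch, assuming the snapshot layout shown in the example (DIR/.snapshots/SNAPSHOT/FILE); the function name restore_from_snapshot is hypothetical. Unlike the plain cp above, it does not overwrite a file that currently exists but places the restored copy next to it.

```shell
# restore_from_snapshot DIR SNAPSHOT FILE
# Copy FILE from DIR/.snapshots/SNAPSHOT back into DIR and print the path of
# the restored copy.
restore_from_snapshot() {
    dir=$1
    snap=$2
    file=$3
    src="$dir/.snapshots/$snap/$file"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    target="$dir/$file"
    # Do not overwrite the current version; restore next to it instead.
    if [ -e "$target" ]; then
        target="$dir/$file.$snap"
    fi
    # -p preserves mode and timestamps; -R also handles directories.
    cp -pR "$src" "$target" && echo "$target"
}

# Hypothetical usage, matching the example above:
#   restore_from_snapshot /project/<username>/dirA snap-20100809 fileA
```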