Todi Cray XK7
This page describes Todi in the following sections:
- A short description of the machine
- How to access Todi
- Programming environment and supported software
- Submission of batch jobs
- Verifying GPU usage
- Data storage
A Short Description of the Machine
The Cray XK7 (Todi) is the first GPU/CPU hybrid supercomputing system with high scalability at CSCS, designed to run data parallel and computationally intensive applications.
It features 272 nodes, each equipped with a 16-core AMD Opteron CPU, 32 GB of DDR3 memory, and one NVIDIA Tesla K20X GPU with 6 GB of GDDR5 memory, for a total of 4352 cores and 272 GPUs.
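The totals follow directly from the per-node figures. As a quick sanity check, the arithmetic can be reproduced in the shell; the aggregate memory figures are derived here from the numbers above and are not quoted from CSCS.

```shell
#!/bin/sh
# Per-node figures taken from the description above.
nodes=272
cores_per_node=16
host_mem_gb=32
gpu_mem_gb=6

# Aggregate figures for the whole machine.
total_cores=$((nodes * cores_per_node))
total_host_mem_gb=$((nodes * host_mem_gb))
total_gpu_mem_gb=$((nodes * gpu_mem_gb))

echo "cores:       $total_cores"
echo "host memory: $total_host_mem_gb GB"
echo "GPU memory:  $total_gpu_mem_gb GB"
```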
How to Access Todi
The system will be dedicated to development projects that involve tuning hybrid GPU/CPU and multicore applications.
Users who wish to have access to Todi need to submit a Development Project or to contact Maria Grazia Giuffreda.
Todi is accessible via SSH from ela.cscs.ch as todi.cscs.ch.
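The two-hop login could look like the following sketch. The account name is a placeholder, and the helper function todi_login_cmd is hypothetical: it only prints the command so that it can be inspected before use.

```shell
#!/bin/sh
# Placeholder account name; replace with your own CSCS username.
CSCS_USER="${CSCS_USER:-username}"

# Print the two-hop login command: first the front end ela.cscs.ch,
# then todi.cscs.ch from there. The -t option forces a terminal so
# that the inner session is interactive.
todi_login_cmd() {
    printf 'ssh -t %s@ela.cscs.ch ssh todi.cscs.ch\n' "$1"
}

todi_login_cmd "$CSCS_USER"
# To actually connect, run the printed command, for example:
#   eval "$(todi_login_cmd "$CSCS_USER")"
```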
To access all resources, you need to use the batch queuing system.
Programming Environment and Supported Software
The software environment on Todi is controlled using the modules framework, which is an easy and flexible way to access all the available compilers, tools, and applications.
Run the command module avail to see the modules available on the system, and module list to see the modules currently loaded.
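Since module avail typically writes its listing to standard error, filtering it requires a redirect. The sketch below wraps this in a hypothetical helper, find_module; the search term "cuda" is only an assumption about how the GPU modules are named on Todi.

```shell
#!/bin/bash
# 'module' is a shell function provided by the modules framework, so it
# is only defined in sessions on a system that has it (such as Todi).
find_module() {
    if type module >/dev/null 2>&1; then
        # Redirect stderr to stdout so that grep can see the listing.
        module avail 2>&1 | grep -i -- "$1"
    else
        echo "modules framework not available in this shell" >&2
        return 1
    fi
}

# Assumed search term; adjust to the compiler or tool you are looking for.
find_module cuda || true
```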
Compilers for Todi are:
- Cray
- GNU
- PGI
- Pathscale
Available for GPU programming:
- CUDA
- OpenCL
- Accelerator directives (Cray, PGI, CAPS)
Submission of Batch Jobs
Todi uses the SLURM batch system. The following usage policy applies:
Prime Time 8AM - 6PM: no production runs are allowed. Only development jobs with a wall clock time of up to 1 hour can be submitted.
Non Prime Time 6PM - 8AM + Weekends: free for production runs.
To list the available queues and partitions to which jobs can be submitted, use the commands sinfo and scontrol show partition. Jobs in the "night" queue can run for up to 12 hours; jobs in the "day" queue can run for up to 1 hour.
Please refer to the man pages and the official SLURM documentation for the details.
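The usage policy can be expressed as a small helper that reports which queue matches the current time. This is only a sketch: the function pick_queue is hypothetical, it assumes that the "day" queue corresponds to prime time on weekdays and the "night" queue to everything else, and it treats 6PM itself as non prime time.

```shell
#!/bin/sh
# Print "day" during prime time (weekdays, 08:00 to 17:59) and "night"
# otherwise. Arguments: hour (0-23) and weekday (1 = Monday ... 7 = Sunday).
pick_queue() {
    hour=${1#0}          # strip a leading zero, e.g. "08" becomes "8"
    hour=${hour:-0}      # a plain "0" would otherwise become empty
    dow=$2
    if [ "$dow" -le 5 ] && [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
        echo day
    else
        echo night
    fi
}

# Uses the local time of the machine the script runs on.
queue=$(pick_queue "$(date +%H)" "$(date +%u)")
echo "current queue: $queue"

# Show the partitions only where the SLURM commands are installed.
if command -v scontrol >/dev/null 2>&1; then
    sinfo
    scontrol show partition "$queue"
fi
```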
To run a job interactively, first allocate the nodes you need for the time required with the command:
salloc -N2 --time=00:05:00
Then run your executable on the allocated nodes with aprun (the following uses one process per node on two nodes):
aprun -n2 -N1 ./test.x
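The same run could also be submitted as a batch job. The sketch below writes a minimal job script and submits it with sbatch; the file name job.sh, the job name, and the output file pattern are illustrative choices, not site requirements.

```shell
#!/bin/bash
# Write a minimal job script that mirrors the interactive example:
# two nodes, five minutes, one process per node.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --time=00:05:00
#SBATCH --output=test.%j.out

aprun -n2 -N1 ./test.x
EOF

# Submit only where SLURM is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
fi
```

In the output file name, %j is replaced by SLURM with the job ID, so repeated runs do not overwrite each other.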
Details of batch submission and how to set up a batch job are available on the following page:
For a list of the most useful SLURM commands, please have a look at the corresponding FAQ section under the User Forum.
Verifying GPU Usage
There are different ways to verify whether the GPUs are being used within your program.
Verification at compilation time using directives:
When using directives (as opposed to CUDA), it can be hard to tell whether the executable will actually run on the accelerator as well as on the host. Compilers provide detailed messages that show whether the code was targeted at the accelerator.
- PGI directives: Use "-Minfo=accel" to print compiler information about the accelerator target. If the output does not contain "Accelerator kernel generated", the code will not run on the accelerator. For instance: pgfortran -o test.exe test.f90 -ta=nvidia -Minfo=accel -fast
- Cray directives: For Fortran use "-rm": ftn -rm -c test.f90. For C/C++ use "-h list=m": cc -h list=m -c test.c. In both cases a loopmark listing file (.lst) is created for the source file.
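For the PGI case, the check for the "Accelerator kernel generated" message can be automated by saving the compiler output and searching it. This is a sketch: the helper check_accel_log and the file name build.log are hypothetical, and it assumes that capturing both standard output and standard error collects all -Minfo messages.

```shell
#!/bin/sh
# Succeed only if the given build log reports a generated accelerator kernel.
check_accel_log() {
    if grep -q "Accelerator kernel generated" "$1"; then
        echo "accelerator kernels were generated"
    else
        echo "no accelerator kernel reported in $1" >&2
        return 1
    fi
}

# Compile only where the PGI compiler is installed, keeping all messages.
if command -v pgfortran >/dev/null 2>&1; then
    pgfortran -o test.exe test.f90 -ta=nvidia -Minfo=accel -fast > build.log 2>&1
    check_accel_log build.log
fi
```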
Verification during runtime:
The next step is to verify that the GPUs are being used during runtime. For this, we can use two methods:
- Runtime debugging output (enabled via the compilers' environment variable)
- Nvidia's CUDA computeprof profiler (enabled via the environment variable COMPUTE_PROFILE): since Todi uses Nvidia GPUs, we can use Nvidia's computeprof as an independent verification of GPU usage.
PGI's debugging output for directives:
Set the following environment variable before runtime: export ACC_NOTIFY=1
At runtime, information about each kernel launch is printed to the output.
Cray's debugging output for directives:
Run the code with this environment variable set: export CRAY_ACC_DEBUG=2
For each process, the output reports device initialization, contexts created, data transferred, kernels launched, etc.
Nvidia's computeprof:
This method can be used with any of the following programming models: CUDA, OpenCL, Cray's directives, PGI Accelerator, HMPP. Set the following environment variable: export COMPUTE_PROFILE=1
After running the executable, you will get a file cuda_profile_0.log with profiling information from the CUDA driver (data transfers, kernel launches, etc.).
See the CUDA computeprof documentation for further info (check the command line profiling section): http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/Compute_Visual_Profiler_User_Guide.pdf
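The three runtime checks differ only in the environment variable that enables them, so they can be selected with one helper. The sketch below uses a hypothetical function, set_gpu_check, whose mode names (pgi, cray, cuda) are chosen for illustration; it assumes the executable is ./test.x and that the profiler log is written to the current directory.

```shell
#!/bin/bash
# Enable one of the runtime checks described above.
set_gpu_check() {
    case "$1" in
        pgi)  export ACC_NOTIFY=1 ;;       # PGI: report kernel launches
        cray) export CRAY_ACC_DEBUG=2 ;;   # Cray: per-process debug output
        cuda) export COMPUTE_PROFILE=1 ;;  # Nvidia: write a profiler log
        *)    echo "usage: set_gpu_check pgi|cray|cuda" >&2; return 1 ;;
    esac
}

set_gpu_check cuda

# Launch only where aprun is available (inside an allocation on Todi).
if command -v aprun >/dev/null 2>&1; then
    aprun -n1 ./test.x
    # A non-empty log indicates that the CUDA driver was actually used.
    if [ -s cuda_profile_0.log ]; then
        echo "GPU activity recorded in cuda_profile_0.log"
    else
        echo "no profiler log found: the GPU was probably not used" >&2
    fi
fi
```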
Data Storage
/scratch
Todi has a scratch partition (/scratch/todi/user_name). Please be aware that /scratch is not backed up and is cleaned once a week. Do not use /scratch for long-term storage; use it only for submitting batch jobs.
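Because /scratch is cleaned regularly, results worth keeping should be copied elsewhere once a job has finished. The following is a minimal sketch: the helper save_results, the run directory run01, and the destination under $HOME are all hypothetical and should be adapted to your own layout.

```shell
#!/bin/sh
# Copy a results directory to permanent storage, preserving timestamps
# and permissions. Both paths are supplied by the caller.
save_results() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" && cp -Rp "$src" "$dest"/
}

# Hypothetical layout: a run directory on scratch, archived under $HOME.
run_dir="/scratch/todi/${USER:-user_name}/run01"
if [ -d "$run_dir" ]; then
    save_results "$run_dir" "$HOME/results"
fi
```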
For further information, please have a look at Data Management or contact help(at)cscs.ch.


