Quick start guide
The following instructions are for those who want to get CrayPat up and running quickly.
- module load perftools
- ftn -c myprogram.f90 ; ftn -o myprogram myprogram.o
- pat_build -O apa ./myprogram
- /usr/bin/time -p aprun -n 128 ./myprogram+pat
- pat_report ./myprogram+pat*.xf
- module load apprentice2
- app2 ./myprogram+pat*.ap2
- pat_build -O <apafile>.apa
- /usr/bin/time -p aprun -n 128 ./myprogram+apa
- pat_report ./myprogram+apa*.xf
Read on for complete details.
CrayPat is a performance analysis tool developed by Cray and installed on CSCS production systems.
The CrayPat tool provides detailed information about application performance. It can be used for basic profiling, MPI tracing and hardware performance counter based analysis. CrayPat provides access to a wide variety of performance experiments that measure how an executable program consumes resources while it is running, as well as several different user interfaces that provide access to the experiment and reporting functions.
CrayPat consists of three major components:
- pat_build - used to instrument the program to be analyzed.
- pat_report - a standalone text report generator that can be used to further explore the data generated by instrumented program execution.
- Apprentice2 - a graphical analysis tool that can be used, in addition to pat_report, to further explore and visualize the data generated by instrumented program execution.
CrayPat enables you to sample, trace, measure and evaluate your program's behaviour during execution, and may help you find opportunities to significantly improve program performance.
- First, load the perftools module to get the correct environment variables set.
module load perftools
- Compile and link your application as usual.
ftn -c fcode.f90 ; ftn -o fexe fcode.o
cc -c ccode.c ; cc -o cexe ccode.o
- Instrument your code with pat_build. Use the -u flag to trace all user-defined functions in your program. Use the -g [group] flag to instrument all functions belonging to a specified group, for instance -g mpi,io,heap -u to instrument MPI, I/O, heap and user function calls. To let CrayPat select the instrumentation through automatic program analysis instead, as in the quick start, type:
pat_build -O apa ./myprogram
- The name of the instrumented version of the executable will end with +pat, so in the example, the result will be ./myprogram+pat.
- Run the instrumented executable (by modifying your batch script). The data file must be written into a filesystem, such as Lustre, that supports record locking. Set the environment variable PAT_RT_EXPFILE_DIR to an existing directory in such a file system or run directly from $SCRATCH :
/usr/bin/time -p aprun -n 128 ./myprogram+pat
- Upon successful execution, a data file ending with .xf is generated, so in the example, the result will be ./myprogram+pat+*.xf. Use pat_report to generate a human-readable performance report from it. By default, when you use pat_report to generate a report from one or more .xf files, pat_report also generates a corresponding .ap2 file with the same base name as the original executable. Data in .ap2 format can be viewed in text form using pat_report or viewed and manipulated using GUI tools in Cray Apprentice2. The most significant difference between the two formats is that .xf files require the original instrumented executable to be available to provide the mapping from addresses to function names and source line numbers, while .ap2 files incorporate this mapping and are self-contained. The .ap2 format is therefore recommended if you wish to preserve the data for future reference.
pat_report ./myprogram+pat*.xf
- Apprentice2 can also be used to analyze the results. Read on for more details about apprentice2.
- When the executable was instrumented with the -O apa flag, pat_report also creates a .apa file. Pass that file to pat_build to re-instrument your application for further analysis; the resulting executable ends with +apa.
pat_build -O <apafile>.apa
/usr/bin/time -p aprun -n 128 ./myprogram+apa
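The run step above can be sketched as a job-script fragment. This is only an illustration under stated assumptions: the directory name craypat_data is hypothetical, and the fallback to the current directory is there so that the fragment runs anywhere; on the real system the directory must live on a file system that supports record locking, such as $SCRATCH.

```shell
# Sketch of the run step; "craypat_data" is a hypothetical directory name.
exe=./myprogram+pat                      # instrumented executable produced by pat_build
expdir="${SCRATCH:-$PWD}/craypat_data"   # falls back to the current directory if SCRATCH is unset

mkdir -p "$expdir"                       # PAT_RT_EXPFILE_DIR must point to an existing directory
export PAT_RT_EXPFILE_DIR="$expdir"
echo "CrayPat data files go to: $PAT_RT_EXPFILE_DIR"

# Only launch when the launcher and the instrumented executable are present.
if command -v aprun >/dev/null 2>&1 && [ -x "$exe" ]; then
    /usr/bin/time -p aprun -n 128 "$exe"
    # The .xf data files were written to the experiment directory set above.
    pat_report "$PAT_RT_EXPFILE_DIR"/myprogram+pat*.xf
else
    echo "aprun or $exe not found; skipping the run"
fi
```

Creating the directory before exporting the variable keeps the fragment safe to rerun, since mkdir -p does nothing when the directory already exists.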
The trace groups that can be passed to pat_build with the -g flag include:
- bio - Cray Bioinformatics library routines
- blacs - Basic Linear Algebra communication subprograms
- blas - Basic Linear Algebra subprograms
- caf - Co-Array Fortran (Cray X2 systems only)
- fftw - Fast Fourier Transform library (64-bit only)
- hdf5 - manages extremely large and complex data collections
- io - includes stdio and sysio groups
- lapack - Linear Algebra Package
- lustre - Lustre File System
- netcdf - network common data form (manages array-oriented scientific data)
- omp - OpenMP API (not supported on Catamount)
- omp-rtl - OpenMP runtime library (not supported on Catamount)
- portals - Lightweight message passing API
- pthreads - POSIX threads (not supported on Catamount)
- stdio - all library functions that accept or return the FILE* construct
- sysio - I/O system calls
- upc - Unified Parallel C (Cray X2 systems only)
For more information, see Troubleshooting below.
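The -g flag takes its trace groups as a single comma-separated list, and it can be combined with -u. As a minimal sketch, the hypothetical helper build_pat_cmd below (not part of CrayPat) joins group names with commas and prints the resulting pat_build command instead of running it, so it can be tried on any machine.

```shell
# Hypothetical dry-run helper: prints the pat_build command, does not run it.
# Usage: build_pat_cmd <program> <group> [<group> ...]
build_pat_cmd() {
    prog=$1
    shift
    # Inside the subshell, "$*" joins the remaining arguments with the
    # first character of IFS, here a comma.
    groups=$(IFS=,; echo "$*")
    echo "pat_build -g $groups -u $prog"
}

build_pat_cmd ./myprogram mpi io heap
```

Printing the command first is a cheap way to check the group list before instrumenting a large executable.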
- Cray Apprentice2 is a post-processing performance data visualization tool. It will allow you to pinpoint problems in load balance, MPI overhead, I/O strategy and so on.
After you instrument a program for a performance analysis experiment, execute the instrumented program, and generate one or more performance analysis data files, use Cray Apprentice2 to explore the experiment data and generate a variety of interactive graphical reports. Cray Apprentice2 is a GUI tool that requires that your workstation support the X Window System. Depending on your system configuration, you may need to use the ssh -X option to enable X Window System support in your shell session.
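Since Cray Apprentice2 needs an X display, it can help to check for one before launching the GUI. The fragment below is a sketch only: check_display is a hypothetical helper, and it relies on the usual convention that ssh -X sets the DISPLAY variable in the remote shell.

```shell
# Hypothetical helper: succeeds only when an X display is advertised.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X display available: $DISPLAY"
        return 0
    fi
    echo "DISPLAY is not set; log in again with ssh -X" >&2
    return 1
}

if check_display; then
    echo "OK to launch app2"
else
    echo "app2 would not be able to open its window"
fi
```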
- Load the apprentice2 module to get the correct environment variables set.
module load apprentice2
- Launch the Cray Apprentice2 graphical interface in order to visualize data.
app2 ./myprogram+pat*.ap2
Apprentice2 can produce a number of very informative plots of performance data. For more information, see Troubleshooting below.
Cray Apprentice2 on your PC
app2 is a 64-bit executable, so it cannot run on a desktop PC with a 32-bit CPU. A 32-bit Linux desktop copy of Cray Apprentice2 is available in /apps/rosa/apprentice2-desktop for use on your local machine rather than on CSCS machines. This desktop version is provided on a convenience basis only.
Hardware performance counters
The basic process of setting hardware performance counters follows these steps.
- Load the perftools module and compile your code.
module load perftools
ftn -c myprogram.f
ftn -o myprogram myprogram.o
- Set the runtime environment variable PAT_RT_HWPC to monitor hardware counter group 1.
- The available counter groups, as listed by pat_help counters, include:
Summary with instruction metrics
Summary with TLB metrics
L1 and L2 metrics
Hypertransport information (not supported on quad-core AMD Opteron processors)
Floating point mix
Cycles stalled, resources idle
Cycles stalled, resources full
Instructions and branches
Floating point operations mix (2)
Floating point operations mix (vectorization)
Floating point operations mix (SP)
Floating point operations mix (DP)
L3 (core-level reads)
L3 (core-level misses)
L3 (core-level fills caused by L2 evictions)
- Instrument and run the program. If you use a batch script to submit your job, remember to set the runtime environment variable PAT_RT_HWPC=1 within the job script so that the hardware counters are monitored:
pat_build -g mpi,io ./myprogram
aprun -n 128 ./myprogram+pat
With PAT_RT_HWPC=1, the instrumented program monitors the following hardware counters and derived events (group 1): PAPI_L1_DCM, PAPI_L1_DCA, PAPI_TLB_DM, PAPI_FP_OPS and CYCLES_USER.
- The papi_avail command reports information about the PAPI events currently supported on the system. Of 107 possible PAPI events, 40 are available, of which 8 are derived.
Upon execution, the instrumented program writes data files containing the hardware counter data, from which pat_report produces a report that includes a number of derived metrics and calculated values. These data files can be viewed and examined with pat_report or apprentice2.
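Setting the counter group in a job script can be sketched as follows. The helper set_hwpc_group is hypothetical and deliberately narrow: it only accepts a numeric group and rejects anything else, which guards against typos in the batch script.

```shell
# Hypothetical helper: export PAT_RT_HWPC only for a numeric counter group.
set_hwpc_group() {
    case $1 in
        ''|*[!0-9]*)
            echo "not a counter group number: '$1'" >&2
            return 1
            ;;
    esac
    export PAT_RT_HWPC="$1"
}

# Select group 1 before launching the instrumented executable.
set_hwpc_group 1 && echo "PAT_RT_HWPC=$PAT_RT_HWPC"
# aprun -n 128 ./myprogram+pat    # run step, left commented out in this sketch
```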
- This page describes the basics of xt-craypat and apprentice2.
- For further information, read the following man pages: intro_craypat, craypat, pat_build, pat_report, pat_hwpc, hwpc and app2, and run the command pat_help.
- If all else fails, please contact the helpdesk.