The PGI compiler suite includes Fortran 77, Fortran 90/95, C and C++ compilers. To use it, load the PGI programming environment:
> module load PrgEnv-pgi
The default version of the compiler is loaded automatically when you load the programming environment. Other versions, older or newer, may also be installed. To list them, issue module avail pgi; to switch to a different version, issue module switch pgi pgi/<new_version>.
To compile a Fortran 90 MPI code on the system invoke the Cray compiler driver:
> ftn [compiler options] example.f90 -o example.x
Likewise for C and C++ codes:
> cc [compiler options] example.c -o example.x
> CC [compiler options] example.C -o example.x
The man pages (man pgf95; man pgcc) provide information on all the compiler options available. Note that if two compiler options conflict, the last option on the command line takes precedence!
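The precedence rule can be used deliberately in a build script by putting default flags first and overrides last. The following is a minimal sketch under that assumption; the variable names DEFAULT_FLAGS, OVERRIDE_FLAGS and CMD are introduced here purely for illustration, and the check for pgf95 is only there so that the snippet prints the command instead of failing where the PGI environment is not loaded.

```shell
DEFAULT_FLAGS="-O3"      # project-wide default
OVERRIDE_FLAGS="-O0"     # temporary override, e.g. while chasing a bug

# Overrides come last, so by the rule above -O0 takes precedence over -O3.
CMD="ftn $DEFAULT_FLAGS $OVERRIDE_FLAGS example.f90 -o example.x"

if command -v pgf95 >/dev/null 2>&1; then
    $CMD
else
    echo "PGI environment not loaded; would run: $CMD"
fi
```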
As a starting point, we recommend the following optimization flag:
- -fastsse (-fast)
The -fastsse flag is equivalent to -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre, where:
- -O2 specifies general optimization level 2
- -Mnoframe prevents the generation of code to set up a stack frame
- -Munroll=c:n completely unrolls loops with a loop count of n or fewer
- -Mlre indicates loop-carried redundancy elimination
- -Mautoinline enables automatic function inlining in C/C++
- -Mvect=sse generates SSE and SSE2 instructions for the Opteron
- -Mscalarsse generates scalar SSE code with xmm registers
- -Mcache_align aligns long objects on cache-line boundaries
- -Mflushz flushes SSE denormal numbers to zero
- -Mpre enables partial redundancy elimination
More aggressive optimization can be obtained by adding -O3, i.e.:
- -O3 -fastsse
At the -O3 level, all level 2 optimizations are performed, and in addition, more aggressive code hoisting and scalar replacement optimizations are performed. These optimizations may speed up your code but might also slow it down, so it is always recommended to benchmark the performance of your code with a variety of options enabled/disabled. It may be worth experimenting with the -Munroll, -Minline, -Mmovnt and -Mconcur options in particular. Use -help to list the compiler options available or to see details on how to use a given option, e.g. pgf95 -Munroll -help.
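One way to organise such a comparison is to build one executable per flag set and time them separately. The loop below is a sketch only: the three flag sets are an arbitrary selection from the options mentioned above, and the names SRC, built and example_N.x are assumptions made for illustration. Where the PGI environment is not loaded, the commands are printed instead of run.

```shell
SRC=example.f90          # source file used in the examples above
i=0
built=""

# One executable per flag set, named example_1.x, example_2.x, ...
for flags in "-fastsse" "-O3 -fastsse" "-O3 -fastsse -Minline"; do
    i=$((i + 1))
    exe="example_$i.x"
    if command -v pgf95 >/dev/null 2>&1; then
        # $flags is left unquoted on purpose so it splits into separate options
        ftn $flags "$SRC" -o "$exe" || echo "build failed for: $flags"
    else
        echo "would run: ftn $flags $SRC -o $exe"
    fi
    built="$built $exe"
done

echo "variants:$built"
```

Run each variant with the same input and the job launcher you normally use, and keep the fastest one that still produces correct results.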
As mentioned above, the -fastsse flag enables SSE vectorization, which is key to getting good performance from the AMD Opteron processor. Information about the optimizations performed by the compiler can be written to standard error with the -Minfo flag. The following are potential barriers to SSE vectorization:
- Apparent dependencies and C pointers: give the compiler information on what can be vectorized by using the -Msafeptr flag, via pragmas, or by employing the restrict type qualifier
- Function calls: try to inline the functions by using the -Minline or -Mipa=inline flags
- Type conversions: manually convert constants or use compiler flags
- Large number of statements: try the -Mvect=nosizelimit flag
- Too few iterations: unroll the loops
- Genuine dependencies: try to restructure the loop manually
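To illustrate the first barrier in the list, the sketch below writes a small C kernel whose pointer arguments carry the restrict qualifier and compiles it with -Minfo so that the compiler reports what it did with the loop. The file name saxpy.c and the function saxpy are hypothetical examples, not part of any system library. Depending on the compiler version, restrict may only be accepted in a C99 language mode. Where the PGI environment is not loaded, the compile command is printed instead of run.

```shell
# Write a small C kernel. "restrict" promises the compiler that x and y
# never overlap, which removes the apparent dependency between them.
cat > saxpy.c <<'EOF'
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    int i;
    for (i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
EOF

CMD="cc -c -fastsse -Minfo saxpy.c -o saxpy.o"

if command -v pgcc >/dev/null 2>&1; then
    $CMD     # -Minfo messages are written to standard error
else
    echo "PGI environment not loaded; would run: $CMD"
fi
```

Compiling the same kernel with the restrict qualifiers removed and comparing the two -Minfo reports is a quick way to see whether aliasing was what held the loop back.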
If your application can tolerate reduced floating-point precision, try the -Mfprelaxed flag.
Advanced Vector Extensions (AVX)
If the "xtpe-interlagos" module is loaded when you compile, the compiler drivers add the flag "-tp bulldozer-64" to your compile line. This flag adds support for the AMD Bulldozer architecture, specifically for AMD's implementation of Advanced Vector Extensions (AVX), including the FMA4 instruction set extension. The "xtpe-interlagos" module is loaded by default when you load "PrgEnv-pgi".
In addition to -fastsse, the -Mipa option for interprocedural analysis and optimization (IPA) can in some cases improve performance by 5-10%. We suggest using the following IPA options:
Note that the interprocedural analysis flag must be used at both compile and link time.
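For a build with separate compile and link steps, this means repeating the IPA flag on every line. The sketch below uses the bare -Mipa option and two hypothetical source files, part_a.f90 and part_b.f90; substitute your own files and whichever -Mipa suboptions you have chosen. Where the PGI environment is not loaded, the commands are printed instead of run.

```shell
IPA_FLAGS="-fastsse -Mipa"    # substitute the -Mipa suboptions you want

COMPILE_A="ftn $IPA_FLAGS -c part_a.f90"
COMPILE_B="ftn $IPA_FLAGS -c part_b.f90"
# The same IPA flag appears again on the link line.
LINK="ftn $IPA_FLAGS part_a.o part_b.o -o example.x"

for cmd in "$COMPILE_A" "$COMPILE_B" "$LINK"; do
    if command -v pgf95 >/dev/null 2>&1; then
        $cmd || break         # stop at the first failing step
    else
        echo "would run: $cmd"
    fi
done
```

Keeping the flags in a single variable, as here, makes it harder to drop -Mipa from the link step by accident.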
For the PGI compiler use the -mp=nonuma option to enable OpenMP support.
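A minimal sketch of an OpenMP build follows. Combining -mp=nonuma with -fastsse and choosing four threads are both assumptions made for illustration; OMP_NUM_THREADS is the standard OpenMP environment variable for setting the thread count. Where the PGI environment is not loaded, the compile command is printed instead of run.

```shell
CMD="ftn -fastsse -mp=nonuma example.f90 -o example.x"

if command -v pgf95 >/dev/null 2>&1; then
    $CMD
else
    echo "PGI environment not loaded; would run: $CMD"
fi

# Thread count used when the executable is run; 4 is an arbitrary choice.
export OMP_NUM_THREADS=4
```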
The following compiler flags may be useful when debugging your code:
- Generate symbolic debugging information (useful at -O0)
- Generate symbolic debugging information in the presence of optimization
- Add array bounds checking
- Give verbose output
- Generate a listing file
- Provide information on the optimizations performed by the compiler
The PGI Cluster Development Kit (CDK) options -Mprof=mpi, -Mmpi, and -Mscalapack are not supported on the system.
See the man pages for detailed information on the compilers and compiler flags (man pgcc, man pgf95).
Refer to the online documentation from the Portland Group.