The PathScale compiler suite is a high-performance compiler suite with a proprietary back-end based on the SGI compilers and a GNU front-end. It includes Fortran 77/90/95 (with Cray/SGI Fortran 95 extensions and character pointers), C, and C++ compilers. To access the suite, load the PrgEnv-pathscale module file:
> module load PrgEnv-pathscale
Please note that the PathScale compiler is no longer fully supported on the Cray XE6: supporting libraries are not provided, and we cannot guarantee the availability of future releases of the compiler. We strongly suggest that you use a different compiler unless you have a very specific requirement for PathScale.
The current default version of the compiler (the pathscale module) is loaded automatically when you load the programming environment. Older and/or newer versions of the compiler may also be available. To see which versions are available, issue:
> module avail pathscale
To use a different version of the PathScale compiler, issue:
> module switch pathscale pathscale/<new_version>
To compile a Fortran 90 MPI code on the system, invoke the Cray compiler wrapper:
> ftn [compiler options] example.f90 -o example.x
Likewise for C and C++ codes:
> cc [compiler options] example.c -o example.x
> CC [compiler options] example.C -o example.x
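In a build script that handles sources in more than one language, the wrapper can be chosen from the file extension. The following is a minimal sketch of that idea: wrapper_for is a hypothetical helper (it is not part of the Cray environment), and the file names are simply the examples used above. It only prints the compile lines and does not invoke a compiler.

```shell
# wrapper_for: hypothetical helper that maps a source file name to the
# matching Cray compiler wrapper (ftn, cc or CC).
wrapper_for() {
    case "$1" in
        *.f90) echo ftn ;;
        *.c)   echo cc ;;
        *.C)   echo CC ;;
        *)     echo "unknown source type: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the compile line for each example source file.
for src in example.f90 example.c example.C; do
    echo "$(wrapper_for "$src") [compiler options] $src -o example.x"
done
```

The case patterns are case-sensitive, which is what distinguishes example.c (C) from example.C (C++).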
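The PathScale compiler organizes options into twelve groups by compiler phase or class of feature. The general syntax is a group name followed by one or more colon-separated suboptions, each with an optional value (as in the -OPT examples later in this section):
-GROUP:option[=value][:option[=value]]...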
The group names are as follows:
- Options relating to the writing of a listing file
- Global scalar optimization
- Loop nest optimization
Using the appropriate compiler optimization flags is essential for reasonable performance of your application on the system. The default general optimization level is -O2, which enables conservative optimizations, i.e. ones that are virtually always beneficial. As a first step, we recommend adding the following optimization flag for increased performance:
-OPT:Ofast
This adds a subset of -OPT suboptions, equivalent to -OPT:ro=2:Olimit=0:div_split=ON:alias=typed.
More aggressive optimization can be obtained with:
-Ofast
which is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
- -O3 turns on aggressive optimizations that may or may not improve code performance
- -ipa turns on interprocedural analysis (see below)
- -fno-math-errno prevents errno being set when calling math functions that are executed in a single instruction, e.g. sqrt.
- -ffast-math improves floating point performance by relaxing ANSI and IEEE rules
Note that optimizations introduced with the -Ofast flag may affect floating point accuracy due to the rearrangement of computations (see "Floating point accuracy" below).
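To judge which level suits a particular code, it can help to build the same source at each level and compare run times and results. The following is a sketch only: run is a hypothetical helper that prints each command and executes it only when DRY_RUN=0, and the source and executable names are placeholders.

```shell
# run: hypothetical helper. It prints each command and executes it only
# when DRY_RUN=0, so the sketch can be tried away from the XE6.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# build_levels: compile one (hypothetical) source file three ways.
build_levels() {
    run ftn -O2            example.f90 -o example_O2.x     # default level
    run ftn -O2 -OPT:Ofast example.f90 -o example_OPT.x    # suggested first step
    run ftn -Ofast         example.f90 -o example_Ofast.x  # most aggressive
}

build_levels
```

Setting DRY_RUN=0 in the environment makes the same script run the three builds on the system.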
Floating point accuracy
Relaxing the precision requirements for floating-point numbers might allow for faster code. There are three relevant options here: -OPT:roundoff, -OPT:IEEE_arithmetic, and -OPT:IEEE_NaN_Inf.
The -OPT:roundoff option defines the extent to which the compiler can introduce roundoff error:
- -OPT:roundoff=0 no roundoff error permitted (default at -O0, -O1, and -O2)
- -OPT:roundoff=1 permits limited roundoff error (default at -O3)
- -OPT:roundoff=2 permits roundoff error due to reassociating expressions (default at -Ofast)
- -OPT:roundoff=3 permits any roundoff error
The -OPT:IEEE_arithmetic option specifies the level of conformance to the IEEE 754 floating-point roundoff and exception handling behaviour:
- -OPT:IEEE_arithmetic=1 defines strict conformance to IEEE standard (default at -O0, -O1, and -O2)
- -OPT:IEEE_arithmetic=2 allows some relaxation of accuracy
- -OPT:IEEE_arithmetic=3 allows any mathematically equivalent transformations to be applied
The -OPT:IEEE_NaN_Inf=(on|off) option controls conformance to the IEEE standard for NaN and Infinity. The default is on.
Note: the GNU-style flag -ffast-math (which is implied by -Ofast) is equivalent to -OPT:IEEE_arithmetic=2 -fno-math-errno. If you wish to adhere strictly to IEEE arithmetic then you should use the -fno-fast-math flag, which implies -OPT:IEEE_arithmetic=1 -fmath-errno.
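One way to check whether relaxed floating-point rules change your answers is to build a strict reference executable alongside a fast one and compare their output. The sketch below assumes that the three suboptions can be combined in a single -OPT group (joined with ':' in the same way as the -OPT:Ofast expansion) and that explicit suboptions take precedence over the -O3 defaults. The helper run and all file names are hypothetical.

```shell
# run: hypothetical helper. It prints each command and executes it only
# when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Strict settings, taken from the option lists above.
STRICT_FP="-OPT:roundoff=0:IEEE_arithmetic=1:IEEE_NaN_Inf=on"

build_fp_pair() {
    # Reference build: -O3 general optimization with strict floating point.
    run ftn -O3 $STRICT_FP example.f90 -o example_strict.x
    # Fast build: -Ofast relaxes roundoff and IEEE conformance.
    run ftn -Ofast example.f90 -o example_fast.x
}

build_fp_pair
```

If the two executables disagree beyond the tolerance your application can accept, the strict settings (or an intermediate roundoff level) are the safer choice.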
Loop nest optimization
The loop nest optimizer (LNO) performs loop transformations to optimize nested loops by making better use of cache. The -LNO options are only enabled at general optimization level -O3 or higher.
Interprocedural analysis
Interprocedural analysis and optimization can be turned on at any optimization level with the -ipa option, and is turned on by default by the -Ofast option. The analysis is "whole program", so -ipa must be enabled for all source files and at both compile and link time. When -ipa is used, the majority of compiler optimization is done at the link stage rather than at the compile stage, so compilation will be fast but linking may take significantly longer.
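For a code built in separate compile and link steps, this means passing -ipa in both. A minimal sketch follows, assuming two hypothetical source files, main.f90 and solver.f90, and a hypothetical run helper that prints each command and executes it only when DRY_RUN=0.

```shell
# run: hypothetical helper. It prints each command and executes it only
# when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

FLAGS="-O3 -ipa"   # the same flags are used for compiling and for linking

build_ipa() {
    # Compile every source file to an object file with -ipa enabled.
    for src in main.f90 solver.f90; do
        run ftn $FLAGS -c "$src"
    done
    # Link with -ipa as well; this is the step that may take longer.
    run ftn $FLAGS main.o solver.o -o example.x
}

build_ipa
```

Keeping the flags in one variable makes it harder to leave -ipa out of either step by accident.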
Aliasing
The compiler can be allowed to make assumptions about aliasing, which could improve the performance of your code:
- -OPT:alias=typed (this is implied by -OPT:Ofast)
Automatic optimization tuning
The pathopt2 tool can be used to help tune the PathScale compiler for higher performance for a specific code. This tool iteratively tests different compiler options and combinations of options, tracks the results, selects the best options within a subgroup, and then elevates those options within the test hierarchy.
OpenMP
Use the -mp option to enable OpenMP support with the PathScale compiler. OpenMP directives are not supported in C++ code that uses exceptions, classes or templates.
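As an illustration, an OpenMP C code might be built and its thread count set as sketched below. The source file name is hypothetical, run is a hypothetical helper that prints the command and executes it only when DRY_RUN=0, and the thread count of 4 is only an example value. OMP_NUM_THREADS is the standard OpenMP environment variable; how the executable is then launched on the compute nodes is not covered here.

```shell
# run: hypothetical helper. It prints each command and executes it only
# when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Compile and link a (hypothetical) C source containing OpenMP directives.
run cc -mp example_omp.c -o example_omp.x

# Standard OpenMP variable for the number of threads; 4 is an example value.
export OMP_NUM_THREADS=4
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```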
GCC object compatibility
PathScale is fully object-compatible with GCC, which means that you can mix and match GNU-compiled and PathScale-compiled binaries and libraries when linking. The front-end is source compatible with the GNU compiler suite for C/C++.
Debugging
The following compiler flags may be useful when debugging your code:
Generate debugging information (changes optimization to -O0 unless explicitly overridden)
Enable array bounds checking for Fortran 90 codes. If you then set the environment variable F90_BOUNDS_CHECK_ABORT=YES, the code will abort on an out-of-bounds access.
Initializes variables with NaN. If the program uses such a variable before assigning to it, it will crash rather than produce incorrect results.
Initializes variables to 0
If your program gets the right answers with this flag and wrong without, you are likely breaking Fortran aliasing rules
Prevent segmentation fault when a constant parameter in Fortran is written to
Refer to the online documentation from PathScale.