Quick start guide
The following instructions are for those who want to get TotalView up and running quickly. To run TotalView, please use aprun.x instead of aprun until further notice.
- ssh -Y rosa.cscs.ch
- module load PrgEnv-<cray|pgi|gnu|pathscale|intel>
- ftn -g -o myprogram myprogram.f90
- module load xt-totalview
- salloc --ntasks=32 --time=00:15:00
- totalview aprun.x -a -n 32 ./myprogram
Read on for the details.
Latest status : hpcforge.org/plugins/mediawiki/wiki/codes/index.php/Main_Page
TotalView is a debugger with support for Fortran, C, C++, MPI, OpenMP and threads.
TotalView is an interactive tool that lets you debug serial, multiprocessor and multithreaded programs. It can be run either with a graphical user interface (the totalview executable) or with a command-line interface (the totalviewcli executable). TotalView provides source-level debugging of Fortran (including Fortran 90), C and C++ codes. It can be used to debug parallel programs based on MPI. It also has facilities for multi-process, thread-based parallel programs such as OpenMP codes, and for GPU codes.
- Totalview is currently installed on all the CRAY systems at CSCS. In order to check which version is available, type the following command:
module avail xt-totalview
- There are no interactive nodes reserved for debugging. You will need to launch TotalView from within an interactive session.
- Note that the interactive sessions are strictly meant for debugging. Running anything else on these compute nodes is prohibited.
- The current CSCS license allows for 2 concurrent TotalView users on Rosa or Todi. These two users can use up to 128 cores. For GPU debugging, have a look at DDT.
- If larger debugging runs are needed, please contact CSCS Help.
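As a minimal sketch of how the core limit above could be checked before requesting an allocation, the following shell function compares a requested core count with the 128-core figure quoted in the list. The names TV_MAX_CORES and check_tv_cores are hypothetical and not part of TotalView or the CSCS environment, and the text does not say whether the limit applies per user or in total, so treat the check as a rough guard only.

```shell
#!/bin/sh
# Hypothetical helper, not a CSCS or TotalView tool.
# 128 is the core limit quoted in the text above; adjust if the license changes.
TV_MAX_CORES=128

# check_tv_cores N
# Returns 0 if N is an integer between 1 and TV_MAX_CORES,
# 1 if N is outside that range, 2 if N is not a non-negative integer.
check_tv_cores() {
    case "$1" in
        ''|*[!0-9]*)
            echo "not a non-negative integer: '$1'" >&2
            return 2
            ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt "$TV_MAX_CORES" ]; then
        echo "requested $1 cores; allowed range is 1..$TV_MAX_CORES" >&2
        return 1
    fi
    return 0
}
```

For example, "check_tv_cores 32 && salloc --ntasks=32 --time=00:15:00" would only request the allocation when the count is within the limit.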
Cray XE6 supports a special implementation of the Etnus Totalview debugger. The Totalview debugging suite for the Cray systems differs in functionality from the standard Totalview implementation in the following ways:
- Debugging multiple threads on compute nodes is not supported.
- Debugging MPI_Spawn(), OpenMP, Cray SHMEM, or PVM programs is not supported.
- Compiled EVAL points and expressions are not supported.
- Type transformations for the PGI C++ compiler standard template library collection classes are not supported.
- Exception handling for the PGI C++ compiler runtime library is not supported.
- Spawning a process onto the compute processors is not supported.
- Machine partitioning schemes, gang scheduling, or batch systems are not supported.
- The Totalview Visualizer is not included. The Totalview HyperHelp browser is not included.
For details please have a look at the documentation.
- For source level debugging, the program must be compiled with the -g flag:
ftn -g -o myprogram myprogram.f90
- To use the TotalView GUI you need to turn on X forwarding, which is done by adding the -Y or -X flag when you log in via ssh (run "man ssh" for an explanation of the flags):
ssh -Y rosa.cscs.ch
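A quick way to see whether X forwarding is likely to be active in the current session is to look at the DISPLAY environment variable, which ssh sets when forwarding is enabled. The sketch below wraps that check in a function; the name x11_ready is hypothetical. A non-empty DISPLAY is a necessary condition only and does not prove that the GUI will actually open.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if DISPLAY is set and non-empty.
# This is a necessary condition for the TotalView GUI, not a guarantee.
x11_ready() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; log in again with 'ssh -Y' (or 'ssh -X')" >&2
        return 1
    fi
    return 0
}
```

It can be called right after login, for example "x11_ready && echo X forwarding looks active".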
Example 1: Invoking TotalView to debug an MPI application
- First, load the totalview module to get the correct environment variables set:
module load xt-totalview
- If you want to use TotalView to start your job and monitor it as it runs, you must take some additional steps. Because parallel jobs cannot be launched directly from the login nodes, you will need to launch TotalView from within a batch job. The easiest way to do this is to start an interactive batch job as follows:
salloc --ntasks=32 --time=00:15:00
- Once your job starts, you can launch the TotalView graphical user interface from the command line. The following example starts TotalView on 32 compute cores:
totalview aprun.x -a -n 32 ./myprogram
- The normal aprun options are given after the TotalView flag -a. Starting Totalview will make two windows appear on the screen. These are called the Totalview Root and Process Window (or the main window).
- To start the execution click the Go button on the main window.
- This results in TotalView asking if the user wants to stop the program, e.g., for inserting break points. Click YES and the source code for the main program is displayed on the main window.
- Insert a breakpoint. Source lines where it is possible to insert a breakpoint are marked with a box in the left column. Click on a box to toggle a breakpoint.
- The execution can then be started with the Go button. "Go" runs the program from the beginning until the first breakpoint. "Next" and "Step" take you one line forward ("Step" enters called functions, "Next" steps over them). "Out" will continue until the end of the current subroutine/function. "Run to" will continue until the selected line.
- The value of a variable can be inspected by right-clicking on its name and choosing "add to expression list". The variable will now be shown in a pop-up window. Scalar variables are shown with their value, arrays with their dimensions and type. To see all values in an array, right-click on the variable in the pop-up window and choose "dive". You can now scroll through the list of values.
- Another option (very useful) is to visualize the array : after choosing "dive", open the menu item "Tools->Visualize" of the pop up window. If you did this with a 2D array, use middle button and drag mouse to rotate the surface that popped up, shift+middle button to pan, Ctrl+middle button to zoom in/out.
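The launch command from this example follows a fixed pattern: totalview, then aprun.x, then -a, then the usual aprun options. A minimal sketch of a helper that assembles this command line for a given task count is shown below. The function name tv_launch_cmd is hypothetical, and the function only prints the command rather than running it, so the result can be inspected before it is used inside the salloc session.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the TotalView launch line
# used in this example.
# Usage: tv_launch_cmd NTASKS PROGRAM [ARGS...]
tv_launch_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: tv_launch_cmd NTASKS PROGRAM [ARGS...]" >&2
        return 1
    fi
    ntasks=$1
    shift
    # Everything after -a is passed on to aprun.x, as described above.
    printf 'totalview aprun.x -a -n %s %s\n' "$ntasks" "$*"
}
```

Calling "tv_launch_cmd 32 ./myprogram" prints "totalview aprun.x -a -n 32 ./myprogram", which matches the command shown in the example.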
For more information, see Troubleshooting below.
Debugging an MPMD application
Some users may have to debug applications that run more than one executable. For instance, on the CRAYs, aprun can be used to launch applications in Multiple Program, Multiple Data (MPMD) mode. The command format is:
aprun -n pes exe1 [args_for_exe1] : -n pes exe2 [args_for_exe2]
Note that compute nodes on the CRAY are not shared between executables. An application using 3 processes for exe1 + 3 processes for exe2 will require 2 compute nodes, not 1. To debug both executables in MPMD mode, TotalView can be started with:
totalview aprun.x -a -n pes exe1 [args_for_exe1] : -n pes exe2 [args_for_exe2]
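Because nodes are not shared between executables, the node count for an MPMD run is the sum of the per-executable node counts, each rounded up. The sketch below works through that arithmetic. The function names nodes_for and mpmd_nodes are hypothetical, the number of cores per node is passed in as a parameter (the value 32 in the example is an illustrative assumption, not a figure from this page), and each executable is assumed to fill its nodes completely.

```shell
#!/bin/sh
# Hypothetical helpers for estimating the node count of an MPMD run.

# nodes_for CORES_PER_NODE PES
# Nodes needed by one executable: ceiling of PES / CORES_PER_NODE.
nodes_for() {
    echo $(( ($2 + $1 - 1) / $1 ))
}

# mpmd_nodes CORES_PER_NODE PES1 [PES2 ...]
# Nodes are not shared between executables, so the per-executable
# counts are rounded up individually and then added.
mpmd_nodes() {
    cores=$1
    shift
    total=0
    for pes in "$@"; do
        total=$(( total + $(nodes_for "$cores" "$pes") ))
    done
    echo "$total"
}

# The case from the text: 3 processes for exe1 + 3 processes for exe2.
mpmd_nodes 32 3 3    # prints 2
```

With 40 processes for exe1 and 3 for exe2 on the same assumed node size, "mpmd_nodes 32 40 3" prints 3.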
- This page describes how to launch TotalView on the CRAYs. For further information please check the TotalView documentation.
- In some cases, TotalView functionality is limited because Compute Node Linux (CNL) does not support some features in the user program. Please have a look at the Cray XT Programming Environment User's Guide (Chapter 11).
- Once inside the debugger, if you cannot see any source code, and if you keep the source files in a separate directory, add the search path to the src directory via the main menu item File->Search path.
- Check your ~/.totalview/preferences6.tvd preferences file.
- If all else fails, contact the helpdesk.
If you get the following error : Encountered unsupported location operator 0xf3, (*** Invalid Op (0xf3)
This can happen when the GNU compiler is used for compilation (all other compilers should be fine). Debugging of the full AVX instruction set is not fully supported yet, but TotalView can debug a decent subset of it. Running to breakpoints will work, but in very rare cases Step or Next may skip some instructions. This will be resolved in an upcoming TotalView release.
If you get the following error :
INFO: Copying library "/dsl/var/spool/alps/458302" into the local
file cache ...
ERROR: load_library_into_cache: Error closing cache file; errno=Disk
please clean your ~/.totalview/lib_cache directory.
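One possible way to clean the cache is sketched below, under the assumption that "clean" means deleting everything inside the directory while keeping the directory itself. The function name clean_tv_lib_cache is hypothetical, and the -mindepth and -delete options assume GNU or BSD find. Make sure no TotalView session is running before using it.

```shell
#!/bin/sh
# Hypothetical helper: empty the TotalView library cache directory.
# Usage: clean_tv_lib_cache [DIR]   (DIR defaults to ~/.totalview/lib_cache)
clean_tv_lib_cache() {
    cache_dir=${1:-"$HOME/.totalview/lib_cache"}
    if [ ! -d "$cache_dir" ]; then
        echo "no such directory: $cache_dir" >&2
        return 1
    fi
    # Remove every entry below cache_dir but keep cache_dir itself.
    find "$cache_dir" -mindepth 1 -delete
}
```

Running "clean_tv_lib_cache" without arguments acts on ~/.totalview/lib_cache; a different directory can be given as the first argument.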
- The TotalView GUI can sometimes be slow when you log in via ssh. As a workaround, the TotalView Remote Display Client lets you launch TotalView on a remote system. Starting the Remote Display Client on your system displays a window in which you enter information about how Remote Display can get from your system to the system on which TotalView will execute. As Remote Display invokes TotalView on the remote host, TotalView does not need to be installed on your local machine. The Client can run on Linux x86, Linux x86-64, Darwin and Windows systems. No license is needed to run the Client.
Example : Invoking remote display from a Linux workstation
- First, choose and download the client corresponding to your operating system :
uname -a # ( for instance : x86_64 GNU/Linux )
scp -C firstname.lastname@example.org:/opt/toolworks/default/remote_display/* .
( for instance : remote_display.1.3.1-0-linux-x86-64.tar )
- Install the client on your workstation and set the correct environment variables :
tar xf remote_display.1.3.1-0-linux-x86-64.tar
- Start the client :
Memory debugging
The xt-totalview-mem-debug modulefile sets up the compilation so that the proper libraries will be linked for TotalView memory debugging. Load this modulefile only when memory debugging is desired.
module load xt-totalview
module list -t
Loading the xt-totalview-mem-debug module adds "-Wl,-zmuldefs" to your compilation line. Then start TotalView and find your memory bug.