Profiling GPU Applications on Eiger
There are a few different profilers that are useful for understanding a GPU application's performance:
- Nvidia Visual Profiler
- Nvidia Parallel Nsight (Windows only)
- ATI Stream Profiler
This page focuses on Nvidia's Linux-based GPU profiling tools such as cudaprof and openclprof. It is currently unclear how to profile multi-GPU applications, so this page covers single-GPU performance profiling only.
Using the Nvidia Visual Profiler on Eiger
Configuring and Running Cudaprof
Log in to Eiger using X11 forwarding:
ela3 ~$ ssh -X eiger
Launch an interactive job with SLURM's srun to get a bash prompt on a compute node:
eiger160 ~$ srun -N 1 -n 1 --gres=gpu:1 --constraint=fermi --time=00:30:00 --pty /bin/bash
Load the CUDA module, which also gives access to the Qt libraries bundled with the Nvidia tools:
eiger180 ~$ module load cuda
Launch the profiler to try it out on the SHOC benchmarks:
eiger180 ~$ /apps/eiger/Cuda-4.0/cuda/computeprof/bin/computeprof &
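If the profiler window does not appear, the usual suspects are a missing X11 display or a wrong path to the binary. The following is a minimal sketch of a pre-flight check, not part of the Nvidia tools: the function name check_profiler_env is hypothetical, and the only path it uses is the one shown above.

```shell
# Hypothetical helper: report common reasons why the profiler GUI may not start.
# $1 = path to the profiler binary
check_profiler_env() {
    cpe_status=0
    # The GUI needs an X11 display; DISPLAY is normally set by "ssh -X".
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: X11 forwarding (ssh -X) is probably not active"
        cpe_status=1
    fi
    # The binary must exist and be executable on this node.
    if [ ! -x "$1" ]; then
        echo "profiler not found or not executable: $1"
        cpe_status=1
    fi
    return $cpe_status
}

# Example use with the path from this page.
if check_profiler_env /apps/eiger/Cuda-4.0/cuda/computeprof/bin/computeprof; then
    echo "environment looks ready for computeprof"
fi
```

The check only covers these two conditions. It does not verify that the CUDA module is loaded or that a GPU has been allocated to the job.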
Within the CUDA Visual Profiler:
- Click on File->New…
- Fill out the Session Settings (as shown) and click Start.
- Wait for the run to finish, then open the Profiler Output window and click on the Summary Table and the GPU Time Summary Plot.
Interpreting the results
- Profiler counters correspond to events at the thread warp level (and not at the single thread level).
- When profiling, choose the grid configuration so that all multiprocessors are uniformly loaded (i.e. the number of blocks launched on each multiprocessor is the same, and the amount of work of interest per block is also the same). This improves the accuracy of extrapolated counts (such as memory and instruction throughput) and gives more consistent results from run to run.
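One simple way to approach the uniform-load advice above is to make the total block count a multiple of the number of multiprocessors. The sketch below shows only that arithmetic. The function name balanced_block_count is hypothetical, and the value 14 is just an example multiprocessor count; use the count reported for the actual device.

```shell
# Round a requested block count up to the next multiple of the number of
# multiprocessors, so that an even number of blocks per multiprocessor is possible.
# $1 = requested number of blocks
# $2 = number of multiprocessors on the device (assumed to be known)
balanced_block_count() {
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

# Example: 100 requested blocks on a device with 14 multiprocessors.
balanced_block_count 100 14   # prints 112 (8 blocks per multiprocessor)
```

A block count that is a multiple of the multiprocessor count makes an even distribution possible but does not guarantee it, since block placement is decided by the hardware scheduler. The work per block still has to be balanced separately.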
An in-depth description of this profiler (including info on the specific counters) can be found here: