AMD HD RADEON 6990 Multi-GPU Nodes
Introduction
The Visualization, Research & Development Cluster EIGER offers two visualization nodes, eiger[205-206], each equipped with a dual-GPU AMD HD RADEON 6990 card and the software development kit needed to exploit the nominal performance of its GPUs.
Each dual-GPU AMD HD RADEON 6990 card is configured with a total of 4 GB of GDDR5 memory clocked at 1250 MHz, giving up to 320 GB/s of GPU memory bandwidth, and provides a total of 3072 Stream Processors. OpenCL V 1.1 end-user applications can easily take advantage of the full power of these accelerators, especially when combined with additional AMD specialized mathematical libraries and tools such as:
- AMD Accelerated Parallel Processing (APP) SDK (APP-SDK)
- AMD Core Math Library for Graphic Processors (ACML-GPU)
- AMD Accelerated Parallel Processing Math Libraries (APPML)
- AMD APP Profiler
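The headline figures above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not vendor data: the 1250 MHz memory clock and the two GPUs per card come from this page, and the 24 SIMDs per GPU are what the Info.exe tool reports further down. The 256-bit memory bus per GPU, the four transfers per memory clock for GDDR5, and the 64 stream processors per SIMD are assumptions about the card's architecture.

```shell
#!/bin/sh
# Figures taken from this page
mem_clock_mhz=1250      # GDDR5 memory clock
gpus_per_card=2         # the HD 6990 is a dual-GPU card
simds_per_gpu=24        # "Number of SIMDs" reported by Info.exe

# Assumptions (not stated on this page)
bus_width_bits=256      # assumed memory bus width per GPU
transfers_per_clock=4   # assumed GDDR5 data rate relative to the memory clock
sp_per_simd=64          # assumed stream processors per SIMD

# MHz * transfers * bits / 8 gives MB/s; dividing by 1000 gives GB/s
per_gpu_gbs=$(( mem_clock_mhz * transfers_per_clock * bus_width_bits / 8 / 1000 ))
card_gbs=$(( per_gpu_gbs * gpus_per_card ))
total_sp=$(( simds_per_gpu * sp_per_simd * gpus_per_card ))

echo "Memory bandwidth per GPU  : ${per_gpu_gbs} GB/s"
echo "Memory bandwidth per card : ${card_gbs} GB/s"
echo "Stream processors per card: ${total_sp}"
```

Under these assumptions the script reports 160 GB/s per GPU, 320 GB/s per card and 3072 stream processors per card.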
Accessing RADEON-based visualization nodes
Access to one or both Radeon-based visualization nodes is controlled by the SLURM batch queuing system. Access can be interactive or batch, depending on the user's needs.
Interactive access
From the EIGER frontend node, request the interactive allocation of one of the Radeon-based nodes with the standard SLURM srun command. The following example shows how to request an entire AMD Radeon-based node for 2 hours:
srun -N 1 -n 1 --mem=23g --gres=gpu:2 --constraint=radeon --time=02:00:00 --pty /bin/bash
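Once the interactive allocation has been granted, it can be worth confirming that the shell really runs on one of the two Radeon nodes before building anything. The following is a minimal sketch: the helper name is_radeon_node is hypothetical, and the node names are the ones given in the introduction.

```shell
#!/bin/sh
# is_radeon_node NAME: succeeds if NAME is one of the Radeon-based nodes.
# (Hypothetical helper; any domain suffix on the host name is ignored.)
is_radeon_node() {
    case "${1%%.*}" in
        eiger205|eiger206) return 0 ;;
        *)                 return 1 ;;
    esac
}

node=$(uname -n)
if is_radeon_node "$node"; then
    echo "$node is a Radeon-based visualization node"
else
    echo "$node is NOT a Radeon-based visualization node"
fi
```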
Batch access
For batch access to both AMD Radeon-based nodes, for example for a parallel MPI job requiring 4 AMD Radeon GPUs, 24 CPU cores and a total of 24 MPI tasks, each running on its own CPU core, your SLURM batch script should look similar to the template below:
======================CUT HERE========================
#!/bin/bash
#SBATCH --job-name="SLURM-RADEON-JOB"
#SBATCH --nodes=2
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=12
#SBATCH --mem=4gb
#SBATCH --gres=gpu:2
#SBATCH --constraint=radeon
#SBATCH --time=01:30:00
#SBATCH --partition=medium
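# NOTE: sbatch does not expand shell variables in #SBATCH lines; replace $GROUP, $USER and MAIL.DOMAIN with literal values
#SBATCH --account=$GROUP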
#SBATCH --mail-type=ALL
#SBATCH --mail-user=$USER@MAIL.DOMAIN
#SBATCH --output=/users/$USER/slurm-OUT.log
#SBATCH --error=/users/$USER/slurm-ERR.log
#======START====
#Configure your environment for your running job
. /etc/profile.d/modules.bash
module load pgi
module load mvapich2/1.6-pgi
module load amdappsdk/2.4
module load appml/1.2
module load acmlgpu/1.1.2
echo "On which nodes it executes :"
echo "$SLURM_NODELIST"
echo "Which MPI Implementation is used :"
which mpiexec.hydra
mpiexec.hydra -info
echo "List the AMD HD RADEON 6990 multi-gpu card operating mode:"
export DISPLAY=:0.0
/usr/bin/aticonfig --pxl
echo "Which GPU devices are available:"
/usr/bin/aticonfig --list-adapters
echo "Check the current GPU core and memory clock frequency in Mhz:"
/usr/bin/aticonfig --od-getclocks
echo "Now run the MPI tasks..."
mpiexec.hydra -n 24 -ppn 12 -bootstrap rsh -bootstrap-exec /usr/bin/rsh -genv MV2_IBA_HCA mlx4_0 -rmk slurm /users/$USER/MY_MPI_APPLICATION
echo "I have selected the first IB HCA :"
/usr/bin/ibv_devices | grep mlx4_0
======================CUT HERE========================
The SLURM template script above assumes that your parallel MPI application was compiled with MVAPICH2 V 1.6 built for the PGI Compilers V 11.5, and that you are using the AMD mathematical libraries and tools.
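The template starts 12 MPI tasks on each node, while each node exposes only 2 GPUs, so several tasks have to share a GPU. One common way to spread them is to derive a GPU index from the MPI rank. The sketch below assumes that ranks are placed in blocks of 12 per node (as the -ppn 12 option suggests). The function name gpu_for_rank is hypothetical, and how the resulting index is handed to the application (command-line argument, environment variable, and so on) depends on the application itself.

```shell
#!/bin/sh
# gpu_for_rank RANK TASKS_PER_NODE GPUS_PER_NODE
# Prints the 0-based GPU index that a given MPI rank could use, assuming
# block placement of ranks on the nodes (hypothetical helper).
# The body runs in a subshell so its variables do not leak to the caller.
gpu_for_rank() (
    rank=$1; tasks_per_node=$2; gpus_per_node=$3
    local_rank=$(( rank % tasks_per_node ))   # position of the rank on its node
    echo $(( local_rank % gpus_per_node ))    # round-robin over the node's GPUs
)

# Layout of the template above: 24 tasks, 12 per node, 2 GPUs per node
rank=0
while [ "$rank" -lt 24 ]; do
    echo "rank $rank -> node $(( rank / 12 )), gpu $(gpu_for_rank "$rank" 12 2)"
    rank=$(( rank + 1 ))
done
```

With this mapping, even local ranks use the first GPU of their node and odd local ranks the second, so each GPU is shared by 6 tasks.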
AMD Accelerated Parallel Processing (AMD APP SDK)
The AMD APP Software Development Kit (SDK) is a complete development platform created by AMD to allow you to quickly and easily develop applications accelerated by AMD APP technology. The SDK allows you to develop your applications in a high-level language, OpenCL™ (Open Computing Language).
Once you have access to one of the two AMD Radeon-based visualization nodes, load the AMD APP SDK module environment. This can be done with the command:
- module load amdappsdk/2.4
You can then copy the AMD APP SDK directory tree into your home directory in order to build and run some of the code examples:
mkdir -p ~/EIGER-RADEON
cp -ra /apps/eiger/AMD-APP-SDK-v2.4-lnx64 ~/EIGER-RADEON/
cd ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64
export DISPLAY=:0.0
make clean;make -k
~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/bin/x86_64/clinfo
To test the sample codes shipped with the AMD APP SDK V 2.4, change directory into ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86_64 and run your preferred sample executable:
cd ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86_64
./ScanLargeArrays
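To try more than one sample in a row, a small loop can run every executable in the samples directory and record which ones exit cleanly. This is only a sketch: the function name run_samples is hypothetical, the samples are started without arguments, and an exit status of 0 is assumed to mean success.

```shell
#!/bin/sh
# run_samples DIR: run every executable regular file in DIR, one after the
# other, and print PASS or FAIL according to its exit status.
# Output of each sample is saved next to it as <name>.log (hypothetical helper).
run_samples() (
    cd "$1" || exit 1
    failed=0
    for exe in ./*; do
        # skip directories and non-executable files (kernels, data, logs)
        [ -f "$exe" ] && [ -x "$exe" ] || continue
        if "$exe" > "${exe}.log" 2>&1; then
            echo "PASS ${exe#./}"
        else
            echo "FAIL ${exe#./}"
            failed=$(( failed + 1 ))
        fi
    done
    echo "$failed sample(s) failed"
    [ "$failed" -eq 0 ]
)

# Example (path taken from the steps above):
# run_samples ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86_64
```

The function returns a non-zero status if any sample failed, so it can also be used as a quick check inside a batch script.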
Further detailed information can be found online on the AMD Developer web site:
http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx
AMD Core Math Library for Graphic Processors (ACML-GPU)
AMD Core Math Library for Graphic Processors (ACML-GPU) provides an ATI Stream-accelerated version of ACML. ACML-GPU accelerates certain routines in ACML, such as SGEMM and DGEMM, by off-loading the computation to the compatible GPUs in the system. The library dynamically decides, based on the parameters passed to the routines, whether to run the computation on the CPU or GPU, depending on which processor will yield the best performance.
Once you have access to one of the two AMD Radeon-based visualization nodes, load the AMD ACML-GPU module environment. This can be done with the commands:
- module load amdappsdk/2.4
- module load acmlgpu/1.1.2
AMD ACML-GPU can be found on EIGER under
/apps/eiger/acmlgpu1.1.2/
Copy the whole ACML-GPU installation into your home directory and build the GPGPUexamples provided. Try to run some of them on one of the AMD Radeon-based visualization nodes, eiger205 or eiger206:
mkdir -p ~/EIGER-RADEON
cp -ra /apps/eiger/acmlgpu1.1.2 ~/EIGER-RADEON/
cd ~/EIGER-RADEON/acmlgpu1.1.2/GPGPUexamples
export DISPLAY=:0.0
make clean;make -k
./Info.exe
After the execution of the Info.exe command, you should get an output similar to this one :
===============================================================
CPUID:
function (0)
Vendor: AuthenticAMD
function (1)
Family-Model-Stepping: 16-8-0
Feature flags (EDX): 178BFBFFh
Feature flags (ECX): 00802009h
MMX (EDX bit 13): yes
SSE1 (EDX bit 25): yes
SSE2 (EDX bit 26): yes
SSE3 (ECX bit 0): yes
SSSE3 (ECX bit 9): no
SSE4.1 (ECX bit 19): no
SSE4.2 (ECX bit 20): no
AVX (ECX bit 28): no
function (8000_0004)
Processor Brand: Six-Core AMD Opteron(tm) Processor 2427
function (8000_001A)
Perf Optimization Ids: 00000003h
FP128 (EAX bit 0): yes
MOVU (EAX bit 1): yes
> uname -a
Linux eiger205 2.6.32.29-0.3-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux
> powersave -c
Error org.freedesktop.Hal.Device.PermissionDeniedByPolicy: org.freedesktop.hal.power-management.cpufreq auth_admin_keep_always <-- (action, result)
Unknown policy
CAL RT version: 1.4.1385
CAL CL version: 1.4.1385
gpu0:
Type: CALtarget(15) (unknown type)
Revision: 1
Maximum resource 1D width: 16384
Maximum resource 2D width: 16384
Maximum resource 2D height: 16384
Local GPU RAM: 2048 megabytes
Uncached remote GPU memory: 1530 megabytes
Cached remote GPU memory: 508 megabytes
GPU device clock rate: 830 megahertz
GPU memory clock rate: 1250 megahertz
Wavefront size: 64
Number of SIMDs: 24
Number of shader engines: 2
double precision: Supported
local data share: Supported
global data share: Supported
global GPR: Supported
compute shader: Supported
memexport: Supported
calResCreate pitch alignment: 256 data elements
calResCreate address alignment: 256 bytes
Unaligned Access Views (UAVs): 12
3D program grid: Supported
gpu1:
Type: CALtarget(15) (unknown type)
Revision: 1
Maximum resource 1D width: 16384
Maximum resource 2D width: 16384
Maximum resource 2D height: 16384
Local GPU RAM: 2048 megabytes
Uncached remote GPU memory: 1530 megabytes
Cached remote GPU memory: 508 megabytes
GPU device clock rate: 0 megahertz
GPU memory clock rate: 0 megahertz
Wavefront size: 64
Number of SIMDs: 24
Number of shader engines: 2
double precision: Supported
local data share: Supported
global data share: Supported
global GPR: Supported
compute shader: Supported
memexport: Supported
calResCreate pitch alignment: 256 data elements
calResCreate address alignment: 256 bytes
Unaligned Access Views (UAVs): 12
3D program grid: Supported
GPUs found: 2
===============================================================
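In the sample output above, gpu0 reports its clock rates while gpu1 reports 0 megahertz for both. When checking a node it can help to condense the report to one line per GPU and clock. The awk filter below is a sketch that assumes the output format shown above; the function name gpu_clocks is hypothetical.

```shell
#!/bin/sh
# gpu_clocks: read Info.exe output on stdin and print one line per GPU and
# clock domain, e.g. "gpu0 core 830 MHz" (hypothetical helper).
gpu_clocks() {
    awk '
        # remember which GPU section we are in ("gpu0:", "gpu1:", ...)
        $1 ~ /^gpu[0-9]+:$/ { gpu = $1; sub(/:$/, "", gpu) }
        # the value is the next-to-last field, followed by "megahertz"
        $1 == "GPU" && $2 == "device" && $3 == "clock" { print gpu, "core", $(NF-1), "MHz" }
        $1 == "GPU" && $2 == "memory" && $3 == "clock" { print gpu, "memory", $(NF-1), "MHz" }
    '
}

# Example: ./Info.exe | gpu_clocks
```

Applied to the sample output above, the filter would print 830 MHz (core) and 1250 MHz (memory) for gpu0, and 0 MHz for both clocks of gpu1.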
AMD Accelerated Parallel Processing Math Libraries (APPML)
AMD Accelerated Parallel Processing Math Libraries are software libraries containing FFT and Level 3 BLAS functions written in OpenCL and designed to run on AMD GPUs. The libraries also support running on CPU devices to facilitate debugging and multicore programming.
Once you have access to one of the two AMD Radeon-based visualization nodes, load the AMD APPML module environment. This can be done with the commands:
- module load amdappsdk/2.4
- module load appml/1.2
AMD APPML can be found on EIGER under
/apps/eiger/clAmdBlas-1.2.144/
/apps/eiger/clAmdFft-1.2.144/
Copy the whole AMD APPML installation into your home directory and build the samples provided. Try to run some of them on one of the AMD Radeon-based visualization nodes, eiger205 or eiger206:
BLAS:
mkdir -p ~/EIGER-RADEON
cp -ra /apps/eiger/clAmdBlas-1.2.144 ~/EIGER-RADEON/
export DISPLAY=:0.0
cd ~/EIGER-RADEON/clAmdBlas-1.2.144/samples
cmake .;make
./example_sgemm; ./example_strmm; ./example_strsm
FFT :
cp -ra /apps/eiger/clAmdFft-1.2.144 ~/EIGER-RADEON/
export DISPLAY=:0.0
cd ~/EIGER-RADEON/clAmdFft-1.2.144/samples
cmake .;make
./clAmdFft.Client
Remark: the BOOST C++ libraries can be found on EIGER under /apps/eiger/boost_1_46_1/
AMD APP Profiler (APP Profiler)
The AMD APP Profiler is a performance analysis tool that gathers data from the OpenCL run-time and AMD Radeon™ GPUs during the execution of an OpenCL application. We can then use this information to discover bottlenecks in an application and find ways to optimize the application’s performance for AMD platforms. Further detailed information about its usage can be found here :
developer.amd.com/tools/amdappprofiler/pages/default.aspx
In particular, consider reading the following related documents:
- AMD APP Profiler User Guide: http://developer.amd.com/tools/AMDAPPProfiler/html/index.html
- AMD APP OpenCL : http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
Also consider browsing the AMD Developer Central OpenCL Zone.

