
AMD HD RADEON 6990 Multi-GPU Nodes

Introduction

The Visualization, Research & Development Cluster EIGER offers two visualization nodes, eiger[205-206], each equipped with one dual-GPU AMD HD RADEON 6990 card and with the software development kit needed to exploit the nominal performance of its GPUs.

Each dual-GPU AMD HD RADEON 6990 card has a total of 4 GB of GDDR5 memory clocked at 1250 MHz, giving up to 320 GB/s of GPU memory bandwidth, and provides a total of 3072 Stream Processors. OpenCL 1.1 applications can easily take advantage of these accelerators, especially when combined with AMD's specialized mathematical libraries and tools, such as:

  • AMD Accelerated Parallel Processing (APP) SDK (APP-SDK)
  • AMD Core Math Library for Graphic Processors (ACML-GPU)
  • AMD Accelerated Parallel Processing Math Libraries (APPML)
  • AMD APP Profiler
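As a sanity check on the memory bandwidth quoted above, the figure can be reproduced with shell arithmetic. This is only a sketch: the 256-bit memory bus per GPU and the factor of 4 transfers per memory clock for GDDR5 are assumptions that are not stated in this page.

```shell
# Assumptions (not from this page): 256-bit memory bus per GPU, and GDDR5
# moving 4 bits per pin per memory clock cycle.
mem_clock_mhz=1250        # memory clock quoted above
bus_width_bits=256        # assumed bus width per GPU
transfers_per_clock=4     # assumed GDDR5 data rate multiplier
gpus_per_card=2           # the card is a dual-GPU model

# MHz * transfers * bits / 8 gives MB/s; dividing by 1000 gives GB/s.
per_gpu_gbs=$(( mem_clock_mhz * transfers_per_clock * bus_width_bits / 8 / 1000 ))
card_gbs=$(( per_gpu_gbs * gpus_per_card ))

echo "per GPU: ${per_gpu_gbs} GB/s, per card: ${card_gbs} GB/s"
```

Under these assumptions each GPU reaches 160 GB/s, which gives 320 GB/s for the whole card.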

Accessing RADEON-based visualization nodes

Access to one or both Radeon-based visualization nodes is controlled by the SLURM batch queuing system. Access can be interactive or batch, depending on your needs.

Interactive access

From the EIGER frontend node, request an interactive allocation of one of the Radeon-based nodes with the standard SLURM srun command. The following example requests an entire AMD Radeon-based node for 2 hours and opens a shell on it (srun needs a command to run; --pty /bin/bash is one common choice):

srun -N 1 -n 1 --mem=23g --gres=gpu:2 --constraint=radeon --time=02:00:00 --pty /bin/bash
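Once the interactive session starts, it can be worth confirming that the shell really landed on one of the two Radeon nodes. The following is a minimal sketch; the helper name is_radeon_node is hypothetical, and the node names are taken from the introduction above.

```shell
# Hypothetical helper: succeed only for the two Radeon nodes named in this page.
is_radeon_node() {
    case "$1" in
        eiger205|eiger206) return 0 ;;
        *) return 1 ;;
    esac
}

# Short host name of the machine this shell runs on (domain part stripped).
node=$(uname -n)
node=${node%%.*}

if is_radeon_node "$node"; then
    echo "$node is a Radeon visualization node"
else
    echo "$node is not a Radeon visualization node"
fi
```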

Batch access

For batch access to both AMD Radeon-based nodes, for example for a parallel MPI job that needs 4 AMD Radeon GPUs, 24 CPU cores, and a total of 24 MPI tasks, each running on its own CPU core, your SLURM batch script should look similar to the template below:

======================CUT HERE========================

#!/bin/bash
#SBATCH --job-name="SLURM-RADEON-JOB"
#SBATCH --nodes=2
#SBATCH --ntasks=24
#SBATCH --cpus-per-task=1

#SBATCH --ntasks-per-node=12
#SBATCH --mem=4gb
#SBATCH --gres=gpu:2
#SBATCH --constraint=radeon
#SBATCH --time=01:30:00

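#SBATCH --partition=medium
# NOTE: #SBATCH lines are not expanded by the shell. Replace $GROUP, $USER
# and MAIL.DOMAIN below with literal values before submitting.
#SBATCH --account=$GROUP
#SBATCH --mail-type=ALL
#SBATCH --mail-user=$USER@MAIL.DOMAIN
#SBATCH --output=/users/$USER/slurm-OUT.log
#SBATCH --error=/users/$USER/slurm-ERR.log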

#======START====

#Configure your environment for your running job

. /etc/profile.d/modules.bash
module load pgi
module load mvapich2/1.6-pgi
module load amdappsdk/2.4
module load appml/1.2
module load acmlgpu/1.1.2

echo "Nodes allocated to this job:"
echo "$SLURM_NODELIST"
echo "Which MPI Implementation is used :"
which mpiexec.hydra
mpiexec.hydra -info

echo "List the AMD HD RADEON 6990 multi-gpu card operating mode:"
export DISPLAY=:0.0
/usr/bin/aticonfig --pxl
echo "Which GPU devices are available:"
/usr/bin/aticonfig --list-adapters
echo "Check the current GPU core and memory clock frequency in MHz:"
/usr/bin/aticonfig --od-getclocks

echo "Now run the MPI tasks..."

mpiexec.hydra -n 24 -ppn 12 -bootstrap rsh -bootstrap-exec /usr/bin/rsh -genv MV2_IBA_HCA mlx4_0 -rmk slurm /users/$USER/MY_MPI_APPLICATION

echo "I have selected the first IB HCA :"
/usr/bin/ibv_devices | grep mlx4_0

======================CUT HERE========================


The SLURM template script above assumes that your parallel MPI application was compiled with MVAPICH2 1.6 built for the PGI compilers 11.5, and that you are using the AMD mathematical libraries and tools.
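Before submitting, it can help to verify that the task layout in the script header is consistent, that is, that nodes times tasks per node equals the total number of tasks (2 x 12 = 24 in the template). The sketch below does this by reading the #SBATCH lines; the helper names sbatch_value and check_layout are hypothetical, and the check assumes the long --name=value option form used in the template.

```shell
# Hypothetical helper: print the value of one "#SBATCH --<name>=<value>" line.
sbatch_value() {
    # $1 = option name (for example: nodes), $2 = path to the batch script
    sed -n "s/^#SBATCH[[:space:]]*--$1=\(.*\)/\1/p" "$2" | head -n 1
}

# Check that nodes x ntasks-per-node equals ntasks in the given script.
check_layout() {
    nodes=$(sbatch_value nodes "$1")
    ntasks=$(sbatch_value ntasks "$1")
    per_node=$(sbatch_value ntasks-per-node "$1")
    if [ -z "$nodes" ] || [ -z "$ntasks" ] || [ -z "$per_node" ]; then
        echo "missing --nodes, --ntasks or --ntasks-per-node" >&2
        return 2
    fi
    if [ "$(( nodes * per_node ))" -eq "$ntasks" ]; then
        echo "layout OK: $nodes nodes x $per_node tasks = $ntasks tasks"
    else
        echo "layout mismatch: $nodes x $per_node != $ntasks" >&2
        return 1
    fi
}
```

A possible use, with my_job.sh standing in for your own script name, is `check_layout my_job.sh && sbatch my_job.sh`, so that the job is only submitted when the header is consistent.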

AMD Accelerated Parallel Processing (AMD APP SDK)

The AMD APP Software Development Kit (SDK) is a complete development platform created by AMD to allow you to quickly and easily develop applications accelerated by AMD APP technology. The SDK allows you to develop your applications in a high-level language, OpenCL™ (Open Computing Language).

Once you have access to one of the two AMD Radeon-based visualization nodes, load the AMD APP SDK module environment with the command:

  • module load amdappsdk/2.4

You can then copy the AMD APP SDK directory tree into your home directory to build and run some of the code examples:

    mkdir -p ~/EIGER-RADEON
    cp -ra /apps/eiger/AMD-APP-SDK-v2.4-lnx64 ~/EIGER-RADEON/
    cd ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64
    export DISPLAY=:0.0
    make clean; make -k
    ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/bin/x86_64/clinfo
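The clinfo output is long, so a quick way to check that both GPUs of the card are visible to OpenCL is to count the GPU device entries. The sketch below assumes that clinfo prints one line containing CL_DEVICE_TYPE_GPU per GPU device; the helper name count_gpu_devices is hypothetical.

```shell
# Count GPU devices in clinfo output read from standard input.
# Assumption: clinfo prints one "CL_DEVICE_TYPE_GPU" line per GPU device.
count_gpu_devices() {
    grep -c 'CL_DEVICE_TYPE_GPU'
}

# Path of the clinfo binary built by the steps above.
clinfo_bin=~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/bin/x86_64/clinfo

if [ -x "$clinfo_bin" ]; then
    echo "OpenCL GPU devices: $("$clinfo_bin" | count_gpu_devices)"
else
    echo "clinfo not found at $clinfo_bin (build the SDK first)"
fi
```

On a node with one dual-GPU card, a count of 2 would be the expected result.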

To test the sample codes shipped with the AMD APP SDK 2.4, change into ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86_64 and run the sample executable of your choice:

     cd ~/EIGER-RADEON/AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86_64

     ./ScanLargeArrays

Further details can be found online on the AMD Developer web site:

      http://developer.amd.com/sdks/AMDAPPSDK/Pages/default.aspx

AMD Core Math Library for Graphic Processors (ACML-GPU)

AMD Core Math Library for Graphic Processors (ACML-GPU) provides an ATI Stream-accelerated version of ACML. ACML-GPU accelerates certain routines in ACML, such as SGEMM and DGEMM, by off-loading the computation to the compatible GPUs in the system. The library dynamically decides, based on the parameters passed to the routines, whether to run the computation on the CPU or GPU, depending on which processor will yield the best performance.

Once you have access to one of the two AMD Radeon-based visualization nodes, load the AMD ACML-GPU module environment with the commands:

  • module load amdappsdk/2.4
  • module load acmlgpu/1.1.2

AMD ACML-GPU can be found on EIGER under 

      /apps/eiger/acmlgpu1.1.2/

Copy the whole ACML-GPU installation into your home directory and build the GPGPUexamples provided. Try running some of them on one of the AMD Radeon-based visualization nodes, eiger205 or eiger206:

    mkdir -p ~/EIGER-RADEON
    cp -ra /apps/eiger/acmlgpu1.1.2 ~/EIGER-RADEON/
    cd ~/EIGER-RADEON/acmlgpu1.1.2/GPGPUexamples
    export DISPLAY=:0.0
    make clean; make -k
    ./Info.exe

After the execution of the Info.exe command, you should get an output similar to this one :

===============================================================

 CPUID:
   function (0)
      Vendor:                AuthenticAMD
   function (1)
      Family-Model-Stepping: 16-8-0
      Feature flags (EDX):   178BFBFFh
      Feature flags (ECX):   00802009h
      MMX    (EDX bit 13):   yes
      SSE1   (EDX bit 25):   yes
      SSE2   (EDX bit 26):   yes
      SSE3   (ECX bit  0):   yes
      SSSE3  (ECX bit  9):   no
      SSE4.1 (ECX bit 19):   no
      SSE4.2 (ECX bit 20):   no
      AVX    (ECX bit 28):   no
   function (8000_0004)
      Processor Brand:       Six-Core AMD Opteron(tm) Processor 2427
   function (8000_001A)
      Perf Optimization Ids: 00000003h
      FP128  (EAX bit  0):   yes
      MOVU   (EAX bit  1):   yes

> uname -a
Linux eiger205 2.6.32.29-0.3-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux

> powersave -c
Error org.freedesktop.Hal.Device.PermissionDeniedByPolicy: org.freedesktop.hal.power-management.cpufreq auth_admin_keep_always <-- (action, result)
Unknown policy

CAL RT version: 1.4.1385
CAL CL version: 1.4.1385

gpu0:
    Type:                            CALtarget(15) (unknown type)
    Revision:                           1
    Maximum resource 1D width:       16384
    Maximum resource 2D width:       16384
    Maximum resource 2D height:      16384
    Local GPU RAM:                   2048 megabytes
    Uncached remote GPU memory:      1530 megabytes
    Cached remote GPU memory:         508 megabytes
    GPU device clock rate:            830 megahertz
    GPU memory clock rate:           1250 megahertz
    Wavefront size:                    64
    Number of SIMDs:                   24
    Number of shader engines:           2
    double precision:                Supported
    local data share:                Supported
    global data share:               Supported
    global GPR:                      Supported
    compute shader:                  Supported
    memexport:                       Supported
    calResCreate pitch alignment:     256 data elements
    calResCreate address alignment:   256 bytes
    Unaligned Access Views (UAVs):     12
    3D program grid:                 Supported

gpu1:
    Type:                            CALtarget(15) (unknown type)
    Revision:                           1
    Maximum resource 1D width:       16384
    Maximum resource 2D width:       16384
    Maximum resource 2D height:      16384
    Local GPU RAM:                   2048 megabytes
    Uncached remote GPU memory:      1530 megabytes
    Cached remote GPU memory:         508 megabytes
    GPU device clock rate:              0 megahertz
    GPU memory clock rate:              0 megahertz
    Wavefront size:                    64
    Number of SIMDs:                   24
    Number of shader engines:           2
    double precision:                Supported
    local data share:                Supported
    global data share:               Supported
    global GPR:                      Supported
    compute shader:                  Supported
    memexport:                       Supported
    calResCreate pitch alignment:     256 data elements
    calResCreate address alignment:   256 bytes
    Unaligned Access Views (UAVs):     12
    3D program grid:                 Supported

GPUs found: 2

===============================================================
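To compare the GPUs at a glance, the core clock reported for each device can be extracted from this output. The sketch below assumes the line layout shown in the sample above ("gpuN:" headers followed by a "GPU device clock rate:" line); the helper name gpu_clocks is hypothetical.

```shell
# Read Info.exe output on standard input and print "<gpu> <core clock in MHz>"
# for every GPU section. Assumes the layout of the sample output above.
gpu_clocks() {
    awk '
        /^gpu[0-9]+:/            { name = $1; sub(/:$/, "", name) }
        /GPU device clock rate:/ { print name, $(NF-1) }
    '
}

# Run against the real tool only when it has been built in this directory.
if [ -x ./Info.exe ]; then
    ./Info.exe | gpu_clocks
fi
```

Applied to the sample output above, this lists gpu0 with 830 and gpu1 with 0, which makes entries that report a clock of 0 megahertz easy to spot.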

AMD Accelerated Parallel Processing Math Libraries (APPML)

AMD Accelerated Parallel Processing Math Libraries are software libraries containing FFT and Level 3 BLAS functions written in OpenCL and designed to run on AMD GPUs. The libraries also support running on CPU devices to facilitate debugging and multicore programming.

Once you have access to one of the two AMD Radeon-based visualization nodes, load the AMD APPML module environment with the commands:

  • module load amdappsdk/2.4
  • module load appml/1.2

AMD APPML can be found on EIGER under 

      /apps/eiger/clAmdBlas-1.2.144/

      /apps/eiger/clAmdFft-1.2.144/

Copy the whole AMD APPML installation into your home directory and build the samples provided. Try running some of them on one of the AMD Radeon-based visualization nodes, eiger205 or eiger206:

BLAS:

    mkdir -p ~/EIGER-RADEON
    cp -ra /apps/eiger/clAmdBlas-1.2.144 ~/EIGER-RADEON/
    export DISPLAY=:0.0

    cd ~/EIGER-RADEON/clAmdBlas-1.2.144/samples
    cmake .; make
    ./example_sgemm; ./example_strmm; ./example_strsm
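When running several samples in a row, a small loop makes it easier to see which one failed. The following is a minimal sketch; the helper name run_all is hypothetical, and it assumes the executables sit in the current directory.

```shell
# Hypothetical helper: run each named executable from the current directory
# and stop at the first one that exits with a non-zero status.
run_all() {
    for exe in "$@"; do
        echo "== $exe"
        "./$exe" || { echo "$exe failed" >&2; return 1; }
    done
}
```

From the samples directory it could be called as `run_all example_sgemm example_strmm example_strsm`.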


FFT:

    cp -ra /apps/eiger/clAmdFft-1.2.144 ~/EIGER-RADEON/
    export DISPLAY=:0.0

    cd ~/EIGER-RADEON/clAmdFft-1.2.144/samples
    cmake .; make
    ./clAmdFft.Client

 

Remark: the Boost C++ libraries can be found on EIGER under /apps/eiger/boost_1_46_1/

AMD APP Profiler (APP Profiler)

The AMD APP Profiler is a performance analysis tool that gathers data from the OpenCL run-time and AMD Radeon™ GPUs during the execution of an OpenCL application. We can then use this information to discover bottlenecks in an application and find ways to optimize the application’s performance for AMD platforms. Further detailed information about its usage can be found here :

             developer.amd.com/tools/amdappprofiler/pages/default.aspx

Consider also reading the related AMD documentation and browsing the AMD Developer Central OpenCL Zone.