9. Tools

9.1 Cray Performance Analysis Tool (CrayPAT)

The Cray Performance Analysis Tool (CrayPAT) helps you analyse the performance of programs running on HECToR systems.

CrayPAT can generate tracing experiments, in which every event specified to pat_build is logged, or asynchronous sampling experiments, in which the program is periodically polled for general statistics. Tracing experiments typically give more detailed information but incur more overhead than sampling experiments, because the requested events are monitored constantly. The recommended way to use CrayPAT is to run a sampling experiment first and use the statistics it generates to decide which events to follow in a tracing experiment. This process is automated by the APA (Automatic Profiling Analysis) option.

For detailed instructions on using CrayPAT and information on interpreting the results please see the Performance Analysis chapter of the HECToR Optimisation Guide. The guide contains a worked example of using CrayPAT to analyse the performance of a program and discusses examples of CrayPAT output.

9.2 Cray Apprentice2

Note: Users can obtain a Linux desktop copy of Cray Apprentice2 for use on their local machine, rather than on HECToR, by copying it from the /usr/local/packages/dtappr/ directory on HECToR. This desktop version is available on a convenience basis only and is not supported by the HECToR service.

Cray Apprentice2 is a performance data visualisation tool. After you have used pat_build to instrument a program for a performance analysis experiment, executed the instrumented program, and used pat_report to convert the resulting data file to a Cray Apprentice2 data format, you can use Cray Apprentice2 to explore the experiment data file and generate a variety of interactive graphical reports.

To run Cray Apprentice2:

load the perftools module (Cray Apprentice2 is part of this)

module add perftools

run pat_report

pat_report -f ap2 patfile

enter the app2 command to launch Cray Apprentice2

app2 [--limit tag_count | --limit_per_pe tag_count] [data_files]

Example 1. Cray Apprentice2 basics

This example shows how to use Cray Apprentice2 to create a graphical representation of a CrayPAT report.

Using the experiment file program1+pat+2511td from the example above, generate a report in XML format (note the inclusion of the -f ap2 option):

module add perftools
pat_report -f ap2 program1+pat+2511td
  Output redirected to: program1+pat+2511td.ap2

Run Cray Apprentice2:

 app2 program1+pat+2511td.ap2
 

Cray Apprentice2 will then display pat_report data in graphical form.

9.3 Totalview debugger

HECToR supports a special implementation of the RogueWave Totalview debugger. Totalview provides source-level debugging of Fortran, C, and C++ code compiled by the Cray, PGI and GNU compilers, using between 1 and 2048 compute processes (for details please see the Cray documentation).

Totalview:

  • Provides both a command line interface (with command line help) and a Motif-based graphical user interface
  • Supports Fortran, C, and C++ code compiled by the Cray, PGI, and GNU compilers
  • Supports programs written in mixed languages
  • Supports debugging of up to 2048 compute processes under the HECToR licence
  • Supports MPI message queue display
  • Supports watchpoints

Please see the Debugging chapter of the HECToR Optimisation Guide for detailed information on running Totalview on HECToR. The Optimisation Guide contains a step-by-step example of using Totalview and example job submission scripts.

9.4 Paraview

9.4.1 Using ParaView 3.12.0 on HECToR

ParaView is a visualization application which has a distributed architecture, allowing a separation between the server (which runs on distributed and shared memory systems using MPI) and the client, which can run on a different machine, and communicates with the server via sockets.

The installation of ParaView has been updated to version 3.12.0 since HECToR moved to phase 3. Additional versions of the server components (pvbatch, pvpython and pvserver) have been built for the compute nodes, which now use AMD Interlagos processors. Instructions for using version 3.10.0 are retained lower down this page. The main point to note is that the module name has changed since the changeover to phase 3.

To make the difference in operation clear, new modules have been created, which are loaded as follows:

  • module load paraview-client/3.12.0
  • module load paraview-servers/3.12.0

The "client" module is the pre-built binary obtained from KitWare and is for use on the login nodes or serial queues. However, the best use of ParaView on HECToR is for processing large data sets and that can be achieved with the "servers".

There are 3 typical modes of use for the software on HECToR (although there are several other modes that are not possible with HECToR):

  1. using the client only (with its built-in server) to prepare a scene for rendering
  2. using the client with the server to render the scene
  3. using the servers in batch mode to render a large dataset (typically an animation)

The first mode should only be used to work with a small data set in preparation for the second or third mode. Using the client-only pvpython processor will take longer and probably exceed the 20 minute interactive session limit.

The recommendation stands that a small case is used to prepare the instructions for the scene with the Python tracing tool. This can be done using the second mode.

To achieve the second mode it is sensible to have two login shells open. In the first shell, record the IP address of the host with "hostname -i" for use with the server job script. This shell will be used to run the client, so load the paraview-client module there. One way to keep the job script generic is to set up an environment variable (in these examples we use HOST_IP) containing that value and pass it to the job script through "-v". The job script will be submitted from the second shell, so the variable there should match the IP address of the machine hosting the client process.
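
The idea of passing the client address through "-v" can be sketched in Python as follows. This is a minimal illustration rather than part of the HECToR setup: the helper names qsub_command and submit are hypothetical, and it relies on the standard PBS behaviour that qsub accepts "-v NAME=value". The script name and IP address are the example values used elsewhere on this page.

```python
import subprocess


def qsub_command(job_script, host_ip):
    """Build a qsub command line that passes HOST_IP to the job via -v."""
    if not host_ip:
        raise ValueError("host_ip is empty; run 'hostname -i' on the client host")
    return ["qsub", "-v", "HOST_IP=%s" % host_ip, job_script]


def submit(job_script, host_ip):
    """Submit the job and return whatever qsub prints (normally the job ID)."""
    out = subprocess.check_output(qsub_command(job_script, host_ip))
    return out.decode().strip()


# Only print the command here; nothing is submitted.
print(" ".join(qsub_command("pvserver.pbs", "10.128.0.9")))
```

Calling submit() instead of printing would hand the command to qsub, so it only makes sense on a machine where qsub is available.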

Here is a recipe for operating ParaView with the client on a login node and the rendering server on the compute nodes:

  • record the IP address in an environment variable (e.g. export HOST_IP=`hostname -i`)
  • modify the pvserver.pbs script (example below)
    • so that it has the correct account number
    • if necessary change the number of MPI tasks (here we use only one node)
    • set the port number as explained for 3.10.0
  • submit the pvserver job and make a note of the job ID
  • module load paraview-client/3.12.0 [ soon this may become the default but expect it to change as KitWare release newer versions ]
  • on the host (HOST_IP)
    • set the XDG_CONFIG_HOME environment variable to point to a directory on the "work partition" that you own (e.g. /work/z03/z03/username; the users in these examples are members of the z03 project)
    • use the paraview client to connect to the server manually
    • the client will wait for the server job to start
    • [another alternative is to use the job submission method described for version 3.10.0]
  • record the session with "Tools > Start Trace"
  • do the paraview work
  • stop recording with "Tools > Stop Trace"
  • save the python script (your choice of name: python_journal.py)
    • NOTE: the paraview client writes the full path to the file, which may cause a problem if you relocate the script to a different directory. This is also an issue for users who use a symbolic link from their home directory to the work directory (consider using an environment variable for that purpose instead).
  • save the paraview state (file > save state: my_state.pvsm)
  • quit paraview
  • delete the server job with:
        qstat -u username
        qdel JOBID
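
The note in the recipe about full paths can be handled by building paths from an environment variable inside the saved script. The following is a minimal sketch, assuming the paths in question are data file paths in python_journal.py. The variable name PV_DATA_DIR and the helper data_path are hypothetical, and the variable would need to be exported in the job script.

```python
import os


def data_path(filename, env_var="PV_DATA_DIR"):
    """Build a data file path from an environment variable.

    This avoids relying on an absolute path recorded by the client.
    PV_DATA_DIR is a hypothetical name chosen for this sketch.
    """
    base = os.environ.get(env_var)
    if not base:
        raise RuntimeError("%s is not set; export it in the job script" % env_var)
    return os.path.join(base, filename)
```

A hard-coded file name in the saved script could then be replaced by a call such as data_path("can.ex2"), where can.ex2 stands for whichever data file the script reads.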

Edit the python script with the additional lines noted below.

Use mode 3 with the large data set and submit the job to the batch queue with a pvbatch job script.

Example pvserver script

This script, pvserver_n32.pbs, sets the additional environment variables.

#!/bin/bash --login
#PBS -N pv312_Server
#PBS -l mppwidth=32
#PBS -l mppnppn=32
#PBS -l walltime=01:00:00
#PBS -A z03
#PBS -v HOST_IP

export XDG_CONFIG_HOME=${WORK}/ParaViewIniConfig 
cd $PBS_O_WORKDIR
# this job script is to launch the paraview server as a parallel job
# on one node with 32 cores. It will connect with client running on HOST_IP

module load paraview-servers/3.12.0
module list

if [ -z "${PARAVIEW_SERVER_DIR}" ] ; then
  echo "Error: PARAVIEW_SERVER_DIR not set. Exiting"
  exit 4
fi
echo "DEBUG: PARAVIEW_SERVER_DIR is" ${PARAVIEW_SERVER_DIR}
echo "DEBUG: expect to connect to client on host ip address= ${HOST_IP}"

MPPWIDTH=32
MPPNPPN=32
aprun -n ${MPPWIDTH} -N ${MPPNPPN} ${PARAVIEW_SERVER_DIR}/bin/pvserver \
      --use-offscreen-rendering \
      --reverse-connection \
      --server-port=75000  \
      --client-host=${HOST_IP}

echo "End of pvserver script"
#
# END OF SCRIPT 

NOTE: --reverse-connection can be abbreviated to -rc and --client-host can be abbreviated to -ch

Example pvbatch script

This is similar to the pvserver script, with changes to the command line that is passed to aprun. The script is pvbatch_n32.pbs.
  
#!/bin/bash --login
#PBS -N CanEx2_Anim
#PBS -l mppwidth=32
#PBS -l mppnppn=32 
#PBS -l walltime=00:20:00 
#PBS -A <account number> 

cd $PBS_O_WORKDIR 
export XDG_CONFIG_HOME=${WORK}/ParaViewIniConfig 
   
module load paraview-servers/3.12.0 
   
if [ -z "${PARAVIEW_SERVER_DIR}" ] ; then
  echo "Error: PARAVIEW_SERVER_DIR not set. Exiting"
  exit 4 
fi 
echo "DEBUG: PARAVIEW_SERVER_DIR is" ${PARAVIEW_SERVER_DIR} 
MPPWIDTH=32 
MPPNPPN=32 
aprun -n ${MPPWIDTH} -N ${MPPNPPN} ${PARAVIEW_SERVER_DIR}/bin/pvbatch \
      --use-offscreen-rendering canex2_anim.py 
  
# END OF SCRIPT 

Additional lines for the Python script

It has been observed that the default state for the pvserver is different to that seen when the paraview client is started. The main difference is that the server uses only the "Headlight" and does not have any background colouring or lighting. The additional attributes can be set in the python script:

RenderView1.LightSwitch=0 # turns off the headlight
RenderView1.UseLight=1 # turns on the ambient lighting

Of course the exact naming of the object (in this case RenderView1) may vary for your case, so take care to examine the python script before submitting it for processing by the pvserver. It will be useful to use the Can tutorial and record the session while experimenting with the lighting effects. Even this is not guaranteed to get the view into the state that you saw with the client, so an additional step before closing the client is to save the state and then add the following lines to the python script just after the "from paraview.simple import *" line.

servermanager.LoadState("myjob_state.pvsm")
SetActiveView(GetRenderView())
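
Adding these two lines can also be scripted. The following is a minimal sketch, assuming the recorded trace contains the line "from paraview.simple import *". The function name add_state_lines is hypothetical, and the sketch only edits text, so it does not need ParaView to run.

```python
IMPORT_LINE = "from paraview.simple import *"


def add_state_lines(script_text, state_file):
    """Return script_text with the state-loading lines inserted after the import line."""
    extra = [
        'servermanager.LoadState("%s")' % state_file,
        "SetActiveView(GetRenderView())",
    ]
    out = []
    inserted = False
    for line in script_text.splitlines():
        out.append(line)
        # Insert only once, directly after the first matching import line.
        if not inserted and line.strip() == IMPORT_LINE:
            out.extend(extra)
            inserted = True
    if not inserted:
        raise ValueError("import line not found: " + IMPORT_LINE)
    return "\n".join(out) + "\n"
```

Reading the saved script, passing its text through add_state_lines with the name of the saved state file, and writing the result back gives the edited script.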

In the job script you will see that the environment variable XDG_CONFIG_HOME has been set to a directory on the work partition; an additional file created by the client is stored there. This environment variable should be set to the same directory location when using the client to create the python script.

9.4.2 The instructions for versions previous to 3.12.0

To use version 3.10.0 of ParaView installed on HECToR, first load these modules:

  • module swap PrgEnv-cray PrgEnv-gnu/3.1.49A
  • module load python-shared-xe6/2.6.6
  • module load paraview-client/3.10.0

PrgEnv-gnu and python-shared-xe6 are prerequisites of the paraview module. If you try to load paraview without first loading these prerequisites, the module command will complain with a message like

paraview/3.10.0(18):ERROR:151: Module 'paraview/3.10.0' depends on one of the module(s)
  'PrgEnv-gnu/3.1.49A PrgEnv-gnu/3.1.37G PrgEnv-gnu/3.1.29 PrgEnv-gnu/3.0.20
   PrgEnv-gnu/3.0.17'
paraview/3.10.0(18):ERROR:102: Tcl command execution failed: prereq PrgEnv-gnu

Running pvbatch, the paraview script interpreter

To submit a pvbatch job on HECToR your job script should contain the lines:

#!/bin/bash --login
and
module swap PrgEnv-cray PrgEnv-gnu
module load python-shared-xe6
module load paraview-client/3.10.0
and
aprun  -n $NPROC pvbatch  [name].py

where $NPROC is the number of cores and [name].py is your python submission script. Some examples are given further down this document.

A recommended approach to write a Paraview Batch Script is the following:

  1. Download ParaView to your local machine from http://www.paraview.org/paraview/resources/software.html.
  2. Create the pipeline that you want using the ParaView GUI but with a small dataset.
  3. Save the State (File-Save State). The resulting pvsm file can be of valuable help when you write your Paraview Batch Script in Python.
  4. Use pvpython in your local installation to write the script.
  5. Move the script to HECToR and change the input to the whole of the data set you want to render.

Example Paraview Batch Scripts

Example 1

This example is a python script that reads a PLOT3D data set and creates a contour using the Density attribute at value 0.3. The result is written to the file "CF1.png". The relevant data sets can be found here:

from paraview.servermanager import *
Connect()
reader=sources.P3DReader(FileName="combxyz.bin")
reader.QFileName="combq.bin"
view=CreateRenderView()
view.Background=[0.3249412, 0.34902, 0.427451]
rep=CreateRepresentation(reader,view)
rep.Representation=3
rep.Visibility=0

cont1=filters.Contour(Input=reader)
cont1.ComputeScalars=1
cont1.ComputeNormals=0
cont1.ContourValues=[0.3]
cont1.SelectInputScalars=['0','0','0','0','Density']
rep2=CreateRepresentation(cont1,view)
rep2.ColorAttributeType=0
rep2.SelectionColor=[1.0,0,1.0]
rep2.ColorArrayName="Density"
lt=rendering.PVLookupTable()
lt.RGBPoints=[1,0.1381,0.2411,0.7091,255,0.6728,0.1408,0.1266]
lt.ColorSpace=3
rep2.LookupTable=lt
view.StillRender()
view.ResetCamera()
view.StillRender()
view.WriteImage("CF1.png","vtkPNGWriter",1)

Example 2

In the second example the python script reads a raw binary file containing 3-dimensional data and creates contours at the values 0.1 and -0.1. A colour bar is also provided.

from paraview.servermanager import *
Connect()
reader=sources.ImageReader(FilePrefix="phi-120000")
reader.DataByteOrder=1
reader.DataExtent[1]=511
reader.DataExtent[3]=1023
reader.DataExtent[5]=511
reader.DataScalarType=10
view=CreateRenderView()
view.Background=[0.3249412, 0.34902, 0.427451]
rep=CreateRepresentation(reader,view)
rep.Representation=3
rep.ColorAttributeType=0
cont1=filters.Contour(Input=reader)
cont1.ComputeScalars=1
cont1.ComputeNormals=0
cont1.ContourValues=[-0.1,0.1]
cont1.SelectInputScalars="ImageFile"
rep2=CreateRepresentation(cont1,view)
rep2.ColorArrayName="ImageFile"
rep2.SelectionColor=[1,0,1]
rep2.ColorAttributeType=0
lt=rendering.PVLookupTable()
lt.RGBPoints=[-0.1,0.1381,0.2411,0.7091,0.1,0.6728,0.1408,0.1266]
lt.ColorSpace=3
rep2.LookupTable=lt
sb=rendering.ScalarBarWidgetRepresentation()
sb.Position=[0.80,0.15]
sb.Title="ImageFile"
sb.Enabled=1
sb.LookupTable=lt
view.Representations.append(sb)
view.StillRender()
view.ResetCamera()
view.StillRender()
view.WriteImage("PHI-result.png","vtkPNGWriter")
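
Before submitting a job like Example 2, it can help to check that the raw file is the size the reader settings imply. The following is a minimal sketch, assuming the file has no header, that the extents are inclusive index ranges with lower bounds left at 0, and that DataScalarType 10 is the VTK code for 4-byte floats. The helper names are hypothetical.

```python
import os

# Bytes per value for two VTK scalar type codes (10 is VTK_FLOAT, 11 is VTK_DOUBLE).
VTK_SCALAR_BYTES = {10: 4, 11: 8}


def expected_raw_size(extent, scalar_type):
    """Return the byte count implied by a DataExtent and a DataScalarType."""
    nx = extent[1] - extent[0] + 1
    ny = extent[3] - extent[2] + 1
    nz = extent[5] - extent[4] + 1
    return nx * ny * nz * VTK_SCALAR_BYTES[scalar_type]


def check_raw_file(path, extent, scalar_type):
    """Return True if the file size matches the size the reader settings imply."""
    return os.path.getsize(path) == expected_raw_size(extent, scalar_type)


# Settings from Example 2: extents 0..511, 0..1023, 0..511 with float data.
print(expected_raw_size([0, 511, 0, 1023, 0, 511], 10))  # 1073741824
```

With these settings the file would be expected to hold 512 x 1024 x 512 values of 4 bytes each, which is 1 GiB. A mismatch usually points to a wrong extent or scalar type.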

Running the Paraview client on HECToR

To start the ParaView client, do

% paraview

By default, this will connect to the paraview server that is installed on the head node. This is not the most efficient way of using ParaView, and the remainder of this note describes how to connect the client to the server running on the compute nodes.

First, we need the IP address of the machine that the client is running on. Do

% hostname -i

which will return an IP address (e.g. 10.128.0.9). Then export it to an environment variable (e.g. HOST_IP).

export HOST_IP=`hostname -i`

Then create a job submission script (called, say, myjob.pbs ) like this:

#!/bin/bash --login
#PBS -N Para_View
#PBS -l walltime=00:20:00
#PBS -l mppwidth=32
#PBS -l mppnppn=32
#PBS -v HOST_IP

#PBS -A z03

module swap PrgEnv-cray PrgEnv-gnu
module load python-shared-xe6
module load paraview-client/3.10.0

cd $PARAVIEW_SERVER_DIR/bin

aprun -n 32 -N 32 ./pvserver --reverse-connection --use-offscreen-rendering \
      --server-port=75000 --client-host=${HOST_IP}

where the parameter values must be changed to reflect your environment or preferences. In particular, note that the IP address of the client host can’t be hardcoded from one login session to another, because it depends on which of the head nodes you are logged into. It is passed in through an environment variable (in this case we used HOST_IP, but it can be your choice of name).

Having created the job script, we need to tell the client about it. In the ParaView client window, choose File, then Connect. A window called Choose Server will appear. Choose Add Server and fill in the following information:

  • Name: any name of your choice
  • Server Type: Choose the option Client / Server (reverse connection)
  • Host: This option disappears when the above option is selected.
  • Port: 75000 (i.e. the same value as that specified for server-port in the job script).

Then select Configure. A window called Configure Server will appear. In the text window, enter the command used to submit your job script, e.g. qsub myjob.pbs (there’s no need to change the default wait time at the bottom of the window). Note that, if paraview wasn’t started in the same directory as the job script, you’ll have to include the path to the file, e.g. qsub /work/z03/z03/jeremyw/jobs/myjob.pbs

Close the window by selecting Save. In the Choose Server window, select the server you’ve just added and select Connect. The job will be submitted to the batch queue and, when it runs, it will connect back to the client. When the connection is made, the Choose Server window will disappear, and you can start using ParaView to visualize your data.

Note: We have noticed that running a Cygwin X server on your local machine might cause the GUI to crash on HECToR. If you are running Linux locally there should be no problem.

For more information about ParaView visit the ParaView website. The mailing list is particularly useful.

More information on programming the python interface to ParaView is available in the Python_Scripting section on the KitWare web site.

9.5 VampirTrace

VampirTrace consists of a tool set and a runtime library for instrumentation and tracing of software applications. During a program run, VampirTrace generates an OTF trace file which can be analysed and visualised by Vampir (commercial software not available on HECToR).

Usage

To use VampirTrace on HECToR you should load the appropriate VampirTrace and PAPI modules. First load the xt-papi module. Then, if you are using the PGI compilers for your code, load the VampirTrace module; if you are using the GNU compilers, load the VampirTraceGnu module.

Instrumentation

To perform measurements with VampirTrace, the user’s application program needs to be instrumented. In order to enable the instrumentation, the user needs to replace the compiler and linker commands with VampirTrace’s wrappers.

Compiler Wrappers

All the necessary instrumentation of user functions, MPI and OpenMP events is handled by VampirTrace’s compiler wrappers (vtcc, vtcxx, vtf90 and vtf77).

In the makefile used to build the application, all compile and link commands should be replaced by the VampirTrace compiler wrapper. The wrappers perform the necessary instrumentation of the program and link the suitable VampirTrace library.

For instance, use the following wrappers to trace MPI functions:

C: vtcc -vt:cc cc -vt:mpi
C++: vtcxx -vt:cxx CC -vt:mpi
F90: vtf90 -vt:f90 ftn -vt:mpi
F77: vtf77 -vt:f77 ftn -vt:mpi

Runtime Measurement

Before running your code, you will need to export some environment variables in your batch script. For example:

  • VT_PFORM_LDIR = < name of local directory which can be used to store temporary trace files >
  • VT_METRICS = < hardware counters > (see info on VampirTrace website [1])

After running the code, if not already done, unify the trace files with:

vtunify < number of cores > < name of executable >

For more information
