Getting Started on HECToR

This guide gives newcomers to the HECToR service a tour of the essential information and points to other, more detailed documentation.

Connecting to HECToR

To connect to HECToR, use:

 ssh -X username@login.hector.ac.uk
where "username" should be replaced by your username for the service. This will log you into one of the "login nodes" of the service, from where you perform such actions as editing files, compiling code and submitting parallel jobs to the "compute nodes", which are only accessible via the batch system (more on this later).

For more details see the connecting section of the User Guide.

File and Resource Management

The currency used for charging jobs on the HECToR service is called an AU (Allocation Unit). Your project will have a finite number of AUs according to your access class and the requirements specified during application to use the service. Every job you submit on HECToR will be charged a number of AUs. You will be charged for each node your job uses.

There are two filesystems - home and work. Home is the filesystem you log into, which is backed up and should be used to store source code, input files and important results. However, the home filesystem is not visible to the compute nodes. Only the work filesystem (which is a high performance parallel filesystem) is visible to the compute nodes, but this is not backed up. Therefore, in order to run jobs you must copy input files to the work filesystem /work/xyz/xyz/username (where xyz is your project code) and run your jobs from there, but make sure not to leave important files on work.
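The home-to-work staging step described above can be sketched as a small shell helper. This is an illustrative sketch rather than an official HECToR tool: the function name stage_inputs and the example directory names are hypothetical, and xyz and username must be replaced as described above.

```shell
#!/bin/bash
# Hypothetical helper: copy input files from home into a run directory on work.
# Usage: stage_inputs SOURCE_DIR RUN_DIR
stage_inputs() {
    local src="$1" run_dir="$2"
    # Stop early if the source directory does not exist.
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    # Create the run directory and copy the contents of SOURCE_DIR into it.
    mkdir -p "$run_dir" && cp -r "$src"/. "$run_dir"/
}

# Example with placeholder paths (xyz is your project code):
#   stage_inputs "$HOME/my_case/inputs" /work/xyz/xyz/username/my_case
#   cd /work/xyz/xyz/username/my_case    # submit the job from here
```

Once a job has finished, important results can be copied back to home in the same way, since work is not backed up.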

For more information on resource management see the User Guide and the Good Practice Guide on IO.

Using Modules

HECToR uses environment modules, a package for managing user environments. Software such as compilers, libraries, tools and some widely used application codes is made available to users via modules. To use a particular version of a particular piece of software you must therefore first load its module. A default collection of modules is loaded when you log in to the system.

For more details on using modules see the User Guide.

Running Jobs

Jobs are submitted to the compute nodes through the PBS batch scheduler. The easiest way to submit a job is using a job script such as the following (for the XE6):

#!/bin/bash --login
#PBS -N My_job
#PBS -l mppwidth=512
#PBS -l mppnppn=32
#PBS -l walltime=04:30:00
#PBS -A budget
  
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=1
aprun -n 512 -N 32 ./executable

You should replace "budget" with your project code. This job requests 512 processes (mppwidth) and declares that it will use 32 cores per node (mppnppn). It also requests a wall-clock time of 4 hours 30 minutes. If the job exceeds this limit it will be terminated; if it finishes sooner, you will only be charged for the time used. The line cd $PBS_O_WORKDIR ensures that the job is run from the directory it was submitted from. OMP_NUM_THREADS=1 sets the number of threads per process to use. We recommend always setting OMP_NUM_THREADS explicitly in your job scripts because Cray's libsci library is built with OpenMP enabled by default. The command on the next line, aprun, actually launches the job on the compute nodes, using 512 processes (-n) with 32 processes per node (-N).
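Because jobs are charged per node, it can help to check how many nodes a given combination of mppwidth and mppnppn occupies. The following is a minimal sketch, assuming nodes are allocated whole so that a partly filled node still counts; the function name nodes_needed is hypothetical and not part of PBS or aprun.

```shell
#!/bin/bash
# Hypothetical helper: nodes occupied by a job, computed as the number of
# processes divided by the processes per node, rounded up.
# Usage: nodes_needed MPPWIDTH MPPNPPN
nodes_needed() {
    local mppwidth="$1" mppnppn="$2"
    echo $(( (mppwidth + mppnppn - 1) / mppnppn ))
}

nodes_needed 512 32   # values from the job script above; prints 16
```

The example job therefore occupies 16 nodes for up to 4 hours 30 minutes.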

See the User Guide for more details on writing job scripts, including how to run hybrid jobs. Also see "man aprun" for advanced options for launching jobs.

Compiling

In order to compile for the compute nodes, ALWAYS use the commands ftn for Fortran or cc for C. These commands are wrapper scripts for the compiler module you have loaded in your environment, and they make sure you link against the correct MPI and maths libraries (Cray's libsci). There is no need to explicitly link against MPI or libsci.

There are three main compilers available when compiling for the compute nodes: Cray (loaded as the default module on log-in), GNU and PGI. Each compiler should be loaded via its PrgEnv-xxx module. For example, to switch from the default Cray compiler to the GNU compiler: "module swap PrgEnv-cray PrgEnv-gnu".

For information about compiling natively for the login nodes, compiling OpenMP codes, and for details about how to select particular compiler versions, see the User Guide. For tips on compiler optimisation flags see the Serial Optimisation Good Practice Guide.

Libraries

One way of selecting the best algorithms and tuned implementations is to make use of libraries. We recommend the use of libsci (Cray's maths library for BLAS/LAPACK/ScaLAPACK, see "man libsci"), FFTW (Fast Fourier Transforms), the NAG libraries (for various problems) and NetCDF and HDF5 IO libraries.

These are all available on HECToR as modules. See the User Guide for details.

Tools

The parallel profiling tool on HECToR is CrayPAT (Cray's Performance Analysis Toolkit) and the parallel debugging tool is Totalview. Both tools are available as modules. See the Performance Measurement Good Practice Guide, the Debugging Good Practice Guide and the User Guide for advice about using these tools.

Applications

The Third Party Software section of the HECToR User Wiki (use your HECToR username and password to access it) contains information about all of the centrally-installed applications on HECToR. Some of these applications are free to access, but others require a licence. If you have a licence, contact the helpdesk at support@hector.ac.uk to request access.

Getting Help

If you have a question that is not answered by the documentation (the User Guide, the various Good Practice Guides, or the FAQs), please email support@hector.ac.uk.

Fri Aug 2 09:39:14 BST 2013