## 8. Libraries

A number of numerical libraries are available for use by programmers on the HECToR system. Some are provided by Cray; others are in the public domain and have been ported to HECToR. This section summarises what is available and indicates where further information can be obtained.

### 8.1 Cray Supplied Libraries

A list of the system library modules and their current default versions can be found at:

#### 8.1.1 Cray LibSci

The LibSci and ACML libraries are provided by Cray. LibSci is usually loaded by default with the appropriate PrgEnv module.

The Cray LibSci library bundles a number of serial and parallel numerical libraries, including BLAS, LAPACK, ScaLAPACK, BLACS, SuperLU and the Iterative Refinement Toolkit (IRT). The routines in LibSci can be called from both Fortran and C programs.

As the `xt-libsci` module is loaded by default with your PrgEnv module, no additional compiler options are required to link the library routines to your program. To check which version of LibSci is loaded, use:

```
module list
```

and to check the versions of LibSci available on the system use:

```
module avail xt-libsci
```

You should usually use the most recent release.

##### Linear algebra: BLAS and LAPACK

BLAS and LAPACK are standard libraries for performing linear algebra operations. Detailed documentation can be found on the BLAS and LAPACK websites.

The BLAS (Basic Linear Algebra Subprograms) are the basic building blocks for many linear algebra applications, such as LAPACK. BLAS operations are divided into three classes:

- Level 1 BLAS: scalar, vector and vector-vector operations
- Level 2 BLAS: matrix-vector operations
- Level 3 BLAS: matrix-matrix operations
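To make the three classes concrete, here is a minimal pure-Python sketch of what one representative routine from each level computes. It follows the semantics of `axpy`, `gemv` and `gemm`. It shows only the mathematics. LibSci's optimised routines (for example `dgemm`) work on column-major arrays and are called from Fortran or C, not like this.

```python
# Reference semantics of one routine per BLAS level, for illustration only.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): y := alpha*x + y, O(n) work on O(n) data."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x, beta, y):
    """Level 2 (matrix-vector): y := alpha*A*x + beta*y, O(n^2) work on O(n^2) data."""
    return [alpha * sum(a * xj for a, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]

def gemm(alpha, A, B, beta, C):
    """Level 3 (matrix-matrix): C := alpha*A*B + beta*C, O(n^3) work on O(n^2) data."""
    cols = list(zip(*B))  # columns of B
    return [[alpha * sum(a * b for a, b in zip(row, col)) + beta * c
             for col, c in zip(cols, crow)]
            for row, crow in zip(A, C)]
```

Level 3 routines perform O(n^3) operations on only O(n^2) data, so each matrix element can be reused many times while it sits in cache. This is why Level 3 routines come closest to peak performance.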

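LAPACK contains a very large number of routines for serial dense linear algebra computations, including the solution of systems of linear equations and the corresponding matrix factorisations (LU, Cholesky and QR). In addition, LAPACK includes routines for least squares problems, eigenvalue problems and singular value problems. High performance is achieved by performing most of the work through calls to the BLAS library.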
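As an illustration of the kind of dense computation LAPACK performs, the sketch below solves A x = b by LU factorisation with partial pivoting. These are the same mathematical steps as LAPACK's `dgesv` driver, which factorises with `dgetrf` and then solves with `dgetrs`. It is a plain-Python reference assuming a small, square, nonsingular matrix. It is not LAPACK's blocked algorithm, and the function names are hypothetical.

```python
def lu_factor(A):
    """LU factorisation with partial pivoting (the job of dgetrf).
    Returns the packed L and U factors and the row permutation."""
    n = len(A)
    LU = [row[:] for row in A]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest |entry| in column k to row k.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve A x = b using the factors (the job of dgetrs)."""
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]  # apply the row permutation
    for i in range(n):  # forward substitution with unit lower-triangular L
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution with U
        x[i] = (y[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x
```

For example, `lu_solve(*lu_factor([[2.0, 1.0], [4.0, 3.0]]), [3.0, 7.0])` gives `[1.0, 1.0]`.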

##### BLACS

The BLACS (Basic Linear Algebra Communication Subprograms) are similar to MPI and consist of communication routines designed for linear algebra applications. They are built on the same communication layer as MPI, so their performance should be comparable. The BLACS routines are typically used to make linear algebra applications more portable and easier to program, and the ScaLAPACK library is built on top of the BLACS for this reason. More information on the BLACS can be found on the BLACS website.

##### ScaLAPACK

ScaLAPACK (Scalable LAPACK) is the distributed-memory version of LAPACK. More information may be found on the ScaLAPACK website.
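ScaLAPACK distributes each matrix over a two-dimensional BLACS process grid using a block-cyclic layout. Row blocks are dealt out round-robin to the process rows, and column blocks likewise to the process columns. The sketch below computes this mapping for one dimension. It assumes 0-based indices and that the first block is held by process 0, and the function name is hypothetical. ScaLAPACK's own tool routines `INDXG2P` and `INDXG2L` perform the equivalent mapping with 1-based indices.

```python
def global_to_local(i, nb, nprocs):
    """Map a 0-based global index to (owning process coordinate, 0-based
    local index) under a 1-D block-cyclic distribution with block size nb
    over nprocs processes, with the first block on process 0."""
    block = i // nb                            # which block holds index i
    proc = block % nprocs                      # blocks are dealt round-robin
    local = (block // nprocs) * nb + i % nb    # position in that process's array
    return proc, local
```

For a matrix element (i, j), apply the function to i with the row block size and number of process rows, and to j with the column block size and number of process columns. Together these give the owning process in the grid and the element's position in that process's local array.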

##### SuperLU

SuperLU is a library for the direct solution of large, sparse, non-symmetric systems of linear equations. It is written in C and can be called from C or Fortran. Further details and usage instructions can be found on the SuperLU website.
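SuperLU works on sparse matrices stored in compressed form rather than as dense arrays. For example, its serial driver `dgssv` accepts the matrix in compressed column form: the `NCformat` arrays `nzval`, `rowind` and `colptr`, with 0-based indices. The sketch below shows how a small dense matrix maps onto those three arrays, and its function name is hypothetical. In a real C program the `SuperMatrix` itself is built with SuperLU's own routines, such as `dCreate_CompCol_Matrix`; see the SuperLU documentation for details.

```python
def dense_to_csc(A):
    """Convert a dense matrix (list of rows) to compressed sparse column
    form. Returns (values, rowind, colptr), where column j occupies
    values[colptr[j]:colptr[j+1]] with matching 0-based row indices."""
    nrows, ncols = len(A), len(A[0])
    values, rowind, colptr = [], [], [0]
    for j in range(ncols):
        for i in range(nrows):
            if A[i][j] != 0.0:  # store nonzeros only
                values.append(A[i][j])
                rowind.append(i)
        colptr.append(len(values))  # end of column j
    return values, rowind, colptr
```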

##### Iterative Refinement Toolkit

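The Iterative Refinement Toolkit (IRT) is a set of routines that solve linear systems largely in single precision while obtaining solutions that are accurate to double precision. The expensive factorisation step is carried out in faster single-precision arithmetic. The solution is then improved by a few cheap refinement steps that compute the residual in double precision. The method can only be applied to well-conditioned problems, but for those it can give significant performance improvements.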
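The sketch below illustrates the general technique of mixed-precision iterative refinement. It is not the IRT's actual implementation. Single-precision arithmetic is emulated by rounding every intermediate result to IEEE single precision with `struct`. Pivoting is omitted for brevity, so the sketch assumes a matrix that is safe to factorise without it, such as a diagonally dominant one. All function names are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_single(A):
    """LU factorisation without pivoting, rounding every result to single precision."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU

def solve_single(LU, b):
    """Forward and back substitution in (emulated) single precision."""
    n = len(LU)
    y = [f32(v) for v in b]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for j in range(i + 1, n):
            s = f32(s - f32(LU[i][j] * x[j]))
        x[i] = f32(s / LU[i][i])
    return x

def refine(A, b, iters=5):
    """Mixed-precision iterative refinement of the solution of A x = b."""
    LU = lu_factor_single(A)   # O(n^3) work, done once in single precision
    x = solve_single(LU, b)    # initial solution, accurate to about 1e-7
    for _ in range(iters):
        # Residual r = b - A x in double precision: O(n^2) work.
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = solve_single(LU, r)                  # correction from single factors
        x = [xi + di for xi, di in zip(x, d)]    # update kept in double
    return x
```

Each pass reduces the error by roughly a factor of the condition number times the single-precision unit roundoff. If the matrix is ill-conditioned this factor approaches one and the iteration stalls, which is why the method is restricted to well-conditioned problems.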

The IRT contains serial and parallel versions of the following linear algebra solvers:

- LU
- Cholesky
- QR

To use IRT just replace calls to LAPACK or ScaLAPACK routines with calls to the appropriate
`irt_` routine. The naming convention used by the IRT is as follows:

```
irt_FF_TT_SS
```

where FF can be one of:

- `lu` for LU factorisation
- `po` for Cholesky factorisation
- `qr` for QR factorisation

TT is either `real` or `complex`, and SS is either `serial` or `parallel`. For example, `irt_lu_real_serial` corresponds to the serial version of the IRT LU factorisation routine for real numbers.
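The naming convention can be spelled out in a few lines. The hypothetical helper below only assembles and validates a routine name from the three fields. It does not call the IRT. Not every combination is necessarily provided, so check the IRT `man` pages for the routines that are actually available.

```python
# Hypothetical helper spelling out the irt_FF_TT_SS naming convention.
FACTORISATIONS = {"lu": "LU", "po": "Cholesky", "qr": "QR"}

def irt_routine_name(factorisation, datatype, mode):
    """Build an IRT routine name from its three fields."""
    if factorisation not in FACTORISATIONS:
        raise ValueError(f"unknown factorisation {factorisation!r}")
    if datatype not in ("real", "complex"):
        raise ValueError(f"unknown data type {datatype!r}")
    if mode not in ("serial", "parallel"):
        raise ValueError(f"unknown mode {mode!r}")
    return f"irt_{factorisation}_{datatype}_{mode}"
```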

Further details of the IRT can be obtained from the `man` pages using:

```
man intro_irt
```

#### 8.1.2 AMD Core Math Library (ACML)

The AMD Core Math Library (ACML) routines can be called from both Fortran and C programs. ACML comprises:

- BLAS: Basic Linear Algebra Subprograms
- LAPACK: Linear Algebra Package for solving linear equations and eigenvalue problems
- FFT: a set of routines for performing Fast Fourier Transforms
- RNG: a set of Random Number Generators and statistical distribution functions

Detailed documentation of the ACML is available from the AMD Developers Website.

To use ACML you should load the ACML module with:

```
module add acml
```

You must also link to ACML explicitly. If you are compiling code without OpenMP, add `-lacml`; if you are using OpenMP, add `-lacml_mp`. (Note: the Cray compiler enables OpenMP by default, so with it you must either specify `-h noomp` and link to `-lacml`, or use the OpenMP-enabled ACML library.)

To check which version of ACML is loaded use:

```
module list
```

The algorithms behind some LAPACK routines in the ACML differ from those in the public domain source code as they have been optimised for the HECToR system. Functionally and numerically these routines conform to the usual LAPACK conventions.

##### Random Number Generators

ACML includes a set of pseudo-random number generators and statistical distribution functions, comprising five base generators and twenty-three distribution generators. A distribution generator is a routine that takes variates produced by a base generator and transforms them into variates from a specified distribution, for example the Gaussian (Normal) distribution.

The five base generators supplied with the ACML are: the NAG basic generator, a series of Wichmann-Hill generators, the Mersenne Twister, L'Ecuyer's combined recursive generator MRG32k3a, and the Blum-Blum-Shub generator. In addition, users can supply a custom-built generator as the base generator for all of the distribution generators.
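To show how a distribution generator builds on a base generator, the sketch below pairs the classic 1982 Wichmann-Hill generator (Applied Statistics algorithm AS 183) with the Box-Muller transform to produce Gaussian variates. It illustrates the idea only. ACML's Wichmann-Hill generators and its Gaussian distribution generator are separate implementations with their own interfaces, documented by AMD, and they do not necessarily use Box-Muller. The class and function names here are hypothetical.

```python
import math

class WichmannHill:
    """Classic Wichmann-Hill base generator (AS 183): three small
    multiplicative congruential generators with prime moduli, combined.
    Seeds should be integers in 1..30000."""

    def __init__(self, s1=1, s2=2, s3=3):
        self.s1, self.s2, self.s3 = s1, s2, s3

    def uniform(self):
        """Return the next uniform variate in [0, 1)."""
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0

def gaussian(base, mu=0.0, sigma=1.0):
    """Distribution generator: turn two base variates into one
    N(mu, sigma^2) variate with the Box-Muller transform."""
    u1 = 1.0 - base.uniform()  # in (0, 1], so log(u1) is finite
    u2 = base.uniform()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z
```

The same pattern applies to any distribution. The base generator supplies uniform variates, and the distribution generator only transforms them, so the two can be swapped independently. This separation is also what lets ACML accept a user-supplied base generator.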

##### PGI OpenMP issue

There is a problem in the PGI versions of ACML 4.0.1 and ACML 4.0.1a when they are used within an OpenMP parallel region. The problem is the intermittent printing of the error message:

```
Error: This program was not built to run on the processor in your system
```

The message is accompanied by the program either halting or hanging. This problem should be fixed in a forthcoming release of the PGI compiler. Until then, users of OpenMP or other threaded programming models are strongly advised to use ACML 3.6.1 instead of the later releases installed on the system.

#### 8.1.3 HDF5

The HDF5 library provides a portable data format for storing complex data objects. The latest versions on HECToR support the PGI and GNU compiler suites (earlier versions support PGI only). To use HDF5, load the HDF5 module with:

```
module add hdf5
```

Note: Before loading hdf5 you first have to choose your compiler by loading the appropriate PrgEnv module.

For C code you need to add `-lhdf5 -lz` to the link line of your makefile and for Fortran
code you need to add `-lhdf5_fortran -lhdf5 -lz`. Please use the Cray supplied `cc` or
`ftn` scripts to compile and link your application.

For more information on HDF5 please see:

#### 8.1.4 NetCDF

NetCDF allows applications to store array-oriented data in a machine-independent fashion. On HECToR, NetCDF is available for the PGI and GNU compiler suites. To access NetCDF, load the NetCDF module by typing:

```
module add netCDF
```

Note: Before loading netCDF you first have to choose your compiler by loading the appropriate PrgEnv module.

Loading the netCDF module makes HECToR's default version of NetCDF available. You can then compile your application without explicitly setting an include path for the header files or a library path for the library files. You do, however, need to add `-lnetcdf` to the link line of your makefile. If you require the C++ bindings, you also need to add `-lnetcdf_c++` to the link line.

For more information on NetCDF please see:

#### 8.1.5 FFTW

FFTW (Fastest Fourier Transform in the West) is a set of self-optimising Fourier transform routines. Further details can be found on the FFTW website.

HECToR has two major versions of FFTW available: 2 and 3. To use FFTW with your code, load the appropriate version with the `module` command. For example, to load the default version use:

```
module add fftw
```

and to load a specific version (for example, 3.1.1) use:

```
module load fftw/3.1.1
```

You can check which version you currently have loaded with:

```
module list
```

FFTW version 3 is currently the default on HECToR.

Once you have loaded the appropriate version of FFTW you should just compile and link your code as normal. The paths to the appropriate header files and libraries are added automatically when you load the FFTW module.
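FFTW computes unnormalised transforms with a fixed sign convention. The forward transform (`FFTW_FORWARD`, sign -1) computes Y[k] = sum over j of X[j] * exp(-2*pi*i*j*k/n). The backward transform (`FFTW_BACKWARD`, sign +1) uses the opposite sign, so a forward transform followed by a backward one multiplies the input by n. The naive O(n^2) sketch below reproduces those conventions, and its function name is hypothetical. It can be handy for checking small results from your own FFTW calls, but it is not how FFTW computes the transform.

```python
import cmath

def dft(x, sign=-1):
    """Unnormalised discrete Fourier transform using FFTW's conventions:
    sign=-1 matches FFTW_FORWARD, sign=+1 matches FFTW_BACKWARD."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]
```

Because neither direction is scaled, divide by n yourself after a forward and backward round trip if you need the original data back.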

#### 8.1.6 PETSc

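PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines for the scalable solution of large, sparse linear and nonlinear systems of equations, such as those arising from partial differential equations.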

To use PETSc, simply load the appropriate module:

```
module add cray-petsc
```

To load the complex version use:

```
module add cray-petsc-complex
```

You can check which version is loaded with:

```
module list
```

Once you have loaded PETSc you should just compile and link your code as normal. The paths to the appropriate header files and libraries are added automatically when you load the module.

#### 8.1.7 IOBUF

IOBUF is an I/O buffering library that can reduce the I/O wait time for programs that read or write large files sequentially. IOBUF intercepts I/O system calls such as `read` and `open` and adds a layer of buffering, improving program performance by enabling asynchronous prefetching and caching of file data.

IOBUF can also gather runtime statistics and print a summary report of I/O activity for each file.

In general, no program source changes are needed in order to take advantage of IOBUF. Instead, use IOBUF by following these steps:

1. Load the IOBUF module:

   ```
   module load iobuf
   ```

2. Relink the program.
3. Set the `IOBUF_PARAMS` environment variable as needed, for example:

   ```
   export IOBUF_PARAMS='*:verbose'
   ```

4. Execute the program.

If a memory allocation error occurs, buffering is reduced or disabled for that file and a diagnostic is printed to stderr. When the file is opened, a single buffer is allocated if buffering is enabled. The allocation of additional buffers is done when a buffer is needed. When a file is closed, its buffers are freed (unless asynchronous I/O is pending on the buffer and lazyclose is specified).

Please note that if you are using IOBUF with a single shared file, special precautions must be taken to avoid race conditions. The IOBUF library does not synchronise between different MPI ranks, so two restrictions must be observed to use IOBUF safely when multiple ranks access a single shared file:

- The MPI program must use the same segment size for all the ranks accessing the shared file
- The buffer size of the IOBUF library buffer must be set to be the same size as the segment size.

If these restrictions are not followed, a race condition could, in theory, affect data integrity. This race condition cannot occur when IOBUF is used with file-per-process I/O, as each rank then has its own file.
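The sketch below models why mismatched sizes are risky. It rests on two hypothetical assumptions. First, each IOBUF buffer covers a buffer-size-aligned range of the file. Second, rank r writes one contiguous segment starting at r times the segment size. Under that model, it lists the buffer blocks that more than one rank would hold at the same time. It describes the hazard, not IOBUF's internals, so follow the documented restrictions regardless of what the model suggests for other sizes.

```python
def shared_buffer_blocks(nranks, segment_size, buffer_size):
    """Return indices of buffer blocks touched by more than one rank.
    Hypothetical model: buffer k covers [k*buffer_size, (k+1)*buffer_size)
    and rank r writes [r*segment_size, (r+1)*segment_size)."""
    owners = {}
    for r in range(nranks):
        start, end = r * segment_size, (r + 1) * segment_size
        first, last = start // buffer_size, (end - 1) // buffer_size
        for k in range(first, last + 1):
            owners.setdefault(k, set()).add(r)
    return sorted(k for k, ranks in owners.items() if len(ranks) > 1)
```

With equal segment and buffer sizes no block is shared. With a 1 MiB segment and a 1.5 MiB buffer, blocks 0 and 1 each straddle two ranks' segments. Each rank could then flush its own stale copy of the other rank's data.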

### 8.2 NAG Fortran Library, Mark 21

The NAG Fortran Library contains nearly 1,500 routines, making it the largest commercially available collection of mathematical and statistical algorithms. The documentation portal for the Library is available on the NAG website.

Here is a list of some of the main numerical and statistical capabilities of the Library:

- Optimisation, including linear, quadratic, integer and nonlinear programming and least squares problems
- Ordinary and partial differential equations, and mesh generation
- Numerical integration and integral equations
- Roots of nonlinear equations (including polynomials)
- Solution of dense, banded and sparse linear equations and eigenvalue problems
- Solution of linear and nonlinear least squares problems
- Special functions
- Curve and surface fitting and interpolation
- Random number generation
- Simple calculations on statistical data
- Correlation and regression analysis
- Multivariate methods
- Analysis of variance and contingency table analysis
- Time series analysis
- Nonparametric statistics

Access to the NAG Fortran Library on the compute nodes is provided by the `xt-libnagfl` module, for use with the PGI compiler.

The `xt-libnagfl` module supports the PGI environment at versions 7.1 and later.

The NAG Fortran Library needs to be linked against a BLAS/LAPACK implementation. On the XT nodes, users have a choice between ACML (`module add acml`) and XT-LibSci (`module add xt-libsci`). The NAG library modules support either, and they configure the `ftn` script according to which of the two support modules was active when the NAG library module was loaded.

The NAG library modules provide a script `nag_example` (this is nag_example for the
Pathscale module) that will fetch an example program for a given routine, link it against the
library and then run it (on the login node) with appropriate data.

This example loads the module and exercises an example, using the default `xt-libsci`
library for the support functionality:

```
module load xt-libnagfl
nag_example e04ucf
```

This example changes to the ACML library and exercises the same example:

```
module unload xt-libnagfl
module swap xt-libsci acml
module load xt-libnagfl
nag_example e04ucf
```

#### 8.2.1 NAG SMP Library, Mark 21

The NAG SMP Library contains all the functionality of the NAG Fortran Library, with key routines parallelised using OpenMP to run on SMP systems. Full documentation is available at:

Currently the NAG SMP Library is available only for use with the PGI compiler. It must be used with the ACML library, so it cannot be used together with `xt-libsci`. To use the NAG SMP Library, first execute the following module commands:

```
module unload xt-libsci
module load acml/3.6.1
module load xt-libnagfs
```

Once you have loaded the NAG SMP Library and ACML you should just compile and link your code as
normal. The paths to the appropriate header files and libraries are added automatically when you load
the module. The `ftn` compiler command will now include the PGI `-mp=nonuma` compiler
flag.