Boosting the scaling performance of CASTEP: enabling next generation HPC for next generation science
Dominik Jochym1, Jolyon Aarons2, Keith Refson1, Phil Hasnip2 and Matt Probert2
1Science and Technology Facilities Council,
Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, U.K.
2Department of Physics, University of York, Heslington,
York, YO10 5DD, U.K.
April 02, 2012
This report describes three areas of development in the CASTEP density functional theory code. The main objectives
of the work are to improve I/O performance, to produce a parallel efficiency report for users at run-time, and
to further develop the band-parallel capability of CASTEP. The developments are described in more detail below:
- MPI collective operations replace MPI point-to-point communication in the wave function I/O routines.
For an aluminium oxide '2x2' slab test case (al2x2) with 5 k-points, 40000 G-vectors and 288 bands, faster read and write times are demonstrated.
- A parallel efficiency report, now written at the end of every CASTEP run. In addition to the basic parallel efficiencies,
the report describes the parallel decomposition used, which can indicate whether further optimisation is possible.
It can also give details of aspects of the calculation that are particularly important to the parallelisation
(e.g. the k-point distribution and G-vector communications).
This will help CASTEP users to take full advantage of the application's parallel capability and to make more efficient use of HECToR
and other high-end HPC architectures.
- The band-parallel capability of CASTEP was improved by implementing an upgraded "triangular matrix" algorithm. This will provide a useful speedup
to all band-parallel calculations. In particular, the improved method is 1.08 times faster with 256 HECToR Phase 3 cores and 1.16 times faster with 1024 cores.
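
To put the quoted band-parallel factors in terms of wall-clock time, a run that becomes s times faster takes 1/s of its original time, so the fraction of time saved is 1 - 1/s. The short Python sketch below applies only that arithmetic to the two factors quoted above; the function and variable names are illustrative and are not part of CASTEP.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of wall-clock time saved when a run becomes `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup factors quoted in the text, keyed by HECToR Phase 3 core count.
quoted_speedups = {256: 1.08, 1024: 1.16}

# Convert each "times faster" factor into a fractional time saving.
reductions = {cores: runtime_reduction(s) for cores, s in quoted_speedups.items()}

for cores, saved in reductions.items():
    print(f"{cores} cores: {saved:.1%} less wall-clock time")
```

On these figures the improved method saves roughly 7.4% of the run time on 256 cores and roughly 13.8% on 1024 cores, assuming each factor is a ratio of total run times for the same calculation.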
Currently, the I/O improvements and parallel efficiency report have been incorporated into the main CASTEP 6.1 source repository; the band-parallel improvements will
be available from CASTEP 7.0.
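
As background to the I/O change in the first item above, one commonly cited reason for preferring a collective over a hand-written loop of point-to-point messages is the number of sequential communication steps. The toy Python model below only counts steps and is not CASTEP's I/O routine: it assumes a root process that receives from every other process in turn, and compares that with a binomial-tree gather, which is one algorithm an MPI library might choose for a collective (libraries are free to choose others). Both function names are hypothetical.

```python
def root_loop_steps(nprocs: int) -> int:
    """Sequential receives at the root when it collects from each other process in turn."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return nprocs - 1


def tree_rounds(nprocs: int) -> int:
    """Communication rounds in a binomial-tree gather: ceil(log2(nprocs))."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no float rounding.
    return (nprocs - 1).bit_length()


# Compare the two step counts at a few illustrative process counts.
for nprocs in (4, 256, 1024):
    print(f"{nprocs:5d} processes: "
          f"{root_loop_steps(nprocs):5d} root receives vs "
          f"{tree_rounds(nprocs):2d} tree rounds")
```

The model ignores message size, so it describes only the latency side of the comparison. The total volume of wave function data arriving at the root is the same in both cases.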