One of the difficulties scientists face when running any code on a massively
parallel HPC machine is exploiting the parallelism efficiently. There is usually no means to
determine in advance, or to evaluate in retrospect, the efficiency of any given parallel run.
A series of benchmarking runs on different processor counts may be used to evaluate parallel
efficiency and suggest a useful processor count for production runs. While somewhat burdensome,
this approach is useful in the case where many separate near-identical runs are to be performed.
In other cases, where the runs belonging to a project differ substantially, it is impractical.
The problem is particularly acute for plane-wave DFT calculations, where several distinct
parallelisation schemes (FFT, band, k-point) are used simultaneously and the efficiency is
far from a regular, monotonic function of processor count.
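
The benchmarking approach described above can be illustrated with a minimal sketch. It assumes
the usual definition of parallel efficiency relative to the smallest processor count tested,
E(p) = (T_base * p_base) / (T(p) * p); the function names, the 0.7 threshold and the timings
are all hypothetical and are not taken from CASTEP or from any measured run.

```python
def parallel_efficiency(timings):
    """Map processor count -> efficiency relative to the smallest count tested.

    `timings` maps processor count -> wall-clock time in seconds.
    """
    if not timings:
        raise ValueError("at least one benchmark timing is required")
    base_procs = min(timings)
    base_cost = timings[base_procs] * base_procs  # processor-seconds at baseline
    return {p: base_cost / (t * p) for p, t in sorted(timings.items())}


def largest_efficient_count(timings, threshold=0.7):
    """Return the largest processor count whose efficiency meets the threshold.

    Every count is checked, rather than stopping at the first failure,
    because efficiency need not fall monotonically with processor count.
    """
    efficiencies = parallel_efficiency(timings)
    acceptable = [p for p, e in efficiencies.items() if e >= threshold]
    return max(acceptable)  # the baseline always qualifies when threshold <= 1


# Made-up timings for illustration only (processor count -> seconds).
example_timings = {16: 1000.0, 32: 520.0, 64: 300.0, 128: 250.0}

if __name__ == "__main__":
    for procs, eff in parallel_efficiency(example_timings).items():
        print(f"{procs:5d} processors: efficiency {eff:.2f}")
    print("suggested count:", largest_efficient_count(example_timings))
```

With these illustrative numbers the efficiency falls to 0.5 at 128 processors, so 64 would be
the suggested count for production runs.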
For this part of the work, CASTEP's various operations have been grouped into "communication" and "calculation" classes. Using the internal timing data, a parallel efficiency report is written at the end of every run.
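
A minimal sketch of how such a report could be derived from per-operation timers is given below.
It is not CASTEP's implementation: the operation names, the classification table and the
definition of efficiency as the fraction of timed work spent in the "calculation" class are
all assumptions made for illustration.

```python
COMMUNICATION = "communication"
CALCULATION = "calculation"


def efficiency_report(timers, classes):
    """Summarise per-operation timings by class.

    `timers` maps operation name -> accumulated seconds.
    `classes` maps operation name -> COMMUNICATION or CALCULATION.
    Returns (totals per class, calculation fraction of total time).
    """
    totals = {COMMUNICATION: 0.0, CALCULATION: 0.0}
    for name, seconds in timers.items():
        totals[classes[name]] += seconds
    total = totals[COMMUNICATION] + totals[CALCULATION]
    # Assumed definition: efficiency = calculation time / total timed time.
    efficiency = totals[CALCULATION] / total if total > 0 else 0.0
    return totals, efficiency


# Hypothetical operation names and made-up timings, for illustration only.
example_classes = {
    "fft": CALCULATION,
    "diagonalise": CALCULATION,
    "allreduce": COMMUNICATION,
    "alltoall": COMMUNICATION,
}
example_timers = {"fft": 60.0, "diagonalise": 20.0,
                  "allreduce": 15.0, "alltoall": 5.0}

if __name__ == "__main__":
    totals, efficiency = efficiency_report(example_timers, example_classes)
    print(f"calculation:   {totals[CALCULATION]:8.1f} s")
    print(f"communication: {totals[COMMUNICATION]:8.1f} s")
    print(f"parallel efficiency: {efficiency:.0%}")
```

Because the report is built only from timers that the run accumulates anyway, it needs no
separate benchmarking runs, which is the case where the benchmarking approach is impractical.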