Although checkpointing was not originally part of the project plan, it became clear early on that runtimes for the largest test case (CNT80) were approaching the 12-hour queue limit set on HECToR (see Figure 1). Even with the proposed improvements to the FFT, it was not clear that, with increasingly large systems being studied at Sheffield, a single step of the calculation could be completed within the 12-hour limit. For example, on 64 cores of HECToR Phase2a, the exc_scf step of the CNT80 system takes a total of 8 hours. Closer examination found that this is in fact made up of two sub-steps: first an iterative diagonalisation procedure (using ScaLAPACK) taking around 2h20m, followed by the `dft_exchange' step (mainly FFT) taking around 5h40m. While the time taken for the second part would be reduced by the modifications planned in the dCSE project, it was decided to implement a checkpoint/restart mechanism that allows the two parts of this step of the calculation to be performed independently.
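The timings quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the variable and function names are ours, and the durations are the approximate figures given in the text for CNT80 on 64 cores. It shows how much headroom the combined step and each sub-step leave under the 12-hour limit.

```python
from datetime import timedelta

# Queue limit on HECToR and approximate CNT80 timings, as quoted in the text.
QUEUE_LIMIT = timedelta(hours=12)
diagonalisation = timedelta(hours=2, minutes=20)
dft_exchange = timedelta(hours=5, minutes=40)

# The whole exc_scf step is the sum of its two sub-steps.
total = diagonalisation + dft_exchange


def headroom(runtime, limit=QUEUE_LIMIT):
    """Time left before a job of the given runtime hits the queue limit."""
    return limit - runtime


for name, runtime in [("exc_scf (both parts)", total),
                      ("diagonalisation only", diagonalisation),
                      ("dft_exchange only", dft_exchange)]:
    print(f"{name}: runtime {runtime}, headroom {headroom(runtime)}")
```

Run as a single job, the step leaves 4 hours of headroom; run separately, the longer of the two parts leaves 6h20m, which is the margin the checkpoint/restart mechanism is intended to buy for larger systems.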
The checkpoint/restart mechanism was implemented as a new flag, split_calculation, in the &control section of the input file. If the flag is set to 0 or omitted, both parts of the calculation are performed as normal. If it is set to 1, only the first part (diagonalisation) runs; if it is set to 2, the output from the first part is read in and the dft_exchange step runs.
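The behaviour of the flag can be summarised as a small dispatch table. The Python sketch below is a model of the control flow described above, not the actual implementation: the names SplitMode and plan_steps, and the step labels "write_checkpoint" and "read_checkpoint", are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class SplitMode(IntEnum):
    """Values of the split_calculation flag as described in the text."""
    FULL = 0            # flag omitted or set to 0: run both parts
    DIAG_ONLY = 1       # run the diagonalisation only
    EXCHANGE_ONLY = 2   # read the saved output, then run dft_exchange


def plan_steps(split_calculation=0):
    """Return the ordered sub-steps that a run with this flag value performs.

    The checkpoint step labels are assumptions made for this sketch.
    """
    mode = SplitMode(split_calculation)  # raises ValueError for other values
    if mode is SplitMode.FULL:
        return ["diagonalisation", "dft_exchange"]
    if mode is SplitMode.DIAG_ONLY:
        return ["diagonalisation", "write_checkpoint"]
    return ["read_checkpoint", "dft_exchange"]


for value in (0, 1, 2):
    print(value, plan_steps(value))
```

In this model a complete calculation is obtained either from one job with the flag at 0, or from two consecutive jobs with the flag at 1 and then 2, each of which fits in its own queue slot.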
It should also be noted that while this part of the calculation is already very expensive, we expect the head_nscf (PH.x) step to be even more expensive, based on experience with the two smaller test cases. However, this calculation already has checkpointing built in and will automatically restart from the most recent saved point if the job is stopped (e.g. by hitting the 12-hour wallclock limit), so it is already suitable for running in multiple 12-hour blocks.