The nudged elastic band (NEB) algorithm is a method for finding a minimum energy path between two structures. It is typically used to characterise a reaction path, including its energy barrier. The improved-tangent variant of NEB is available in DL-FIND.
Each NEB optimisation cycle consists of energy and gradient evaluations for a sequence of structures (images) whose geometries lie along a path between the two endpoints. The final NEB gradient is constructed using spring forces that connect neighbouring images. The energy and gradient calculations for the individual images, however, are independent of one another and can therefore be evaluated in parallel.
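As a rough illustration of how spring forces enter the NEB gradient, the sketch below builds the NEB force (the negative of the NEB gradient) on one image from its true gradient and its two neighbours. It is not the DL-FIND implementation: it uses a simple tangent (the normalised vector between the neighbouring images) rather than the improved tangent, and the function name `neb_force` and the spring constant `k_spring` are hypothetical.

```python
import math

def neb_force(prev_pos, pos, next_pos, gradient, k_spring=1.0):
    """Simplified NEB force on one image (hypothetical helper, simple tangent).

    The true force is projected perpendicular to the path and the spring
    force acts only along the path tangent.
    """
    # Simple tangent estimate: normalised vector between the two neighbours.
    tangent = [b - a for a, b in zip(prev_pos, next_pos)]
    norm = math.sqrt(sum(t * t for t in tangent))
    tangent = [t / norm for t in tangent]

    # Perpendicular part of the true force, -(g - (g.tau) tau).
    g_parallel = sum(g * t for g, t in zip(gradient, tangent))
    perpendicular = [-(g - g_parallel * t) for g, t in zip(gradient, tangent)]

    # Spring force along the tangent, from the difference in neighbour distances.
    spring = k_spring * (math.dist(next_pos, pos) - math.dist(pos, prev_pos))
    return [p + spring * t for p, t in zip(perpendicular, tangent)]
```

Only the `gradient` argument requires an expensive calculation for each image; the spring term couples the images afterwards, which is why the gradient evaluations themselves can be distributed.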
The NEB algorithm in DL-FIND has been parallelised using static load-balancing. An image is assigned to a workgroup if the image number modulo the number of workgroups is equal to the workgroup ID. This method ensures that a particular image is always assigned to the same workgroup, which is important if the external program uses restart files (as is the case for a GAMESS-UK calculation). At the end of each cycle the energies and gradients are shared between all workgroups by an MPI call within DL-FIND.
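The static assignment rule can be sketched in a few lines. This is an illustration only, not DL-FIND code: the function name `assign_images` is hypothetical, and numbering both images and workgroup IDs from zero is an assumption.

```python
def assign_images(n_images, n_workgroups):
    """Map each workgroup ID to the images it evaluates.

    An image belongs to the workgroup whose ID equals the image number
    modulo the number of workgroups (zero-based numbering assumed).
    """
    assignment = {wg: [] for wg in range(n_workgroups)}
    for image in range(n_images):
        assignment[image % n_workgroups].append(image)
    return assignment
```

Because the mapping depends only on the image number and the number of workgroups, it is identical in every cycle, so any restart files written for an image are always found by the same workgroup.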
The first NEB cycle is different from the others in that the workgroups each calculate the gradients in serial along the whole path. This is to help convergence of the external QM program, as the wavefunction of the previous image can be used as a guess for the next. In subsequent cycles the corresponding image from the previous iteration can be used as the guess.
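The difference between the first and later cycles might be summarised as follows. This is a schematic sketch under stated assumptions: the names `images_for_cycle` and `guess_source` are hypothetical, cycles and images are numbered from zero, and the first image of the first cycle is assumed to start without a wavefunction guess.

```python
def images_for_cycle(cycle, workgroup_id, n_images, n_workgroups):
    """Images a workgroup evaluates in a given cycle (zero-based numbering)."""
    if cycle == 0:
        # First cycle: every workgroup walks the whole path in serial.
        return list(range(n_images))
    # Later cycles: static modulo assignment.
    return [i for i in range(n_images) if i % n_workgroups == workgroup_id]

def guess_source(cycle, image):
    """(cycle, image) whose wavefunction seeds this calculation, or None."""
    if cycle == 0:
        # Use the previous image along the path, if there is one.
        return (0, image - 1) if image > 0 else None
    # Use the same image from the previous cycle.
    return (cycle - 1, image)
```

With this scheme every workgroup holds a wavefunction for each of its own images after the first cycle, at the cost of repeating the full serial path once.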
GAMESS-UK was used for the QM calculations with the B97-1 functional. GULP was used to provide MM energies and gradients using the shell model interatomic potential of Ref. . Ten images were used to describe the path; with the two endpoints frozen, this gives 8 gradient evaluations in total per cycle.
A single-point energy and gradient evaluation for the test system is actually an iterative cycle of QM and MM calculations. This is because the QM region is polarised by the MM atoms as point charges and the shells of the MM system are polarised in turn by the QM region. The QM/MM gradient must therefore be iterated until converged each time it is calculated.
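The mutual polarisation loop can be pictured with a toy model. The sketch below is not the QM/MM coupling scheme itself: it reduces the QM state and the shell positions to single numbers, and the function name, the convergence test on the shell variable and the tolerance are all assumptions made for illustration.

```python
def iterate_polarisation(qm_response, mm_response, shells0, tol=1e-8, max_iter=100):
    """Alternate toy 'QM' and 'MM' steps until the shell variable stops changing.

    qm_response(shells) -> QM state polarised by the MM shells
    mm_response(qm)     -> shells relaxed in the field of the QM region
    Both callables are hypothetical stand-ins for the real calculations.
    """
    shells = shells0
    for iteration in range(1, max_iter + 1):
        qm_state = qm_response(shells)       # QM polarised by MM
        new_shells = mm_response(qm_state)   # MM shells polarised by QM
        if abs(new_shells - shells) < tol:
            return qm_state, new_shells, iteration
        shells = new_shells
    raise RuntimeError("QM/MM polarisation loop did not converge")
```

For example, `iterate_polarisation(lambda s: 1.0 + 0.2 * s, lambda q: 0.5 * q, 0.0)` settles on a self-consistent pair after a handful of iterations. In the real calculation each pass through the loop involves a QM calculation, which is why a single gradient evaluation is already expensive.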
The NEB benchmark calculations were run for 50 cycles, which results in an effectively converged path. The runs were not continued to full convergence because small numerical differences can change the total number of cycles required for the optimisation, and this variation would not reflect the intrinsic performance of the parallelisation. The most basic form of the NEB method was used, with no climbing image and no freezing of intermediate images during the optimisation.
|Procs||Workgroups||Procs/workgroup||Time / s||Speed-up vs 1024||Speed-up vs 256|
The improvement offered by the task-farming approach is significant and, as expected, using the maximum number of workgroups gives the highest performance. Compared to a single-workgroup 1024-processor run, a speed-up of over 8 is found, which is only possible because the single-point 128-processor calculation is faster than the 1024-processor equivalent. Compared to the 256-processor NEB run, the speed-up is lower than 8 but still very substantial.
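The reason a speed-up above 8 is possible can be seen from a simple idealised estimate. The sketch below is an assumption-laden model, not a fit to the benchmark: it ignores the serial first cycle and all communication, and the function name and the timings passed to it are hypothetical placeholders rather than measured values.

```python
import math

def ideal_speedup(n_images, n_workgroups, t_all_procs, t_per_workgroup):
    """Idealised per-cycle speed-up of task-farming over a single workgroup.

    t_all_procs     : time for one gradient using all processors together
    t_per_workgroup : time for one gradient on a single workgroup's processors
    """
    # Single workgroup: images are evaluated one after another.
    t_single = n_images * t_all_procs
    # Task-farming: images are evaluated in rounds of n_workgroups at a time.
    rounds = math.ceil(n_images / n_workgroups)
    t_farmed = rounds * t_per_workgroup
    return t_single / t_farmed
```

With 8 images and 8 workgroups the estimate reduces to 8 times `t_all_procs / t_per_workgroup`, so it exceeds 8 exactly when the calculation on the smaller workgroup is faster than the one on all processors. With placeholder timings of 10.0 and 8.0, for instance, `ideal_speedup(8, 8, 10.0, 8.0)` gives 10.0.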
Tom Keal 2010-06-29