Two intrinsically parallel optimisation methods are available in DL-FIND [6], namely a genetic algorithm and stochastic search. These methods are typically used for finding the global minimum on a potential energy surface. Both methods involve calculations on a population of structures during each cycle, which can be run in parallel.
The genetic algorithm and stochastic search methods use the same parallel DL-FIND interface that was used as the basis for parallelising the NEB method. To support the new methods the interface was modified on the ChemShell side to pass the relevant input options to DL-FIND.
A change to the handling of shell model systems in ChemShell was also required to ensure that the parallel optimisations would work with these systems if desired. The DL-FIND optimiser only uses atom positions and ignores shells, which are a feature specific to ChemShell. Previously the shell positions were stored in ChemShell as absolute coordinates and relaxed starting from the old positions when a new geometry was created. This relaxation can be slow or difficult to converge if the geometry changes radically, which is often the case for the global optimisation routines, where a wide variety of geometries is considered at every step. The shell handling in ChemShell was improved by storing shell positions as relative coordinates, so they would stay near to their parent atoms even under a large change of geometry. This change benefits all the other optimisation methods in DL-FIND as well, but is most important for the global optimisers.
Following Ref. [19], benchmark calculations were performed on ZnO nanoclusters. In Ref. [19] rigid ion MM calculations were used but for the purpose of benchmarking a much more demanding QM calculation was set up using GAMESS-UK. Timing calculations were performed on a (ZnO) cluster. The B97-1 functional was again used with a PVDZ basis (560 basis functions) and the Stuttgart ECP for the Zn atoms. A population of 32 structures was used, which is a typical size for these methods and is an efficient number for task-farm parallelisation.
|
Normally a genetic algorithm or stochastic search optimisation would be run for hundreds or thousands of cycles in order to have a good chance of finding the global minimum. However, due to computational expense it was not feasible to run a baseline calculation of this length using a single workgroup. To obtain a benchmark the stochastic search algorithm was run for 20 cycles instead. This is expected to give a good representation of the scaling behaviour as each cycle contains the same number of evaluations and should therefore take approximately the same amount of time.
Procs | Workgroups | Procs/ | Time / s | Speed-up |
workgroup | vs 1024 | |||
1024 | 1 | 1024 | 23535 | |
256 | 1 | 256 | 26560 | |
1024 | 2 | 512 | 14881 | 1.6 |
1024 | 4 | 256 | 8930 | 2.6 |
1024 | 8 | 128 | 5819 | 4.0 |
1024 | 16 | 64 | 4270 | 5.5 |
1024 | 32 | 32 | 4197 | 5.6 |
Speed-up factors for the task-farmed calculations are therefore given compared to the 1024-processor run. Gains in performance are again substantial, with the best performance as expected given by maximising the number of workgroups, although the performance of the 16 workgroup calculation is very nearly as good, with speed-ups of over 5 achieved in both cases.
Tom Keal 2010-06-29