The HECToR Service is now closed and has been superseded by ARCHER.

Preparing DL_POLY_4 for the Exascale

DL_POLY is a general-purpose package for classical molecular dynamics simulations, developed under the leadership of I.T. Todorov at STFC Daresbury Laboratory. The package models the atomistic evolution of the full spectrum of systems commonly employed in the materials science, solid-state chemistry, biological simulation and soft condensed-matter communities. The main purpose of the software is to enable the exploitation of large-scale MD simulations on multi-processor platforms. This was the fourth Distributed Computational Science and Engineering (dCSE) project for improving the effectiveness of DL_POLY_4 on HECToR.

The overall aims of this project were to:

  • Implement a full mixed-mode OpenMP/MPI version of the code to exploit the shared-memory features of HECToR and future many-core architectures. This will enable a wider range of DL_POLY_4 users to make efficient use of more cores on HECToR.
  • Enable simulations with billions of atoms by implementing a 64-bit integer representation within the code. This will allow the code to work beyond the current limit, which is of the order of 1 billion atoms, and will be demonstrated for sizes up to 8 times this limit.

The overall outcome of this work may be summarised as follows:

  • The link cell and Verlet neighbour list routines were updated to include a second level of parallelism, based on OpenMP threads.
  • OpenMP was also implemented for the evaluation of both the short- and long-ranged force terms.
  • Threaded parallelism was developed for the Ewald routines, which use DaFT for the fast Fourier transforms. This was achieved by first implementing calls to ACML for the 1D FFTs, and then using synchronous communications for threaded parallelism over the vectors in the x, y or z direction (depending on the stage of the 3D transform).
  • The constraint force calculations, which are implemented via the SHAKE and RATTLE algorithms, were also updated to use OpenMP parallelism.
  • A benchmark simulation of liquid Argon (256,000 atoms, cubic cells of side 210.36Å) was performed on up to 2048 cores. With a 9Å cut-off the pure MPI code performs well; however, increasing the cut-off to 15Å shows that the use of threads allows the code to exploit more cores.
  • A representative simulation with Sodium/Potassium Disilicate glass (69,120 particles, cubic cells of side 96.72Å) was chosen because its force field employs almost all of the important non-bonded terms, including not only Van der Waals and Ewald terms, as would be the case for Sodium Chloride, but also three-body terms. Good scalability was observed up to 2048 cores, using 8 threads per MPI task, with a representative cut-off of 12.03Å.
  • A final simulation to demonstrate the use of the threaded code was performed for Gramicidin-A in water. This simulation contained 99,120 atoms with tetragonal cells (a=94.6Å, c=112.7Å) and a cut-off of 8Å. In this case, although threading increased performance, the benefit was limited to 2 threads.
  • Overall, the use of threads can both increase the number of cores that DL_POLY can exploit and improve performance; in favourable cases the gain can be a factor of 5-10. However, the best combination of threads and processes depends on the force field (and the computational system). In practice, therefore, getting the best out of the code will require a few short experimental runs to find the best combination. As experience is gained with the mixed-mode code, findings will be documented to help guide this choice.
  • To future-proof the code and enable simulations beyond the order of 500 million atoms, a 64-bit integer kind was introduced into the code, but only for the specific variables concerned with the global index.
  • Wherever possible, 32-bit kinds have been left in the code: promoting every variable to 64 bits could nearly double the memory footprint, which is not acceptable and can be avoided.
  • The new code was demonstrated with a series of three runs: Argon (with a markedly reduced cut-off) to check the basic code, Sodium Chloride (with slightly reduced Ewald tolerances) to check the non-bonded terms, and Water (again with slightly reduced Ewald tolerances) to check the bonded terms.
  • In each case at least 32,768 cores on HECToR were used with up to 4,298,942,376 particles.
  • These code developments have been introduced to an experimental branch of the DL_POLY repository at CCPForge. Plans are in place to merge them with the main branch in the near future.
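As background to the link-cell and Verlet neighbour list work above: the link-cell method reduces the neighbour search from O(N²) to roughly O(N) by binning particles into cells no smaller than the cut-off, so candidates need only be sought in the 27 surrounding cells. The following minimal serial Python sketch shows the idea for a cubic periodic box (illustrative only; DL_POLY_4 itself is Fortran and distributes these loops over MPI tasks and, after this project, OpenMP threads):

```python
import itertools

def build_neighbour_list(positions, box, cutoff):
    """Verlet neighbour list built via the link-cell method.

    The cubic box is divided into cells of side >= cutoff, so each
    particle's neighbours can only lie in its own cell or the 26
    surrounding ones (serial illustration of the DL_POLY_4 scheme).
    """
    ncell = max(1, int(box // cutoff))           # link cells per box edge
    side = box / ncell
    cut2 = cutoff * cutoff

    cells = {}                                   # (ix, iy, iz) -> particle ids
    for i, (x, y, z) in enumerate(positions):
        key = (int(x // side) % ncell,
               int(y // side) % ncell,
               int(z // side) % ncell)
        cells.setdefault(key, []).append(i)

    pairs = set()
    for (ix, iy, iz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            nkey = ((ix + dx) % ncell, (iy + dy) % ncell, (iz + dz) % ncell)
            for i in members:
                for j in cells.get(nkey, ()):
                    if j <= i:
                        continue
                    # minimum-image separation in the periodic cubic box
                    r2 = 0.0
                    for a, b in zip(positions[i], positions[j]):
                        d = a - b
                        d -= box * round(d / box)
                        r2 += d * d
                    if r2 < cut2:
                        pairs.add((i, j))
    return sorted(pairs)
```

The set handles the small-box case where wrapped neighbour cells coincide; the `j <= i` test ensures each pair is counted once.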
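The OpenMP work on the force loops follows the standard pattern of thread-private accumulators followed by a reduction, which avoids write races when two threads handle pairs sharing an atom. A conceptual Python sketch of that decomposition (Python threads stand in for OpenMP here; the GIL means no real speedup, so this shows the structure only, and `force_fn` is a hypothetical pair-force callback, not a DL_POLY routine):

```python
from concurrent.futures import ThreadPoolExecutor

def pair_forces_threaded(positions, pairs, force_fn, nthreads=4):
    """Thread-level decomposition of a pair-force loop.

    Each thread works on a chunk of the neighbour list and accumulates
    into a PRIVATE force array; the per-thread arrays are then reduced,
    mirroring an OpenMP reduction over the force accumulators.
    """
    n = len(positions)

    def worker(chunk):
        local = [[0.0, 0.0, 0.0] for _ in range(n)]   # thread-private forces
        for i, j in chunk:
            f = force_fn(positions[i], positions[j])  # force on i due to j
            for k in range(3):
                local[i][k] += f[k]
                local[j][k] -= f[k]                   # Newton's third law
        return local

    chunks = [pairs[t::nthreads] for t in range(nthreads)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(worker, chunks))

    total = [[0.0, 0.0, 0.0] for _ in range(n)]       # reduction step
    for local in partials:
        for i in range(n):
            for k in range(3):
                total[i][k] += local[i][k]
    return total
```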
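The Ewald/DaFT work relies on the fact that a 3D discrete Fourier transform factorises into successive 1D transforms along x, y and z; this is what lets the code call a vendor 1D FFT (ACML on HECToR) for each pencil, communicating between stages. A pure-Python sketch of that factorisation, with a naive O(n²) DFT standing in for the library FFT:

```python
import cmath

def dft1(v):
    """Naive 1D DFT (stand-in for the ACML 1D FFT calls)."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def fft3_by_stages(grid):
    """3D transform of grid[x][y][z] as three sweeps of 1D transforms.

    Mirrors the staging used by DaFT: transform all pencils along z,
    then y, then x.  In DL_POLY_4 each stage is distributed over MPI
    tasks and, after this project, over OpenMP threads as well.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # stage 1: transform every z-pencil; result indexed g[x][y][fz]
    g = [[dft1(grid[x][y]) for y in range(ny)] for x in range(nx)]
    # stage 2: transform every y-pencil; result indexed g[x][fz][fy]
    g = [[dft1([g[x][y][z] for y in range(ny)]) for z in range(nz)]
         for x in range(nx)]
    # stage 3: transform every x-pencil; result indexed g[fz][fy][fx]
    g = [[dft1([g[x][z][y] for x in range(nx)]) for y in range(ny)]
         for z in range(nz)]
    # restore [fx][fy][fz] ordering
    return [[[g[z][y][x] for z in range(nz)] for y in range(ny)]
            for x in range(nx)]
```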
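SHAKE restores bond-length constraints after an unconstrained position update by sweeping over the constraints and applying a Lagrange-multiplier correction along each reference bond vector until all are satisfied (RATTLE applies the analogous correction to the velocities). A minimal serial sketch of the iteration, assuming a simple Cartesian representation (illustrative, not the DL_POLY_4 routine):

```python
def shake(pos, ref, bonds, lengths, masses, tol=1e-8, max_iter=100):
    """Minimal SHAKE: iteratively project positions onto bond constraints.

    pos     -- positions after the unconstrained move, list of [x, y, z]
    ref     -- positions at the start of the step (constraint directions)
    bonds   -- list of (i, j) index pairs
    lengths -- target bond length for each pair
    masses  -- particle masses
    """
    for _ in range(max_iter):
        converged = True
        for (i, j), d0 in zip(bonds, lengths):
            dx = [a - b for a, b in zip(pos[i], pos[j])]
            diff = sum(c * c for c in dx) - d0 * d0    # constraint violation
            if abs(diff) > tol:
                converged = False
                rx = [a - b for a, b in zip(ref[i], ref[j])]
                dot = sum(a * b for a, b in zip(dx, rx))
                # Lagrange multiplier for this constraint (first-order)
                g = diff / (2.0 * dot * (1.0 / masses[i] + 1.0 / masses[j]))
                for k in range(3):
                    pos[i][k] -= g * rx[k] / masses[i]
                    pos[j][k] += g * rx[k] / masses[j]
        if converged:
            return pos
    raise RuntimeError("SHAKE failed to converge")
```

Because each correction perturbs neighbouring constraints, the sweep repeats until every bond is within tolerance; this dependency between constraints is what makes the OpenMP parallelisation of SHAKE non-trivial.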
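The 64-bit work is driven by the ceiling of a signed 32-bit integer, 2³¹ − 1 ≈ 2.1 × 10⁹, and by the memory cost of blanket promotion. The stdlib `array` module illustrates both points (Python is purely illustrative here; in DL_POLY_4 the change is a Fortran integer kind applied only to the global-index variables):

```python
from array import array

INT32_MAX = 2**31 - 1        # ~2.1e9: the hard ceiling of a 32-bit global index

# A 32-bit integer slot cannot hold a global index beyond INT32_MAX:
idx32 = array('i', [0])      # 'i' = signed 32-bit int on common platforms
overflowed = False
try:
    idx32[0] = INT32_MAX + 1
except OverflowError:
    overflowed = True        # the index no longer fits

# Promoting only the global-index storage to 64 bits lifts the limit:
idx64 = array('q', [INT32_MAX + 1])       # 'q' = signed 64-bit int

# ...but promoting EVERY integer array would double its memory footprint:
n = 1_000_000
bytes_32 = array('i', [0]).itemsize * n   # 4 bytes per element
bytes_64 = array('q', [0]).itemsize * n   # 8 bytes per element
```

This is why the project promoted only the global-index variables and left 32-bit kinds everywhere else.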

Please see PDF or HTML for a report which summarises this project.