The HECToR Service is now closed and has been superseded by ARCHER.

Improving the scaling and performance of GROMACS on HECToR using single-sided communications

The GROMACS simulation package is one of the leading biomolecular simulation packages in the world and is widely used for simulating a range of biochemical systems and biopolymers. Both the PRACE pan-European HPC initiative and the CRESTA exascale collaborative project have identified GROMACS as a key code for effective exploitation of both current and future HPC systems. GROMACS can take advantage of a hybrid MPI/OpenMP programming model. However, the performance of the MPI communication has a large influence on both the scaling of the code and the number of OpenMP threads per task that can be exploited effectively. Improving the inter-task communication will therefore help GROMACS achieve better scaling and parallel efficiency.

The overall aims of this project were:

  • Improve the performance and scaling of GROMACS by enabling the code to run efficiently with fewer particles per MPI task.
  • Improve the inter-task communication by replacing the calls to standard MPI two-sided point-to-point communication routines with calls to a single-sided communication interface (see the sketch after this list).
  • Enable the use of different single-sided communication libraries; for HECToR, the Cray shared memory access (SHMEM) routines will be used.
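As an illustration only (the function names below are hypothetical and not taken from the GROMACS source), the following sketch contrasts a two-sided MPI ring exchange with a one-sided SHMEM equivalent. It uses the OpenSHMEM 1.x API (shmem_init, shmem_malloc, shmem_double_put); the HECToR work targeted Cray SHMEM, whose older start_pes/shmalloc entry points play the same role.

    #include <mpi.h>
    #include <shmem.h>

    #define N 1024

    /* Two-sided version, shown only for contrast: each rank sends a buffer to
     * its right neighbour and receives one from its left neighbour; MPI matches
     * the send with the receive explicitly. */
    void exchange_mpi(double *sendbuf, double *recvbuf, int rank, int nranks)
    {
        int right = (rank + 1) % nranks;
        int left  = (rank - 1 + nranks) % nranks;
        MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, right, 0,
                     recvbuf, N, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* One-sided version: the receive buffer must live in the symmetric heap, so
     * it is allocated collectively on every PE; each PE then writes directly
     * into its right neighbour's buffer with a put, and a barrier provides both
     * completion and synchronisation. */
    void exchange_shmem(double *sendbuf, double *recvbuf_sym, int pe, int npes)
    {
        int right = (pe + 1) % npes;
        shmem_double_put(recvbuf_sym, sendbuf, N, right); /* remote write to PE 'right' */
        shmem_barrier_all();                              /* puts complete, safe to read */
    }

    int main(void)
    {
        shmem_init();
        int pe   = shmem_my_pe();
        int npes = shmem_n_pes();

        double sendbuf[N];
        for (int i = 0; i < N; i++)
            sendbuf[i] = (double) pe;

        /* Symmetric allocation: the same call, with the same size, on every PE. */
        double *recvbuf_sym = shmem_malloc(N * sizeof(double));

        exchange_shmem(sendbuf, recvbuf_sym, pe, npes);
        /* recvbuf_sym now holds the data written by the left neighbour. */

        shmem_free(recvbuf_sym);
        shmem_finalize();
        return 0;
    }

Note that the one-sided version removes the explicit message matching, but it introduces the requirement that the target buffer be symmetrically allocated and that completion be enforced by synchronisation, which is where the buffer-management issues discussed below arise.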

The individual achievements of the project are summarised below:

  • A SHMEM implementation of GROMACS 4.6.3 was developed, with support for Cray SHMEM or OpenSHMEM.
  • The development work covered memory allocation (SHMEM requires communication buffers to be allocated symmetrically across all processing elements), one-sided operations, and the associated communication and control data structures.
  • The domain decomposition (DD), particle decomposition (PD) and particle mesh Ewald (PME) parts of GROMACS were adapted to use SHMEM communications.
  • Two test cases were used to benchmark the new developments against the original MPI version: solvated alcohol dehydrogenase (ADH) in a cubic unit cell (134,000 atoms), and a water/ethanol mixture (Grappa) of 45,000 atoms.
  • The ADH test case was benchmarked on HECToR with up to 768 MPI tasks (24 nodes) and the Grappa test case with up to 1024 MPI tasks (32 nodes).
  • Results show that replacing the MPI routines with their one-sided equivalents does not degrade performance, but current limitations in the programming model also prevent SHMEM from delivering a performance benefit at this stage.
  • Several features of the code's parallel design have been identified for future development which would improve the effectiveness of SHMEM, in particular communication buffer management, memory allocation, and the points of contrast between the SPMD and MPMD approaches (the buffer-management issue is illustrated in the sketch after this list).
  • A paper on this work, "Introducing SHMEM into the GROMACS molecular dynamics application: experience and results", was presented at the PGAS 2013 conference (in Edinburgh). Recommendations were also given for other PGAS developers and their applications.
  • To integrate the SHMEM work within the main GROMACS development effort, the code has been added to a new public development branch of GROMACS 4.6.3. A build-time option (GMX_SHMEM) was also added to the GROMACS build system, which enables the experimental SHMEM support to be built on any platform that provides an OpenSHMEM implementation.
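As a hypothetical illustration of the buffer-management point above (the helper below is not taken from the GROMACS source), the sketch shows why resizing a communication buffer is awkward under SHMEM: with two-sided MPI each rank can realloc() its own buffer independently, whereas a buffer in the symmetric heap must be reallocated collectively with a size agreed by every PE. It assumes the OpenSHMEM 1.x reduction interface (shmem_long_max_to_all with its pWrk/pSync work arrays).

    #include <shmem.h>
    #include <stdio.h>
    #include <string.h>

    /* Symmetric work arrays required by the OpenSHMEM 1.x reduction interface. */
    static long pSync[SHMEM_REDUCE_SYNC_SIZE];
    static long pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
    static long need_local, need_global;

    /* Hypothetical helper: grow a communication buffer held in the symmetric
     * heap. The allocation is collective and every PE must pass the same size,
     * so the PEs first agree on the largest size any of them needs. */
    static double *grow_symmetric_buffer(double *old_buf, size_t old_count,
                                         size_t my_need)
    {
        need_local = (long) my_need;
        shmem_long_max_to_all(&need_global, &need_local, 1,   /* max over all PEs */
                              0, 0, shmem_n_pes(), pWrk, pSync);

        double *new_buf = shmem_malloc((size_t) need_global * sizeof(double)); /* collective */
        if (old_buf != NULL) {
            memcpy(new_buf, old_buf, old_count * sizeof(double)); /* keep local data */
            shmem_free(old_buf);                                  /* collective */
        }
        return new_buf;
    }

    int main(void)
    {
        shmem_init();
        for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
            pSync[i] = SHMEM_SYNC_VALUE;
        shmem_barrier_all();   /* ensure pSync is initialised on every PE */

        /* Each PE asks for a different size; all end up with the largest one. */
        double *buf = grow_symmetric_buffer(NULL, 0, 100 + 10 * shmem_my_pe());
        printf("PE %d: buffer holds %ld doubles\n", shmem_my_pe(), need_global);

        shmem_free(buf);
        shmem_finalize();
        return 0;
    }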

A report summarising this project is available in PDF and HTML formats.