This Distributed Computational Science and Engineering (dCSE) project aimed to improve the scaling of the density functional theory code Castep to more than 1000 cores.

The following improvements were successfully implemented:
  • Storage and workload of the dominant parts of a Castep calculation were split using basic band parallelism in addition to the existing parallelisation scheme.
  • Matrix inversion and diagonalisation were parallelised rather than performed serially.
  • The original band optimiser required frequent, expensive orthonormalisation steps, so it was replaced with a Band-Independent Optimiser.
The parallel performance of Castep has been dramatically improved by this work.
  • The parallel efficiency of the original Castep 4.2 fell from 86% on 64 cores to around 12% on 1024 cores.
  • Parallel efficiency for Castep now stands at around 42% on 1024 cores, roughly 3.5 times that of the original Castep 4.2 at the same core count.
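The efficiency figures quoted above can be compared with a little arithmetic. The sketch below is illustrative only: it assumes parallel efficiency means measured speedup divided by ideal linear speedup, which this summary does not state explicitly, and the function names are hypothetical. Under that assumption, an efficiency E on N cores delivers the same throughput as E × N perfectly scaling cores.

```python
# Illustrative arithmetic on the efficiency figures quoted in this summary.
# Assumption (not stated in the summary): parallel efficiency is the measured
# speedup divided by the ideal linear speedup for the same core count.

def efficiency_ratio(new_efficiency: float, old_efficiency: float) -> float:
    """How many times larger the new efficiency is than the old one."""
    return new_efficiency / old_efficiency

def effective_cores(efficiency: float, cores: int) -> float:
    """Number of perfectly scaling cores giving the same throughput."""
    return efficiency * cores

OLD_EFF_1024 = 0.12  # original Castep 4.2 on 1024 cores (approximate)
NEW_EFF_1024 = 0.42  # after this project, on 1024 cores (approximate)

ratio = efficiency_ratio(NEW_EFF_1024, OLD_EFF_1024)
old_equiv = effective_cores(OLD_EFF_1024, 1024)
new_equiv = effective_cores(NEW_EFF_1024, 1024)

print(f"efficiency ratio at 1024 cores: {ratio:.1f}")
print(f"ideal-core equivalent before: {old_equiv:.0f}, after: {new_equiv:.0f}")
```

Because both quoted efficiencies are approximate ("around 12%" and "around 42%"), the resulting ratio should be read as approximate too.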

A report summarising this project is available as PDF or HTML. More detailed versions of the report are also available: PDF, HTML. There is still scope for further optimisation building on this work, which could be implemented in a future dCSE project.