The HECToR Service is now closed and has been superseded by ARCHER.

Parallel Algorithms for the Materials Modelling code CRYSTAL

The CRYSTAL ab initio materials modelling code is a program developed by the Computational Science and Engineering Department at STFC, and the theoretical chemistry group at Turin University (Italy). Its function is to compute the electronic structure of a material, and from this many material properties. CRYSTAL is unique in that it can be used to perform Hartree-Fock (HF), Density Functional Theory (DFT) or HF-DFT hybrid calculations with periodic boundary conditions.

CRYSTAL can be run in parallel using either a replicated data strategy (PCRYSTAL) or a distributed data strategy (MPP CRYSTAL). MPP CRYSTAL has been shown to scale up to a few thousand processors for systems with 10,000 basis functions or more. CRYSTAL is under continuous development to extend its applicability to emerging problems in materials modelling, and the code is particularly important for UK materials chemists. As the size and complexity of the systems of interest increase, CRYSTAL must continue to be developed to make best use of high performance computing (HPC) resources such as HECToR.
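
To illustrate why the choice between replicated and distributed data matters at this scale, the sketch below estimates the memory needed per process for a single matrix. It assumes a dense, real, double-precision matrix with one row and column per basis function; this is a simplification for illustration, not a description of how CRYSTAL actually stores its matrices. The function names are hypothetical.

```python
def matrix_bytes(n_basis: int, bytes_per_element: int = 8) -> int:
    """Size in bytes of one dense n_basis x n_basis matrix (assumed real, double precision)."""
    return n_basis * n_basis * bytes_per_element


def per_process_bytes(n_basis: int, n_procs: int, distributed: bool) -> int:
    """Memory per process for one matrix under each strategy.

    Replicated: every process holds the full matrix.
    Distributed: the matrix is split evenly over all processes (ceiling division).
    """
    total = matrix_bytes(n_basis)
    if distributed:
        return -(-total // n_procs)
    return total


if __name__ == "__main__":
    n_basis = 10_000  # system size quoted in the text
    for n_procs in (64, 1024, 4096):
        replicated = per_process_bytes(n_basis, n_procs, distributed=False)
        distributed = per_process_bytes(n_basis, n_procs, distributed=True)
        print(f"{n_procs:5d} processes: replicated {replicated / 1e6:8.1f} MB, "
              f"distributed {distributed / 1e6:8.3f} MB per process")
```

Under these assumptions the replicated cost per process is independent of the process count, while the distributed cost falls as processes are added.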

The key aims of this project were to:

  • Remove the memory bottleneck in the code, caused by the replicated data structures for the Fock and density matrices. This will be achieved by replacing these data structures with their irreducible representations, giving a speed-up that scales with problem size/(2 x symmetry of the system).
  • Implement better control of the block size of distributed data. This will be achieved by reducing the block size from the default value.
  • Implement coarse grained parallelism.
  • Develop memory optimisation and modularisation to include HF-DFT and GW calculations.
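
The block-size aim above can be illustrated with a small load-balance calculation. The sketch assumes a ScaLAPACK-style two-dimensional block-cyclic layout, which the text does not specify, and a hypothetical 56 x 64 process grid (3,584 processes). The block sizes 96, 64 and 32 and the matrix rank 19837 are the values reported for this project. Load balance is only one effect of block size, so this does not by itself explain any measured speed-up.

```python
def local_count(n: int, block: int, proc: int, nprocs: int) -> int:
    """Number of indices (out of n) owned by `proc` in a 1D block-cyclic layout.

    Blocks of `block` consecutive indices are dealt out to processes
    0, 1, ..., nprocs-1 in turn, starting from process 0.
    """
    full_blocks = n // block
    count = (full_blocks // nprocs) * block  # complete rounds of dealing
    extra = full_blocks % nprocs             # full blocks left after those rounds
    if proc < extra:
        count += block                       # gets one more full block
    elif proc == extra:
        count += n % block                   # gets the final partial block
    return count


def imbalance(n: int, block: int, prow: int, pcol: int) -> float:
    """Ratio of the largest local matrix size to the mean (1.0 is perfect balance)."""
    max_rows = max(local_count(n, block, p, prow) for p in range(prow))
    max_cols = max(local_count(n, block, p, pcol) for p in range(pcol))
    mean = n * n / (prow * pcol)
    return max_rows * max_cols / mean


if __name__ == "__main__":
    rank = 19837           # matrix rank quoted in the text
    prow, pcol = 56, 64    # assumed process grid, 56 * 64 = 3584
    for block in (96, 64, 32):
        print(f"block size {block:3d}: imbalance {imbalance(rank, block, prow, pcol):.3f}")
```

Smaller blocks spread the matrix more evenly over the grid in this model, at the price of more, smaller messages and less efficient local kernels, which is why the block size is worth exposing as a tunable parameter.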

The achievements of the project are summarised below:

  • The memory bottleneck in the CRYSTAL code caused by the replicated data structures for the Fock and density matrices was removed by replacing those data structures with their irreducible representations. This gives a representative speed-up related to the problem size/(2 x symmetry of the system).
  • Reducing the data block size from the default value of 96 to 64 or 32 gave more than a 10% speed-up in the diagonalisation part of the code and a 20% speed-up in the back and similarity transform on up to 3,584 cores on HECToR Phase 2b.
  • For large systems, the optimal load balance between complex and real k-points can be achieved with CMPLXFAC=3. This enables good scalability up to 4,864 cores on Phase 2b.
  • The new command MPP_BLOCK was implemented to enable better control of the block size of distributed data. Reducing the block size from a default value of 96 to 64 or 32 gives more than a 10% speed-up in the diagonalisation part and about a 20% speed-up in the back and similarity transform for a system size of rank=19837. This was demonstrated on up to 3,584 Phase 2b cores.
  • This work has been incorporated into the main CRYSTAL code base.
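
The load balance between complex and real k-points mentioned above can be sketched as a weighted allocation of processes. The sketch assumes that CMPLXFAC acts as the relative cost of a complex k-point compared with a real one, and that processes are shared out in proportion to that cost. Neither the allocation scheme nor the k-point counts below come from the text; the function name is hypothetical.

```python
def allocate_processes(n_real: int, n_complex: int, n_procs: int,
                       cmplxfac: float = 3.0) -> list[int]:
    """Share n_procs among k-points in proportion to their assumed cost.

    Real k-points have weight 1, complex k-points have weight cmplxfac.
    Returns the process count for each real k-point followed by each
    complex k-point. Rounding uses the largest-remainder method so the
    counts always sum to n_procs.
    """
    weights = [1.0] * n_real + [cmplxfac] * n_complex
    total = sum(weights)
    ideal = [w * n_procs / total for w in weights]
    alloc = [int(x) for x in ideal]  # round down first
    leftover = n_procs - sum(alloc)
    # Hand the remaining processes to the k-points with the largest shortfall.
    by_shortfall = sorted(range(len(weights)),
                          key=lambda i: ideal[i] - alloc[i], reverse=True)
    for i in by_shortfall[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical case: 2 real and 2 complex k-points on 4,864 cores.
    print(allocate_processes(n_real=2, n_complex=2, n_procs=4864, cmplxfac=3.0))
```

If the weight matches the true cost ratio, every k-point finishes at about the same time; a weight that is too low leaves the complex k-points as the bottleneck, and one that is too high starves the real ones.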

Please see PDF for a report which summarises this work.