
Parallelisation

In view of the above developments, the most efficient parallelisation was to let each process of the passed 2D communicator handle a contiguous sequence of atoms, and to communicate nothing globally until the end of the $k$-point/spin loop. After the loop, a reduce operation can be performed across the 1D communicators, followed by a gather operation for the vector quantities. For the scalars, the reduce is performed on the 3D communicator. The 2D array quantity $\partial \rho^S_{RR'}$ is only required in rare cases and is treated as a rather thick block vector. Inside the 2D communicators the scheduling is similar to the one used for the structure constants, applied to the generally smaller 2D communicator. The schedule is static and is determined by the external routine, with a view to spreading the effort evenly rather than the number of atoms.
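The idea of a static schedule that balances effort rather than atom count can be illustrated with a minimal sketch. It is not the external routine referred to above, whose interface is not shown here; the function name `partition_atoms` and the per-atom `costs` estimate are hypothetical and chosen only for illustration. Each process receives one contiguous range of atoms, with the cuts placed where the cumulative cost comes closest to an equal share.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_atoms(costs, nproc):
    """Split atoms 0..n-1 into nproc contiguous ranges of similar total cost.

    costs : hypothetical per-atom work estimates (positive numbers)
    Returns nproc + 1 offsets; process p handles atoms offsets[p]:offsets[p+1].
    """
    prefix = list(accumulate(costs))  # prefix[i] = total cost of atoms 0..i
    total = prefix[-1]
    offsets = [0]
    for p in range(1, nproc):
        target = total * p / nproc  # ideal cumulative cost at the p-th cut
        i = bisect_left(prefix, target)  # first atom reaching the target
        # Step back one atom if that lands strictly closer to the target.
        if i > 0 and target - prefix[i - 1] < prefix[i] - target:
            i -= 1
        # Keep the offsets non-decreasing and inside the atom list.
        cut = max(offsets[-1], min(i + 1, len(costs)))
        offsets.append(cut)
    offsets.append(len(costs))
    return offsets


# One expensive atom followed by four cheap ones, shared by two processes:
# the cut falls after the first atom, so each process carries a cost of 4.
print(partition_atoms([4, 1, 1, 1, 1], 2))
```

With uniform costs the sketch reduces to an equal split by atom count; with uneven costs a process may receive far fewer atoms than its neighbours while carrying a comparable load.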



DP 2013-08-01