Realspace to Planewave transfer

The first major work item in the project was to optimise the rs2pw_transfer routine (steps II and V in figure 1). As mentioned earlier, CP2K maintains two distinct bases. The first is products of atom-centred gaussian functions, which represent the wave-function and are stored as a sparse matrix. The second is planewaves, stored as distributed regular grids. In common with other DFT codes, CP2K performs a Self-Consistent Field (SCF) loop to find the ground-state energy of the system. Each iteration therefore requires converting back and forth between these two representations via the following steps (illustrated in figure 1):

**Figure 1:** Transformation from gaussians to plane waves via real space grids
$\includegraphics[width=10cm]{images/grids.ps}$

Steps I,VI: The matrix elements corresponding to products of gaussian functions are mapped onto a set of realspace multi-grids. Each level of the multi-grids may be either fully distributed or replicated. When a level is fully distributed, each MPI task is responsible for mapping the subset of gaussians centred on its own section of the grid. When it is replicated, any task may map any given gaussian. These properties are important for the load balancing scheme described in section 5. This step is referred to as 'collocation', and the reverse step as 'integration'.
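Collocation and the two distribution modes can be sketched in a simplified 1D setting. This is a minimal illustration, not CP2K code. The following are all assumptions made for the example:

- the helper names `owner_of`, `collocate` and `map_gaussians`;
- the periodic 1D grid split into contiguous blocks;
- the fixed cutoff radius;
- the round-robin assignment used in replicated mode.

For simplicity each task also gets a full-size array rather than only its local section.

```python
import math

# Hypothetical 1D illustration of collocation; not taken from CP2K.

def owner_of(x, n_points, n_tasks, spacing):
    """Distributed mode: index of the task whose contiguous block of grid
    points contains the gaussian centre x (assumes 0 <= x < n_points*spacing)."""
    i = int(math.floor(x / spacing)) % n_points
    block = -(-n_points // n_tasks)  # ceiling division: points per task
    return i // block

def collocate(grid, centre, exponent, coeff, spacing, cutoff):
    """Add coeff * exp(-exponent * (r - centre)**2) to a periodic 1D grid,
    keeping only grid points with |r - centre| <= cutoff."""
    n = len(grid)
    lo = math.ceil((centre - cutoff) / spacing)
    hi = math.floor((centre + cutoff) / spacing)
    for i in range(lo, hi + 1):
        # i is an unwrapped index; i % n folds it back onto the periodic grid
        grid[i % n] += coeff * math.exp(-exponent * (i * spacing - centre) ** 2)

def map_gaussians(gaussians, n_points, n_tasks, spacing, cutoff, distributed):
    """Return one partial grid per task; each task maps only its own gaussians.
    gaussians is a list of (centre, exponent, coefficient) tuples."""
    partial = [[0.0] * n_points for _ in range(n_tasks)]
    for k, (centre, exponent, coeff) in enumerate(gaussians):
        if distributed:
            # only the task owning the centre maps this gaussian
            task = owner_of(centre, n_points, n_tasks, spacing)
        else:
            # replicated: any task may map any gaussian (round-robin is arbitrary)
            task = k % n_tasks
        collocate(partial[task], centre, exponent, coeff, spacing, cutoff)
    return partial
```

Summing the per-task grids point by point gives the same total grid in either mode. In distributed mode, however, a task's contributions spill outside its own block whenever a gaussian sits near a block boundary. That spill-over is what the halo regions in steps II,V deal with.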
Steps II,V: The data stored on the realspace multi-grids is then transferred to a set of planewave multi-grids. In general, the planewave grids are distributed differently from the realspace grids, in either slices or pencils, to suit a parallel FFT. This step therefore uses MPI_Alltoallv to globally reshuffle the data into the right place on the distributed planewave grids. A 'halo-swap' step is also required. It happens before the redistribution in the realspace-to-planewave direction and after it in the planewave-to-realspace direction. It is necessary because a gaussian function mapped to the grid may extend beyond the boundaries of the grid section local to the MPI task that performed the mapping. Every process therefore maintains a halo region wide enough to accommodate the largest possible gaussian. After mapping is complete, these halo regions are swapped with the neighbouring processes and summed into their local grids.
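The halo swap can be illustrated with a serial simulation of the 1D case. This is a hypothetical sketch under stated assumptions, not CP2K's implementation:

- each of the tasks owns `block` consecutive points of a periodic grid, stored with `halo` extra points on each side;
- `halo` is at least the gaussian cutoff measured in grid points;
- `halo` is no larger than `block`, so only nearest neighbours need to exchange data.

A real code would exchange halos with MPI point-to-point messages. Here the "messages" are simply additions into the neighbour's array.

```python
import math

# Hypothetical serial simulation of a 1D halo swap; not taken from CP2K.

def local_collocate(p, block, halo, spacing, gaussians, cutoff):
    """Task p maps the gaussians centred in its own block onto a local array
    of length block + 2*halo; local[j] holds global point p*block - halo + j."""
    local = [0.0] * (block + 2 * halo)
    start = p * block - halo  # global index of local[0] (may be negative)
    for centre, exponent, coeff in gaussians:
        if int(math.floor(centre / spacing)) // block != p:
            continue  # distributed mode: only the owning task maps this gaussian
        lo = math.ceil((centre - cutoff) / spacing)
        hi = math.floor((centre + cutoff) / spacing)
        for i in range(lo, hi + 1):
            j = i - start
            assert 0 <= j < len(local), "halo too narrow for this cutoff"
            local[j] += coeff * math.exp(-exponent * (i * spacing - centre) ** 2)
    return local

def halo_swap_and_sum(locals_, block, halo):
    """Add each task's halo values into the neighbour that owns those points
    (periodic in the task index) and return every task's owned block."""
    n_tasks = len(locals_)
    owned = [loc[halo:halo + block] for loc in locals_]  # slices are copies
    for p, loc in enumerate(locals_):
        left, right = (p - 1) % n_tasks, (p + 1) % n_tasks
        for k in range(halo):
            # left halo overlaps the last `halo` points of the left neighbour
            owned[left][block - halo + k] += loc[k]
            # right halo overlaps the first `halo` points of the right neighbour
            owned[right][k] += loc[halo + block + k]
    return owned
```

With a halo at least `ceil(cutoff / spacing)` points wide, concatenating the owned blocks after the swap reproduces a direct collocation onto the full periodic grid. The data is then ready for the global redistribution.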
Steps III,IV: Once the data is on the planewave grids, a parallel 3D Fast Fourier Transform (FFT) converts it between the realspace and reciprocal-space representations. Further details of the FFT algorithm are given in section 4.
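A distributed 3D FFT concentrates its communication into a single global transpose. This can be shown with a serial NumPy simulation of a slab ("slice") decomposition. The scheme is a generic textbook one assumed for illustration and is not necessarily the algorithm CP2K uses (section 4 covers that). `slab_fft3d` is a hypothetical name, and the grid dimensions along x and y are assumed divisible by the number of tasks.

```python
import numpy as np

def slab_fft3d(grid, n_tasks):
    """Serial simulation of a slab-decomposed forward 3D FFT (hypothetical helper).
    Assumes grid.shape[0] and grid.shape[1] are divisible by n_tasks."""
    # Initial distribution: task p holds a contiguous slab of x-planes.
    x_slabs = np.split(grid, n_tasks, axis=0)
    # Step 1: each task performs 2D FFTs over (y, z) on its own planes; no communication.
    x_slabs = [np.fft.fft2(s, axes=(1, 2)) for s in x_slabs]
    # Step 2: global transpose (the all-to-all). Task q receives from every task p
    # the part of p's slab lying in q's range of y, so q ends up with a y-slab
    # spanning the whole x extent.
    y_slabs = [np.concatenate([np.split(s, n_tasks, axis=1)[q] for s in x_slabs], axis=0)
               for q in range(n_tasks)]
    # Step 3: each task performs 1D FFTs along x, which are now entirely local.
    y_slabs = [np.fft.fft(s, axis=0) for s in y_slabs]
    # Reassemble only for comparison; a real code would keep the data y-distributed.
    return np.concatenate(y_slabs, axis=1)
```

For any grid whose first two dimensions divide evenly, the result should match `np.fft.fftn` to rounding error. A pencil decomposition follows the same pattern but splits two axes, which trades one large transpose for two smaller ones within process sub-groups.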
