One subroutine that was not initially planned for parallelisation, but which began to show up in the CP2K timing report as other areas of the code were parallelised, was the evaluation of the correlation functional. Only the PBE functional [9] was parallelised here, but the method should generalise easily to the other implemented functionals if required.

The bulk of the functional evaluation is done as a single loop over the points on the real-space density grids. At each point, the (complicated) calculation of the functional is performed, and the result is written to the corresponding point on the derivative grids. This loop is trivially parallel, since each iteration is entirely independent of the others, so we see very good OpenMP efficiency (93% with 6 threads, and 74% using all 24 cores on the node), as shown in table 3.