FFTW Planning

As described earlier, CP2K has an FFT library interface that supports several FFT libraries. Of these, FFTW 3 generally gives the best performance. However, in order to maintain a consistent interface across all the libraries, CP2K originally did not make use of the facility in FFTW to reuse FFT plans when repeated FFTs are performed on the same arrays (or on arrays of the same layout for FFTW 2). Because planning was done before every FFT, only FFTW_ESTIMATE, the cheapest form of planning, could be employed.

In a similar way to the earlier optimisations to the FFT routines, the code was modified so that a plan is created once, cached in the fft_scratch, and reused at each time step. In addition, now that planning is done only at startup, it is possible to use the FFTW_MEASURE plan style and, for FFTW 3, the FFTW_PATIENT and FFTW_EXHAUSTIVE plan styles. These plans take longer to create than estimated plans, but should allow individual FFTs to perform better. Because the best plan style depends on the number of FFTs to be performed (i.e. the number of SCF cycles) and even on the particular machine architecture, the choice is exposed to the user via a new input file option, which defaults to FFTW_ESTIMATE. Table 5 shows the runtimes for 2000 3D FFTs using each of the different plan types. Even with FFTW_ESTIMATE, the code is slightly faster, as only a single plan is made rather than plans being repeatedly created and destroyed. FFTW_EXHAUSTIVE does not show any benefit here, as its planning step takes around 10 times longer than for FFTW_PATIENT but gives only slightly better performance at runtime. For longer runs, the more expensive plan types are expected to give a bigger benefit.
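
The create-once, reuse-many pattern described above can be sketched in a few lines of Python. This is only an illustration of the caching idea, not the CP2K implementation: PlanCache, fake_planner and the 125x125x125 grid are hypothetical names and values chosen for the example, and the planner is a stand-in that does not call FFTW. The cache key here also ignores which arrays the plan was made for, which a real FFTW 3 plan cache would need to take into account.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, int, int]
PlanKey = Tuple[Shape, int, str]  # (grid shape, transform direction, plan style)


class PlanCache:
    """Create each plan once and return the cached copy on later requests."""

    def __init__(self, make_plan: Callable[[Shape, int, str], object]):
        self._make_plan = make_plan          # hypothetical planner function
        self._plans: Dict[PlanKey, object] = {}
        self.plans_created = 0               # counts how often planning ran

    def get(self, shape: Shape, direction: int, style: str = "FFTW_ESTIMATE"):
        key = (tuple(shape), direction, style)
        plan = self._plans.get(key)
        if plan is None:
            # First request for this shape/direction/style: plan and store it.
            plan = self._make_plan(*key)
            self._plans[key] = plan
            self.plans_created += 1
        return plan


def fake_planner(shape: Shape, direction: int, style: str) -> dict:
    # Stand-in for the expensive planning step; a real planner would time
    # candidate algorithms here when a style such as FFTW_MEASURE is chosen.
    return {"shape": shape, "direction": direction, "style": style}


cache = PlanCache(fake_planner)
for _ in range(2000):
    forward = cache.get((125, 125, 125), -1, "FFTW_MEASURE")
    backward = cache.get((125, 125, 125), +1, "FFTW_MEASURE")

# Planning ran once per direction, however many transforms were requested.
print(cache.plans_created)
```

Because the planning cost is paid once per distinct key, a more expensive plan style only needs to save a small amount of time per transform to pay for itself over a long enough run.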

Table 5: Time and speedup for 2000 3D FFTs using different plan types

Plan type	Time (s)	Speedup (%)
Original Code	997
FFTW_ESTIMATE	995	0.2
FFTW_MEASURE	989	0.8
FFTW_PATIENT	975	2.3
FFTW_EXHAUSTIVE	1081
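
The speedup column can be checked against the timings with a short calculation. The report does not state how speedup is defined; the sketch below assumes speedup = (T_original / T_plan - 1) x 100, which is an inference that happens to reproduce the three tabulated values after rounding to one decimal place. The variable and function names are chosen for the example.

```python
# Timings in seconds, copied from Table 5.
times = {
    "Original Code": 997,
    "FFTW_ESTIMATE": 995,
    "FFTW_MEASURE": 989,
    "FFTW_PATIENT": 975,
    "FFTW_EXHAUSTIVE": 1081,
}

baseline = times["Original Code"]


def speedup_percent(t: float) -> float:
    # Assumed definition: relative gain in throughput over the original code.
    return (baseline / t - 1.0) * 100.0


for name, t in times.items():
    print(f"{name:16s} {t:5d} {speedup_percent(t):6.1f}")
```

Under this assumed definition the FFTW_EXHAUSTIVE run comes out negative (roughly -7.8%), which is consistent with the text's remark that it shows no benefit for this run length.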