To simplify development and testing of the FFT routines, the first task of the project was to isolate the FFT code in a simple benchmark. To minimise the amount of code that had to be written for this test harness, a new executable was created based on the PW.x code. In PW.x, the main FFT routine `tg_cft3s` is wrapped by a routine `cft3s`. This wrapper was replaced with code which, rather than executing the single FFT requested, repeatedly performs a fixed number of forward and inverse FFT loops, then checks the output and exits. The code therefore starts up as normal, reading the usual set of input files, until the first FFT call is intercepted and the FFT benchmark is run in its place.
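The interception idea can be illustrated with a minimal single-process sketch in Python. This is not the PW.x code: the names `fft` and `benchmark_fft`, the loop count, and the one-dimensional power-of-two grid are all assumptions made for illustration, and `time.perf_counter` stands in for the timer used in the real benchmark. The real replacement wrapper also exits after the benchmark; this sketch returns its results instead.

```python
import cmath
import time


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def benchmark_fft(grid, n_loops=100):
    """Hypothetical stand-in for the replaced wrapper.

    Instead of performing the single transform the caller asked for,
    run n_loops forward + inverse pairs and time the whole loop.
    """
    original = list(grid)            # saved for the later correctness check
    work = list(grid)
    start = time.perf_counter()      # stand-in for the benchmark's timer
    for _ in range(n_loops):
        work = fft(work)                                   # forward
        scale = len(work)
        work = [v / scale for v in fft(work, inverse=True)]  # inverse
    elapsed = time.perf_counter() - start
    return original, work, elapsed
```

Calling `benchmark_fft(grid, n_loops=10)` on a small complex grid returns the saved input, the grid after ten round trips, and the elapsed time, which are the three quantities the harness needs for its timing report and correctness check.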

The wallclock time taken by the FFTs is recorded using `MPI_Wtime` and reported on completion of the benchmark. Correctness is tested by saving the initial contents of the grid and comparing them with the final grid. Both the total error (the sum of the absolute errors in each grid element across all processes) and the maximum single-element error are reported. Using the original, unmodified FFT routines and the CNT80 input files, a maximum single-element error of 10^{-14} was found after 1000 forward and inverse FFTs. This corresponds to an average error of about 10^{-17} per iteration, which is consistent with the level of accuracy expected from double precision arithmetic. A high value of either the single-element or the total error flags a warning in the output file, so the code could easily be tested for correctness during development.
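The two error measures and the per-iteration estimate can be sketched as follows. This is a single-process illustration only: the function names and the warning thresholds are hypothetical (the text does not state the values used), and in a parallel run the local sum and local maximum would additionally have to be combined across processes, for example with sum and max reductions.

```python
import math

# Hypothetical warning thresholds; the actual values are not given in the text.
MAX_ERROR_TOL = 1e-10
TOTAL_ERROR_TOL = 1e-6


def grid_errors(initial, final):
    """Return (total_error, max_error) for two equally sized grids."""
    diffs = [abs(a - b) for a, b in zip(initial, final)]
    return sum(diffs), max(diffs)


def check_grid(initial, final):
    """Return both error measures plus a list of warnings (empty if OK)."""
    total, worst = grid_errors(initial, final)
    warnings = []
    if worst > MAX_ERROR_TOL:
        warnings.append(f"WARNING: max element error {worst:.3e}")
    if total > TOTAL_ERROR_TOL:
        warnings.append(f"WARNING: total error {total:.3e}")
    return total, worst, warnings


def error_per_iteration(max_error, n_iterations):
    """Average error attributed to one forward + inverse iteration."""
    return max_error / n_iterations
```

With the figures quoted above, `error_per_iteration(1e-14, 1000)` evaluates to approximately `1e-17`, which is the per-iteration value stated in the text.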