Figure 1 shows performance on HECToR Phase 3 for the benchmark test case mentioned in WP2, a double gyre quasi-geostrophic model with 3 layers on a 1025 × 1025 uniform grid. All runs used fully populated HECToR Phase 3 nodes, and performance was compared across the default compilers. The compiler optimisation flags were as follows: GCC -O3 -ffast-math; PGI -fast -O3; CCE -O3.
GCC clearly gives the best performance. Most of the compute time is spent in the CABARET extrapolation step; however, this part of the code scales linearly, with very little time spent in communication, and therefore does not significantly limit overall performance. The second most computationally expensive routine is the inverse discrete sine transform, which is used at each time step to calculate the stream function as part of the potential vorticity inversion. This step does not scale well because of its global transposes, and it therefore severely limits code performance. GCC outperforms both PGI and CCE because the inverse discrete sine transform step has been optimised for GCC.
However, even with GCC, PEQUOD had very limited scalability on HECToR for useful problem sizes. This was a significant concern, and optimisation of this calculation was necessary for further use of the code. The development of a parallelised quasi-geostrophic CABARET code for WP2 could be considered a straightforward task, because CABARET is inherently well suited to structured grids. However, WP3, the efficient parallelisation of the Helmholtz solver, would not be so straightforward.
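To make the cost of the potential vorticity inversion concrete, the following is a minimal sketch of a sine-transform Helmholtz solve on a small square grid. It is not the PEQUOD implementation: the function names (dst1, dst1_2d, apply_helmholtz, invert_helmholtz), the five-point discretisation, the homogeneous Dirichlet boundary condition and the parameter lam are all assumptions made for illustration, and the transform is a naive O(n^2) sum rather than a fast transform. The point of interest is dst1_2d, which has to transpose the whole array between the row and column transforms. When the rows are distributed across processes, a transpose of this kind generally becomes a global exchange of data.

```python
import math


def dst1(x):
    """Naive type-I discrete sine transform of a list (O(n^2), illustration only)."""
    n = len(x)
    return [
        sum(x[j] * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1)) for j in range(n))
        for k in range(n)
    ]


def dst1_2d(a):
    """Transform every row, transpose, transform every (former) column, transpose back."""
    rows = [dst1(row) for row in a]
    cols = [dst1(list(col)) for col in zip(*rows)]  # zip(*rows) is the transpose
    return [list(row) for row in zip(*cols)]


def apply_helmholtz(psi, lam, h):
    """Five-point (Laplacian - lam) applied to interior values, with psi = 0 on the boundary."""
    n = len(psi)

    def at(i, j):
        return psi[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    return [
        [
            (at(i + 1, j) + at(i - 1, j) + at(i, j + 1) + at(i, j - 1) - 4.0 * at(i, j)) / h**2
            - lam * at(i, j)
            for j in range(n)
        ]
        for i in range(n)
    ]


def invert_helmholtz(q, lam, h):
    """Solve (Laplacian - lam) psi = q by dividing by the operator's eigenvalues in sine space."""
    n = len(q)
    q_hat = dst1_2d(q)
    s = [math.sin(math.pi * (k + 1) / (2 * (n + 1))) ** 2 for k in range(n)]
    psi_hat = [
        [q_hat[k][l] / (-4.0 * (s[k] + s[l]) / h**2 - lam) for l in range(n)]
        for k in range(n)
    ]
    # The type-I transform is its own inverse up to a factor 2 / (n + 1) per dimension.
    scale = (2.0 / (n + 1)) ** 2
    return [[scale * v for v in row] for row in dst1_2d(psi_hat)]


if __name__ == "__main__":
    n, lam = 7, 3.0            # hypothetical small problem, not the benchmark grid
    h = 1.0 / (n + 1)
    psi_true = [[float((i + 1) * (j + 2) % 5) - 2.0 for j in range(n)] for i in range(n)]
    q = apply_helmholtz(psi_true, lam, h)
    psi = invert_helmholtz(q, lam, h)
    err = max(abs(psi[i][j] - psi_true[i][j]) for i in range(n) for j in range(n))
    print("maximum round-trip error:", err)
```

In this sketch the division by the eigenvalues is purely local to each mode, so the data movement sits entirely in the two transposes inside dst1_2d, which are called once for the forward transform and once for the inverse.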
Phil Ridley 2012-10-01