Summary of benchmarking study


What have we found out from running these simple benchmarks?

PGI consistently performs better than PathScale, although the latest versions of the two compilers (PathScale 3.1, PGI 7.2.5) give almost identical performance.
Running in single-core mode reduces the runtime for 60 time steps, but this gain is more than offset by the increased number of AUs required.
Equal processor-grid dimensions (jpni = jpnj) are best and should be used where possible. Where they cannot be equal, the decomposition should be as square as possible, with jpni < jpnj.
NEMO continues to scale out to 1024 processors, but the best trade-off between runtime and AUs used is obtained with 128 or 256 processors.
Removing land-only subdomains (land squares) reduces the runtime for 60 time steps at most processor counts and greatly reduces the number of AUs required. NEMO does not do this by default, so many researchers may be using more AUs than necessary.
Compiler flags beyond -O3 provide no benefit and in some cases break the code entirely; see section 9 for details.
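The processor-grid rule above can be sketched as a small helper that, for a given processor count, picks the factor pair closest to square with jpni <= jpnj. This is a minimal Python sketch under that assumption; the function name `choose_decomposition` is hypothetical and not part of NEMO, which simply takes jpni and jpnj as user-supplied values.

```python
import math


def choose_decomposition(nproc):
    """Hypothetical helper: return (jpni, jpnj) with jpni * jpnj == nproc,
    as close to square as possible and with jpni <= jpnj."""
    if nproc < 1:
        raise ValueError("nproc must be a positive integer")
    # Start at floor(sqrt(nproc)) and walk down to the first divisor;
    # the largest divisor not exceeding sqrt(nproc) gives the most
    # nearly square factor pair, and it is automatically the smaller one.
    jpni = math.isqrt(nproc)
    while nproc % jpni != 0:
        jpni -= 1
    return jpni, nproc // jpni


for n in (128, 256, 1024):
    print(n, choose_decomposition(n))
```

For 256 and 1024 processors this gives equal dimensions (16 x 16 and 32 x 32); for 128 it gives 8 x 16, the most nearly square split with jpni < jpnj.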
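The saving from land-square removal comes from not allocating processes to subdomains that contain only land. A minimal sketch of how many processes a given jpni x jpnj decomposition actually needs, assuming a land-sea mask stored as `mask[j][i]` (1 = ocean, 0 = land) and a simple near-equal block split that ignores NEMO's halo rows and columns; the function names are hypothetical, and the count returned corresponds to the number of active subdomains rather than to any NEMO routine.

```python
def block_bounds(n, parts):
    """Split range(n) into `parts` contiguous, near-equal (start, end) blocks."""
    return [(k * n // parts, (k + 1) * n // parts) for k in range(parts)]


def count_ocean_subdomains(mask, jpni, jpnj):
    """Hypothetical helper: count subdomains of a jpni x jpnj decomposition
    that contain at least one ocean point (halos are ignored)."""
    nj, ni = len(mask), len(mask[0])
    count = 0
    for j0, j1 in block_bounds(nj, jpnj):      # blocks along j
        for i0, i1 in block_bounds(ni, jpni):  # blocks along i
            # A subdomain is kept if any point in it is ocean.
            if any(mask[j][i] for j in range(j0, j1) for i in range(i0, i1)):
                count += 1
    return count


# Toy 4 x 4 mask: the western half is land.
toy_mask = [[0, 0, 1, 1] for _ in range(4)]
print(count_ocean_subdomains(toy_mask, 2, 2))
```

In the toy example only 2 of the 4 subdomains contain ocean, so half of the processes (and their AUs) would be wasted without land-square removal. On a real global grid the fraction depends on the mask and the decomposition.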
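The AU trade-offs in the single-core and processor-count findings come down to simple arithmetic: the charge scales with the number of cores allocated multiplied by wall-clock time. A minimal sketch, assuming whole nodes are charged whether or not every core is used and that AUs are proportional to core-hours (the conversion rate is system-specific and not given here); the function name and the default of two cores per node are illustrative assumptions.

```python
def core_hours_charged(ntasks, runtime_s, tasks_per_node, cores_per_node=2):
    """Hypothetical cost model: core-hours charged for a run of `ntasks`
    MPI tasks lasting `runtime_s` seconds, assuming every core on each
    allocated node is charged (cores_per_node=2 is an assumption)."""
    nodes = -(-ntasks // tasks_per_node)  # ceiling division
    return nodes * cores_per_node * runtime_s / 3600.0


# Same task count and runtime: single-core mode (1 task per node)
# occupies twice as many nodes as fully populated nodes (2 tasks per node).
print(core_hours_charged(256, 3600, tasks_per_node=2))
print(core_hours_charged(256, 3600, tasks_per_node=1))
```

Under these assumptions single-core mode doubles the cores charged, so it only saves AUs if it more than halves the runtime; the benchmarks found that it does not. Evaluating the same function with measured runtimes at each processor count gives the runtime-versus-AU comparison used to pick 128 or 256 processors.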
