Next: Optimal processor count
Up: NEMO performance
Previous: Compiler optimisations
Contents
What have we found out from running these simple benchmarks?
- PGI performs consistently better than PathScale, although the latest
versions of the compilers (PathScale 3.1, PGI 7.2.5) give almost
identical performance.
- Running in single-core mode reduces the 60-step runtime, but this
is more than offset by the increased number of AUs required.
- Equal processor grid dimensions (jpni = jpnj) are best and should be
used where possible. If equal dimensions can't be used, they should be
chosen to be as square as possible and such that jpni < jpnj.
- NEMO continues to scale out to 1024 processors, but the best performance
in terms of runtime versus AUs used is obtained for 128 or 256
processors.
- Removing land squares reduces the runtime for 60 time steps for most
processor counts and greatly reduces the number of AUs required. This
is not done by default in NEMO, so many researchers could be using
more AUs than necessary.
- Compiler optimisation flags more aggressive than -O3 don't provide any
benefit and in some cases break the code entirely; see section 9 for
more details.
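
The decomposition guidance above (equal dimensions where possible, otherwise as square as possible with jpni < jpnj) can be illustrated with a short Python sketch. The function name most_square_decomposition is hypothetical and not part of NEMO. The sketch considers only the processor count, so it ignores the model's global grid size and any removal of land squares, both of which may change the best choice in practice.

```python
import math


def most_square_decomposition(nprocs):
    """Return (jpni, jpnj) with jpni * jpnj == nprocs and jpni <= jpnj,
    choosing the factor pair whose two values are closest together.

    Hypothetical helper for illustration only; it is not part of NEMO.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    # Start at the integer square root and walk down to the nearest divisor.
    jpni = math.isqrt(nprocs)
    while nprocs % jpni != 0:
        jpni -= 1
    return jpni, nprocs // jpni


if __name__ == "__main__":
    for n in (128, 256, 1024):
        jpni, jpnj = most_square_decomposition(n)
        print(f"{n} processors: jpni = {jpni}, jpnj = {jpnj}")
```

For 256 and 1024 processors this gives equal dimensions (16 x 16 and 32 x 32); for 128 processors it gives 8 x 16, the closest-to-square split with jpni < jpnj.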