Compiler optimisations
In this section we investigate whether compiler optimisations can be
used to improve the performance of NEMO. We test a number of different
compiler flags for both the PGI and PathScale compilers and measure
the runtime for a 16 by 16 grid running on 221 processors. Tables
5 and 6 show the results obtained for
the PGI and PathScale compilers respectively.
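The processor count used in these tests can be checked with a little arithmetic. The sketch below assumes the usual NEMO meaning of the decomposition parameters quoted in the table captions: jpni and jpnj give the number of subdomains in each horizontal direction, and jpnij is the number of processors actually used once land-only subdomains are dropped. The variable names other than jpni, jpnj and jpnij are illustrative only.

```python
# Decomposition used for every run in Tables 5 and 6.
jpni, jpnj = 16, 16   # subdomains in each horizontal direction
jpnij = 221           # processors actually used (assumed: ocean subdomains only)

total_subdomains = jpni * jpnj            # full 16 by 16 grid
land_only = total_subdomains - jpnij      # subdomains not assigned a processor

print(f"{total_subdomains} subdomains in the full grid")
print(f"{land_only} subdomains dropped "
      f"({land_only / total_subdomains:.1%} fewer processors)")
```

Under that assumption, the 16 by 16 grid has 256 subdomains, of which 35 are not assigned a processor.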
Table 5:
Runtime for 60 time steps for different compiler flags for the
PGI compiler suite. All tests were run with jpni=16, jpnj=16
and jpnij=221.
| Compiler flags | Time for 60 steps (seconds) |
| -O0 -r8 | 173.105 |
| -O1 -r8 | 169.694 |
| -O2 -r8 | 151.047 |
| -O3 -r8 | 141.529 |
| -O4 -r8 | 144.604 |
| -fast -r8 | fails on step 6 |
| -fastsse -r8 | fails on step 6 |
| -O3 -r8 -Mcache_align | 155.933 |
Table 6:
Runtime for 60 time steps for different compiler flags using the
PathScale compiler suite. All tests were run with jpni=16,
jpnj=16 and jpnij=221.
| Compiler flags | Time for 60 steps (seconds) |
| -O0 -r8 | 325.994 |
| -O1 -r8 | 203.611 |
| -O2 -r8 | 154.394 |
| -O3 -r8 | 152.971 |
| -O3 -r8 -OPT:Ofast | 162.148 |
Tables 5 and 6 show that, for both compilers, the best
performance is obtained using -O3 -r8. More aggressive optimisations
either slow the code down (-O4 and -Mcache_align with PGI, -OPT:Ofast
with PathScale) or break it entirely: with PGI, both -fast and
-fastsse cause the run to fail on step 6.
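To make the comparison concrete, the timings can be turned into speedups relative to the unoptimised -O0 -r8 build. The following is a minimal sketch using only the numbers from Tables 5 and 6. The helper name summarise and the dictionary layout are illustrative choices, and the two PGI runs that failed are left out because they have no timing.

```python
# Runtime in seconds for 60 time steps, copied from Tables 5 and 6.
pgi = {
    "-O0 -r8": 173.105,
    "-O1 -r8": 169.694,
    "-O2 -r8": 151.047,
    "-O3 -r8": 141.529,
    "-O4 -r8": 144.604,
    "-O3 -r8 -Mcache_align": 155.933,
}
pathscale = {
    "-O0 -r8": 325.994,
    "-O1 -r8": 203.611,
    "-O2 -r8": 154.394,
    "-O3 -r8": 152.971,
    "-O3 -r8 -OPT:Ofast": 162.148,
}


def summarise(timings, baseline="-O0 -r8"):
    """Return the fastest flag set and each flag set's speedup over baseline."""
    base = timings[baseline]
    speedups = {flags: base / t for flags, t in timings.items()}
    best = min(timings, key=timings.get)  # smallest runtime wins
    return best, speedups


for name, timings in (("PGI", pgi), ("PathScale", pathscale)):
    best, speedups = summarise(timings)
    print(f"{name}: fastest flags = {best}, "
          f"speedup over -O0 = {speedups[best]:.2f}x")
```

With these figures, -O3 -r8 is roughly 1.22 times faster than -O0 -r8 for PGI and roughly 2.13 times faster for PathScale. The larger PathScale gain comes mostly from its much slower -O0 baseline, since the two compilers' -O3 timings are much closer to each other.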