Compiler optimisations
In this section we investigate whether compiler optimisations can be
used to improve the performance of NEMO. We test a number of different
compiler flags for both the PGI and PathScale compilers and measure
the runtime for a 16 by 16 processor grid (jpni=16, jpnj=16) running on
221 processors (jpnij=221). Tables 5 and 6 show the results obtained for
the PGI and PathScale compilers respectively.
Table 5: Runtime for 60 time steps for different compiler flags using the
PGI compiler suite. All tests were run with jpni=16, jpnj=16
and jpnij=221.

Compiler flags          | Time for 60 steps (seconds)
-O0 -r8                 | 173.105
-O1 -r8                 | 169.694
-O2 -r8                 | 151.047
-O3 -r8                 | 141.529
-O4 -r8                 | 144.604
-fast -r8               | fails on step 6
-fastsse -r8            | fails on step 6
-O3 -r8 -Mcache_align   | 155.933

Table 6: Runtime for 60 time steps for different compiler flags using the
PathScale compiler suite. All tests were run with jpni=16,
jpnj=16 and jpnij=221.

Compiler flags          | Time for 60 steps (seconds)
-O0 -r8                 | 325.994
-O1 -r8                 | 203.611
-O2 -r8                 | 154.394
-O3 -r8                 | 152.971
-O3 -r8 -OPT:Ofast      | 162.148

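Tables 5 and 6 show that, for both compilers, the best
performance is obtained using -O3 -r8 (141.529 seconds with PGI and
152.971 seconds with PathScale). More aggressive optimisations
either slow the code down, as with -O4, -Mcache_align and -OPT:Ofast,
or break it entirely: with the PGI compiler, both -fast and -fastsse
cause the code to fail on step 6.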
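
The comparison drawn from Tables 5 and 6 can be reproduced with a few lines of Python. The following is only an illustrative sketch: the timings are copied from the tables, while the names `timings`, `best_flags` and `speedup_over_o0` are hypothetical helpers introduced here and are not part of NEMO or of the original study. The -fast and -fastsse runs are left out because they produced no timing.

```python
# Timings in seconds for 60 time steps, copied from Tables 5 and 6.
# The -fast and -fastsse runs are omitted: they failed on step 6.
timings = {
    "PGI": {
        "-O0 -r8": 173.105,
        "-O1 -r8": 169.694,
        "-O2 -r8": 151.047,
        "-O3 -r8": 141.529,
        "-O4 -r8": 144.604,
        "-O3 -r8 -Mcache_align": 155.933,
    },
    "PathScale": {
        "-O0 -r8": 325.994,
        "-O1 -r8": 203.611,
        "-O2 -r8": 154.394,
        "-O3 -r8": 152.971,
        "-O3 -r8 -OPT:Ofast": 162.148,
    },
}


def best_flags(compiler):
    """Return the flag set with the shortest runtime for one compiler."""
    runs = timings[compiler]
    return min(runs, key=runs.get)


def speedup_over_o0(compiler, flags):
    """Return the runtime ratio of the unoptimised build to the given flags."""
    runs = timings[compiler]
    return runs["-O0 -r8"] / runs[flags]


if __name__ == "__main__":
    for compiler in timings:
        flags = best_flags(compiler)
        ratio = speedup_over_o0(compiler, flags)
        print(f"{compiler}: best flags = {flags}, "
              f"{ratio:.2f}x faster than -O0 -r8")
```

Run on the tabulated data, this selects -O3 -r8 for both compilers. It also makes the difference between the two suites visible: the PathScale build gains far more from optimisation relative to -O0 than the PGI build does, even though the best PGI time is the shorter of the two.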