

Compiler optimisations

In this section we investigate whether compiler optimisations can improve the performance of NEMO. We test a number of compiler flags for both the PGI and PathScale compilers, in each case using a 16 by 16 processor grid (jpni=16, jpnj=16) running on 221 processors (jpnij=221). Tables 5 and 6 show the results obtained for the PGI and PathScale compilers respectively.
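The processor counts quoted above can be checked with a little arithmetic. The following is a minimal sketch, assuming (as the preceding section on removing land-only cells suggests) that jpnij counts only the subdomains that remain once land-only ones are dropped. The variable names mirror the NEMO parameters, but the script itself is purely illustrative.

```python
# Decomposition parameters taken from the captions of Tables 5 and 6.
jpni, jpnj = 16, 16   # processor grid dimensions
jpnij = 221           # processors actually used

# Total subdomains in the full 16 x 16 grid.
total_subdomains = jpni * jpnj

# Assumption: the difference corresponds to land-only subdomains
# that are not assigned a processor.
dropped = total_subdomains - jpnij
fraction_dropped = dropped / total_subdomains

print(f"{total_subdomains} subdomains, {jpnij} used, {dropped} dropped "
      f"({fraction_dropped:.1%})")
```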


Table 5: Runtime for 60 time steps for different compiler flags using the PGI compiler suite. All tests were run with jpni=16, jpnj=16 and jpnij=221.
Compiler flags              Time for 60 steps (seconds)
-O0 -r8                     173.105
-O1 -r8                     169.694
-O2 -r8                     151.047
-O3 -r8                     141.529
-O4 -r8                     144.604
-fast -r8                   fails on step 6
-fastsse -r8                fails on step 6
-O3 -r8 -Mcache_align       155.933



Table 6: Runtime for 60 time steps for different compiler flags using the PathScale compiler suite. All tests were run with jpni=16, jpnj=16 and jpnij=221.
Compiler flags              Time for 60 steps (seconds)
-O0 -r8                     325.994
-O1 -r8                     203.611
-O2 -r8                     154.394
-O3 -r8                     152.971
-O3 -r8 -OPT:Ofast          162.148


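Tables 5 and 6 show that, for both compilers, the best performance is obtained using -O3 -r8 (141.529 seconds with PGI and 152.971 seconds with PathScale). More aggressive optimisations either slow the code down or break it entirely: -O4 and -Mcache_align with PGI and -OPT:Ofast with PathScale all give longer runtimes than -O3 -r8 alone, while -fast and -fastsse both cause the PGI build to fail on step 6.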
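To make the comparison concrete, the timings can be expressed as speedups relative to the unoptimised -O0 -r8 build. The following is a minimal sketch that uses only the values from Tables 5 and 6. The function name `summarise` and the dictionary layout are illustrative assumptions, and the two PGI runs that failed on step 6 are left out because they have no timing.

```python
# Runtimes in seconds for 60 time steps, copied from Tables 5 and 6.
# The failing PGI flags (-fast, -fastsse) are omitted: no timing exists.
pgi = {
    "-O0 -r8": 173.105,
    "-O1 -r8": 169.694,
    "-O2 -r8": 151.047,
    "-O3 -r8": 141.529,
    "-O4 -r8": 144.604,
    "-O3 -r8 -Mcache_align": 155.933,
}
pathscale = {
    "-O0 -r8": 325.994,
    "-O1 -r8": 203.611,
    "-O2 -r8": 154.394,
    "-O3 -r8": 152.971,
    "-O3 -r8 -OPT:Ofast": 162.148,
}


def summarise(times, baseline_flags="-O0 -r8"):
    """Return the fastest flag set and each flag set's speedup over the baseline."""
    baseline = times[baseline_flags]
    best = min(times, key=times.get)
    speedups = {flags: baseline / t for flags, t in times.items()}
    return best, speedups


for name, times in (("PGI", pgi), ("PathScale", pathscale)):
    best, speedups = summarise(times)
    print(f"{name}: fastest flags are '{best}'")
    for flags, s in speedups.items():
        print(f"  {flags:<24} {times[flags]:8.3f} s  speedup {s:.3f}")
```

Measured this way, the gain from optimisation is much larger for PathScale than for PGI, mainly because the PathScale -O0 build is so much slower to begin with; the best absolute runtime still comes from PGI.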

