Compiler optimisations

In this section we investigate whether compiler optimisations can be used to improve the performance of NEMO. We test a number of different compiler flags for both the PGI and PathScale compilers and measure the time taken for 60 steps on a 16 by 16 grid running on 221 processors. Tables 5 and 6 show the results obtained for the PGI and PathScale compilers respectively.

Table 5: Results for the PGI compiler.

Compiler flags	Time for 60 steps (seconds)
`-O0 -r8`	173.105
`-O1 -r8`	169.694
`-O2 -r8`	151.047
`-O3 -r8`	141.529
`-O4 -r8`	144.604
`-fast -r8`	fails on step 6
`-fastsse -r8`	fails on step 6
`-O3 -r8 -Mcache_align`	155.933

Table 6: Results for the PathScale compiler.

Compiler flags	Time for 60 steps (seconds)
`-O0 -r8`	325.994
`-O1 -r8`	203.611
`-O2 -r8`	154.394
`-O3 -r8`	152.971
`-O3 -r8 -OPT:Ofast`	162.148

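Tables 5 and 6 show that the best performance is obtained using `-O3 -r8` with both compilers: 141.529 seconds with PGI and 152.971 seconds with PathScale. More aggressive optimisations either slow the code down or break it entirely. With PGI, `-O4` and `-Mcache_align` are both slower than `-O3` alone, and `-fast` and `-fastsse` both cause the run to fail on step 6. With PathScale, adding `-OPT:Ofast` increases the runtime from 152.971 to 162.148 seconds.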
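
The comparison above can be reproduced from the tabulated timings. The following is a minimal sketch in Python, not part of the original study: the dictionary names and the `summarise` helper are hypothetical, the timings are copied from Tables 5 and 6, and failed runs are recorded as `None` on the assumption that they should simply be excluded from the comparison.

```python
# Timings (seconds for 60 steps) copied from Tables 5 and 6.
# None marks a flag set whose run failed before completing.
pgi = {
    "-O0 -r8": 173.105,
    "-O1 -r8": 169.694,
    "-O2 -r8": 151.047,
    "-O3 -r8": 141.529,
    "-O4 -r8": 144.604,
    "-fast -r8": None,
    "-fastsse -r8": None,
    "-O3 -r8 -Mcache_align": 155.933,
}

pathscale = {
    "-O0 -r8": 325.994,
    "-O1 -r8": 203.611,
    "-O2 -r8": 154.394,
    "-O3 -r8": 152.971,
    "-O3 -r8 -OPT:Ofast": 162.148,
}


def summarise(timings, baseline="-O0 -r8"):
    """Return (fastest flag set, its speedup relative to the baseline)."""
    # Ignore flag sets that did not complete the 60 steps.
    completed = {flags: t for flags, t in timings.items() if t is not None}
    best = min(completed, key=completed.get)
    return best, completed[baseline] / completed[best]


for name, timings in (("PGI", pgi), ("PathScale", pathscale)):
    best, speedup = summarise(timings)
    print(f"{name}: fastest is '{best}', {speedup:.2f}x faster than -O0 -r8")
```

Using the unoptimised `-O0 -r8` run as the baseline is a choice made for this sketch; any other completed flag set could be passed as `baseline` instead.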
