Increasing the optimisation level from -O0 to -O2 improves performance, but going from -O2 up to -O4 gives minimal further improvement. The -fast flag results in a segmentation violation. As this flag invokes a number of different optimisations, we tested each of them in turn to ascertain which particular flags cause the problem. The command pgf90 -help -fast lists the optimisations invoked by -fast, e.g.
fionanem@nid15879:~> pgf90 -help -fast
Reading rcfile /opt/pgi/7.1.4/linux86-64/7.1-4/bin/.pgf90rc
-fast    Common optimizations; includes -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline
         == -Mvect=sse -Mscalarsse -Mcache_align -Mflushz
The -Munroll=c:1 flag enables loop unrolling, with c:1 ensuring that loops with a constant iteration count of 1 or less are completely unrolled. The -Mnoframe flag prevents the compiler from generating code that sets up a stack frame pointer for each function. The -Mlre flag allows loop-carried redundancy elimination to occur, i.e. expressions that would be recomputed redundantly across loop iterations are removed. The -Mautoinline flag automatically enables function inlining in C/C++ and thus does not apply to NEMO. The -Mvect=sse flag allows vector pipelining to be used with SSE instructions. The -Mscalarsse flag generates scalar SSE code with xmm registers; this flag also implies -Mflushz. The -Mcache_align flag ensures that objects are aligned on cache-line boundaries. The -Mflushz flag sets the SSE unit to ``flush-to-zero'' mode, which ensures that results too small to be represented as normalised numbers are automatically set to zero.
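To make the idea behind loop-carried redundancy elimination concrete, the following minimal sketch applies the transformation by hand to a small loop. It is written in Python purely for illustration; the function names naive and with_lre are hypothetical, and this is not the code that pgf90 generates, only an example of the kind of rewrite -Mlre is intended to perform.

```python
def naive(b, c):
    # Each iteration recomputes b[i] * c, although the previous
    # iteration already computed the same value as b[i + 1] * c.
    return [b[i] * c + b[i + 1] * c for i in range(len(b) - 1)]


def with_lre(b, c):
    # Hand-applied version of the transformation: the product is
    # carried across iterations in a temporary instead of recomputed.
    if len(b) < 2:
        return []
    out = []
    carried = b[0] * c  # value needed by the first iteration
    for i in range(len(b) - 1):
        nxt = b[i + 1] * c
        out.append(carried + nxt)
        carried = nxt  # this is b[i] * c for the next iteration
    return out


b = [1.0, 2.0, 3.0, 4.0]
assert naive(b, 0.5) == with_lre(b, 0.5)
```

The two versions agree only as long as the carried temporary is kept alive from one iteration to the next. If it were dropped or overwritten, later iterations would read a wrong value.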
From Table 10 we see that adding either of the flags -Mlre or -Mvect=sse causes the code to crash at runtime. None of the other flags invoked by -fast appear to cause significant issues. The -Mlre flag causes the zonal velocity to become very large, suggesting that the loop redundancy elimination may have removed a loop temporary that was actually required. The reason for the failure when -Mvect=sse is added is unknown. Ultimately, the additional flags do not give significant performance improvements over -O2 or -O3, and thus -O3 will be used in future.
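The flag-by-flag procedure described above could be organised along the following lines. This is only a sketch under stated assumptions: it assumes each flag is added individually on top of -O2 (the exact combinations tested are those listed in Table 10), the helper name flag_sets is hypothetical, and the script only prints candidate compiler invocations rather than building or running NEMO.

```python
# Flags reported by "pgf90 -help -fast" beyond the -O2 baseline.
FAST_FLAGS = [
    "-Munroll=c:1", "-Mnoframe", "-Mlre", "-Mautoinline",
    "-Mvect=sse", "-Mscalarsse", "-Mcache_align", "-Mflushz",
]


def flag_sets(base="-O2", extras=FAST_FLAGS):
    # Assumption: one build per flag, each adding a single flag to the base level.
    return [[base, flag] for flag in extras]


for flags in flag_sets():
    # Print the invocation prefix only; source files and link options are omitted.
    print("pgf90", " ".join(flags))
```

Testing one flag per build keeps any runtime failure attributable to a single optimisation, at the cost of missing problems that arise only when flags interact.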