Further Improving NEMO In Shallow Seas (FINISS)
This Distributed Computational Science and Engineering (dCSE) project further developed the NEMO (Nucleus for a European Model of the Ocean) ocean modelling code. NEMO is of great strategic importance to the UK and European oceanographic communities. Although NEMO has been used successfully for a number of years in global and ocean-basin applications, its use as a shelf-sea model is less well developed. This was the third dCSE project to develop NEMO on HECToR.
The overall aims of this project were:
- Reduce the bandwidth requirements of 3-dimensional halo exchanges in NEMO by eliminating field values beneath the seabed from the halo messages.
- Develop a tool for off-line generation of "grid-partition" maps, and add an option to NEMO to load grid-partition maps at run-time.
- Determine the deepest level in each processor's sub-domain, and then redevelop the code in NEMO's standard z-last ordering so that the outer loops over the vertical dimension stop at that level and levels lying entirely beneath the seabed are not traversed.
- Improve the load balance in the z-first ordering by performing loops only over the active levels at each grid location, thereby eliminating the redundant computations on land and beneath the seabed that remain after partitioning into sub-domains.
- Benchmark the changes using both deep-ocean and shallow-sea test cases.
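The halo-trimming idea in the first aim can be sketched as follows. This is a minimal Python illustration, not NEMO's Fortran implementation: the function names `pack_halo` and `unpack_halo`, the list-of-columns layout and the per-column wet-level count `nwet` are all hypothetical. It assumes that both neighbours already hold the (static) bathymetry, so the receiver can work out how many values belong to each column without those lengths being sent.

```python
def pack_halo(columns, nwet):
    """Flatten the wet part of each halo column into one message buffer.

    columns: one list of values per halo column, surface level first
    nwet:    number of wet levels in each column (0 on land)
    """
    buf = []
    for col, n in zip(columns, nwet):
        buf.extend(col[:n])  # levels beneath the seabed are not sent
    return buf


def unpack_halo(buf, nwet, nlevels, fill=0.0):
    """Rebuild full-depth halo columns from a trimmed buffer.

    The receiver uses its own copy of nwet to find where each column starts,
    and pads the levels beneath the seabed with `fill`.
    """
    columns, pos = [], 0
    for n in nwet:
        columns.append(buf[pos:pos + n] + [fill] * (nlevels - n))
        pos += n
    return columns


# Usage: a 4-level halo of three columns (deep sea, shallow sea, land).
halo = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
nwet = [4, 2, 0]
buf = pack_halo(halo, nwet)
print(len(buf))  # 6 values sent instead of 12
assert unpack_halo(buf, nwet, nlevels=4) == halo
```

Only the message volume shrinks; whether that yields a measurable speed-up depends on how large a share of the run-time the halo exchanges take.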
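The second aim separates partitioning, done once off-line, from the model run, which only reads the result. The sketch below illustrates that split with a deliberately simple technique, a 1-D strip partition that gives each process a contiguous band of rows carrying roughly equal active work. It is not the partitioner developed in the project, and the names `strip_partition`, `write_map` and `read_map`, together with the one-owner-per-line text format, are hypothetical.

```python
import io


def strip_partition(nwet, nproc):
    """Assign each row j of the global grid to a process, returning owner[j].

    nwet[j][i] is the number of wet levels at (i, j), 0 on land. Each row goes
    to the process whose share of the cumulative work contains the row's
    midpoint, so owners are non-decreasing (contiguous strips) and each share
    is close to total / nproc. A very heavy row can leave a process empty.
    """
    row_work = [sum(row) for row in nwet]
    total = sum(row_work)
    if total == 0:
        return [0] * len(nwet)  # all land: nothing to balance
    owner, acc = [], 0
    for w in row_work:
        mid = acc + w / 2
        owner.append(min(int(mid * nproc / total), nproc - 1))
        acc += w
    return owner


def write_map(owner, stream):
    """Off-line step: store the map as one owner rank per row."""
    for p in owner:
        stream.write(f"{p}\n")


def read_map(stream):
    """Run-time step: load the map instead of computing a decomposition."""
    return [int(line) for line in stream if line.strip()]


# Usage: a land row followed by three sea rows, split between 2 processes.
nwet = [[0, 0], [10, 10], [5, 5], [5, 5]]
owner = strip_partition(nwet, nproc=2)
stream = io.StringIO()
write_map(owner, stream)
stream.seek(0)
print(read_map(stream))  # [0, 0, 1, 1]
```

Here each process ends up with 20 units of active work even though process 0 owns a row that is entirely land.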
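The difference between the third and fourth aims lies in where the vertical loop is cut short. The following Python sketch counts the grid points visited by three versions of a 3-D loop nest over a single sub-domain. The names `count_visits` and `nwet` (wet levels per column) are hypothetical, and a real NEMO loop body would do arithmetic rather than increment a counter.

```python
def count_visits(nwet, nlevels):
    """Return (full, deepest, active) point counts for one sub-domain.

    nwet[j][i] is the number of wet levels at horizontal point (i, j); 0 is land.
    """
    nj, ni = len(nwet), len(nwet[0])

    # z-last, unrestricted: the outer k loop runs over every model level.
    full = 0
    for k in range(nlevels):
        for j in range(nj):
            for i in range(ni):
                full += 1  # body runs; points beneath the seabed are masked

    # z-last, outer k loop stopped at the deepest wet level in the sub-domain.
    kmax = max(max(row) for row in nwet)
    deepest = 0
    for k in range(kmax):
        for j in range(nj):
            for i in range(ni):
                deepest += 1

    # z-first, inner k loop bounded by each column's own wet-level count.
    active = 0
    for j in range(nj):
        for i in range(ni):
            for k in range(nwet[j][i]):
                active += 1

    return full, deepest, active


print(count_visits([[0, 3, 10], [2, 2, 1]], nlevels=20))  # (120, 60, 18)
```

In this example the deepest-level cut halves the z-last traversal, but the z-first loops visit only the 18 wet points. The truncated z-last loop still sweeps every horizontal point, land included, down to the sub-domain's deepest wet level, so its benefit depends on how deep that level is relative to the full model depth.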
The individual achievements of the project are summarised below:
- Eliminating field values beneath the seabed from halo messages was successful but resulted in negligible performance improvements on the AMM12 test case, which covers the seas around the British Isles.
- Restricting loops to the deepest level in a sub-domain did not eliminate enough layers to yield a significant improvement, due to the vertical co-ordinate scheme used in AMM12.
- Looping over only the active levels at each grid location in the z-first ordering was implemented in some twenty-five source files that together account for more than 90% of the run-time (excluding I/O). This coverage is sufficient to extrapolate the effect of "dry-point" elimination to 100% coverage, at which the load balance is estimated to be 90% or better.
- The z-first ordering was found to give a modest performance improvement on HECToR. This is attributed to slightly better cache reuse, and it occurs even though the important tridiagonal solves in the vertical dimension do not vectorise in the z-first ordering.
- The project identified scope for reducing the communication cost of the 2-dimensional solves, an optimisation that would benefit NEMO in either index ordering.
- All code changes have been incorporated back into a development branch of the main NEMO source repository.
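The load-balance estimate quoted for 100% coverage can be illustrated with one common definition, assumed here rather than taken from the report: mean work per process divided by maximum work per process, so that 1.0 is perfect balance. With active-level loops, the work of a sub-domain is taken to be proportional to its number of wet points. The sub-domain maps and the function names `active_work` and `load_balance` are hypothetical.

```python
def active_work(nwet):
    """Wet (i, j, k) points in one sub-domain; nwet[j][i] = wet levels at (i, j)."""
    return sum(sum(row) for row in nwet)


def load_balance(work):
    """Mean per-process work divided by the maximum (1.0 = perfectly balanced)."""
    return (sum(work) / len(work)) / max(work)


# Four equal-sized sub-domains (2 x 3 columns each) with different coastlines.
subdomains = [
    [[0, 3, 10], [2, 2, 1]],
    [[5, 5, 5], [5, 0, 0]],
    [[4, 4, 4], [4, 0, 0]],
    [[10, 10, 0], [0, 0, 0]],
]
work = [active_work(s) for s in subdomains]
print(work)                # [18, 20, 16, 20]
print(load_balance(work))  # 0.925
```

The slowest processes (20 units) set the pace while the lightest does only 16, so about 7.5% of the available processor time is spent waiting.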
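The remark about tridiagonal solves follows from the structure of the standard Thomas algorithm, used here as a representative solver (an assumption; NEMO's own Fortran routines are not reproduced). In the forward sweep each level depends on the level above, and in back substitution on the level below, so the loop over k carries a dependence. In the z-last layout many independent columns are solved together and the innermost loop runs over columns, which has no dependence; in the z-first layout the dependent k loop is the innermost one. Both layouts are sketched below with hypothetical function names. Plain Python lists are used, so this shows loop structure, not speed.

```python
def thomas_column(a, b, c, d):
    """Solve one tridiagonal system (z-first style: k is the innermost loop).

    a: sub-diagonal (a[0] ignored), b: diagonal, c: super-diagonal
    (c[-1] ignored), d: right-hand side; all of length n.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for k in range(1, n):  # forward sweep: level k needs level k-1
        m = b[k] - a[k] * cp[k - 1]
        cp[k] = c[k] / m
        dp[k] = (d[k] - a[k] * dp[k - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):  # back substitution: level k needs level k+1
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x


def thomas_zlast(a, b, c, d):
    """Solve many columns at once; every array is indexed [k][column].

    The loops over i are independent across columns, so they are the
    natural candidates for vectorisation when k is the outer index.
    """
    n, ncol = len(d), len(d[0])
    cp = [[0.0] * ncol for _ in range(n)]
    dp = [[0.0] * ncol for _ in range(n)]
    for i in range(ncol):
        cp[0][i] = c[0][i] / b[0][i]
        dp[0][i] = d[0][i] / b[0][i]
    for k in range(1, n):
        for i in range(ncol):  # no dependence between columns
            m = b[k][i] - a[k][i] * cp[k - 1][i]
            cp[k][i] = c[k][i] / m
            dp[k][i] = (d[k][i] - a[k][i] * dp[k - 1][i]) / m
    x = [[0.0] * ncol for _ in range(n)]
    x[-1] = dp[-1][:]
    for k in range(n - 2, -1, -1):
        for i in range(ncol):
            x[k][i] = dp[k][i] - cp[k][i] * x[k + 1][i]
    return x


# Usage: the 3-level system [[2,-1,0],[-1,2,-1],[0,-1,2]] x = [1, 0, 1].
print(thomas_column([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1]))  # close to [1, 1, 1]
```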
A report summarising this project is available in PDF and HTML formats.