
Summary of work and conclusions

Two versions of NEMO (2.3 and 3.0) have been compiled and tested on HECToR. Their performance has been investigated and an optimum processor count suggested, based on the researchers' requirements for job turn-around. The performance of both the PathScale and PGI compilers has been investigated, together with how performance varies with the choice of compiler flags. The NEMO code has been found to scale up to 1024 processors, with the best trade-off between runtime and AU usage obtained between 128 and 256 processors. Running NEMO in single-core mode is found to be up to 18.59% faster than dual-core mode; however, this runtime reduction is not sufficient to justify the increased AU usage.

The choice of grid dimensions has also been investigated, and performance is found to be best for square grids. Where a square grid is not possible, choosing the dimensions such that the number of cells in the horizontal direction is less than the number in the vertical direction (i.e. choosing jpni < jpnj within the code) gave the best performance. Removing the land-only squares from the computation gave significant reductions in AU usage, of as much as 25% at larger processor counts. The runtime also decreased, albeit to a lesser extent.

Profiling of the code suggests that NEMO spends a considerable amount of time in initialisation and file I/O, so any reduction that can be made in these areas will be beneficial.
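The decomposition trade-offs above can be illustrated with a small Python sketch. None of these functions exist in NEMO; they are hypothetical helpers written under stated assumptions: AU usage is taken to be proportional to cores charged × wall-clock time, single-core mode is assumed to be charged for both cores of each dual-core node, "18.59% faster" is read as an 18.59% reduction in runtime, and land-only subdomains are detected from a simple ocean mask with halo points ignored. `best_decomposition` picks the most nearly square factorisation with jpni ≤ jpnj, and `count_active_subdomains` counts the subdomains that would still need a process once land-only squares are removed.

```python
import math
from typing import List, Tuple


def au_cost_ratio(runtime_reduction: float, cores_charged_per_process: int = 2) -> float:
    """AU cost of single-core mode relative to dual-core mode.

    Assumption: AUs are proportional to (cores charged) x (wall-clock time),
    and single-core mode is charged for both cores of each node.
    """
    return cores_charged_per_process * (1.0 - runtime_reduction)


def best_decomposition(nproc: int) -> Tuple[int, int]:
    """Return (jpni, jpnj) with jpni * jpnj == nproc, as square as possible.

    Only jpni <= jpnj is considered, mirroring the finding that jpni < jpnj
    performed best when a square grid was not possible.
    """
    best = (1, nproc)
    for jpni in range(1, math.isqrt(nproc) + 1):
        if nproc % jpni == 0:
            best = (jpni, nproc // jpni)  # later divisors are closer to sqrt(nproc)
    return best


def count_active_subdomains(ocean_mask: List[List[bool]], jpni: int, jpnj: int) -> int:
    """Count subdomains containing at least one ocean point.

    ocean_mask[j][i] is True for ocean and False for land. The grid is split
    into jpni blocks along i and jpnj blocks along j; halos are ignored.
    """
    ny, nx = len(ocean_mask), len(ocean_mask[0])
    active = 0
    for bj in range(jpnj):
        j0, j1 = bj * ny // jpnj, (bj + 1) * ny // jpnj
        for bi in range(jpni):
            i0, i1 = bi * nx // jpni, (bi + 1) * nx // jpni
            if any(ocean_mask[j][i] for j in range(j0, j1) for i in range(i0, i1)):
                active += 1  # this subdomain still needs a process
    return active


if __name__ == "__main__":
    print(best_decomposition(128))          # (8, 16)
    print(round(au_cost_ratio(0.1859), 2))  # 1.63
    # Toy 4 x 4 mask whose first two columns are land: 2 of the 4 subdomains remain.
    mask = [[False, False, True, True] for _ in range(4)]
    print(count_active_subdomains(mask, 2, 2))  # 2
```

Under these assumptions, single-core mode costs about 1.63 times the AUs of dual-core mode even at the best observed speed-up, which is consistent with the conclusion above.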

NetCDF 4.0, HDF5 1.8.1, zlib 1.2.3 and szip have been installed and tested as part of this project. Beta releases were used initially, until the final release versions became available in June 2008. NetCDF 4.0 is found to give a considerable reduction in both the amount of I/O produced and the time taken in I/O when using the NOCSCOMBINE tool. In addition, the version of netCDF 4.0 installed as part of this project is found to be between 8% and 20% faster than the version installed centrally (via modules) on the system.

NEMO has been converted to use netCDF 4.0 for its main output files, reducing output file size by a factor of up to 3.55 relative to the original netCDF 3.X code. For the test model no significant runtime improvement is observed. A real research run is expected to benefit more, because its output frequency differs from that of the test model. The restart files have not been converted to use netCDF 4.0.
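A rough illustration of where such a size reduction can come from, assuming that it is due to the zlib (deflate) compression that netCDF-4 can apply to variables through HDF5. The sketch uses only Python's standard `zlib` module on a made-up float32 field in which land points hold a constant fill value; it is not the NEMO I/O code, and the ratio it prints says nothing about the 3.55 figure above. Real netCDF-4 files compress each chunk separately and may also apply a shuffle filter, which this sketch ignores.

```python
import array
import math
import zlib


def deflate_ratio(values, level: int = 6) -> float:
    """Raw float32 size divided by zlib-compressed size."""
    raw = array.array("f", values).tobytes()
    return len(raw) / len(zlib.compress(raw, level))


def synthetic_field(nx: int = 180, ny: int = 90, fill: float = 1.0e20):
    """Made-up 2-D field flattened row by row; the first third of each row is land."""
    values = []
    for j in range(ny):
        for i in range(nx):
            if i < nx // 3:
                values.append(fill)  # land point: constant fill value
            else:
                values.append(10.0 + 5.0 * math.sin(0.1 * i + 0.05 * j))  # smooth "ocean" values
    return values


if __name__ == "__main__":
    print(f"compression ratio: {deflate_ratio(synthetic_field()):.2f}")
```

Long runs of identical fill values over land compress almost to nothing, which is one reason ocean model output tends to benefit from compressed netCDF-4 output.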

The BASIC nested model has been compiled and tested, and problems with the time-step interval have been identified and rectified. The performance of the BASIC nested model has been investigated, with the optimal processor count (in terms of AU usage per time step) found to be 32. The more complex MERGED nested model has not yet run successfully on HECToR: the code compiles, but crashes as the velocity becomes extremely large and NaN values appear. Various compiler and debugger problems made identifying the cause of this crash very difficult. These issues have been reported to Cray (HECToR queries Q29941 and Q22386, both described within the report) and are currently awaiting resolution.
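The criterion used for the BASIC nested model, AU usage per time step, can be expressed as cores × wall-clock seconds per step, minimised over the measured core counts. A minimal sketch, assuming AUs are proportional to core-seconds; the function names are hypothetical and the timings would have to come from benchmark runs.

```python
from typing import Dict


def core_seconds_per_step(cores: int, seconds_per_step: float) -> float:
    """Cost proxy: cores held multiplied by wall-clock seconds per time step."""
    return cores * seconds_per_step


def optimal_core_count(timings: Dict[int, float]) -> int:
    """Core count with the lowest cost per time step.

    timings maps core count -> measured wall-clock seconds per time step.
    Ties go to the first entry in the dictionary's insertion order.
    """
    return min(timings, key=lambda n: core_seconds_per_step(n, timings[n]))
```

With perfect scaling the cost per step would be constant, so the minimum marks the point beyond which parallel efficiency starts to fall; a shorter turn-around may still justify running on more cores.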

