Two different versions of NEMO (2.3 and 3.0) have been compiled and tested on
HECToR. The performance of these versions has been investigated and an optimum
processor count suggested based on the researchers' requirements for job
turn-around. The performance of both the PathScale and PGI compilers has
been investigated, along with its sensitivity to the choice of compiler
flags. The NEMO code has been found to scale up to 1024 processors, with the
best performance in terms of runtime versus AU usage being obtained between
128 and 256 processors. Running NEMO in single core mode is found to be up to
18.59% faster than dual core mode; however, the reduction in runtime is not
sufficient to warrant the increased AU usage. The choice of grid dimensions
has been investigated, and performance is found to be optimal for square grids.
Where square grids are not possible, choosing the dimensions such that the
number of cells in the horizontal direction is less than the number in the
vertical direction (i.e. choosing jpni < jpnj within the code)
gave the best performance. Removal of the land only squares from the
computations gave significant reductions in AU usage, by as much as 25%
at larger processor counts. The runtime was also found to decrease, albeit
to a lesser extent. Profiling of the code suggests that NEMO
spends a considerable amount of time in initialisation and file I/O and thus
any reduction that can be made in this area will be beneficial.
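
The grid-dimension and land-removal findings above can be illustrated with a short sketch. It is not taken from NEMO: the function names, the even splitting of the grid, and the boolean land/sea mask are all assumptions made for illustration, and halo overlaps are ignored. The first part picks the most nearly square factorisation of a processor count with jpni <= jpnj; the second counts how many subdomains contain at least one ocean point, i.e. how many would still need a processor once land-only squares are dropped.

```python
def factor_pairs(nproc):
    """Return all (jpni, jpnj) with jpni * jpnj == nproc and jpni <= jpnj."""
    pairs = []
    i = 1
    while i * i <= nproc:
        if nproc % i == 0:
            pairs.append((i, nproc // i))
        i += 1
    return pairs


def choose_decomposition(nproc):
    """Pick the most nearly square pair, keeping jpni <= jpnj."""
    return min(factor_pairs(nproc), key=lambda p: p[1] - p[0])


def split_bounds(n, parts):
    """Boundaries that split n points into `parts` near-equal chunks."""
    return [k * n // parts for k in range(parts + 1)]


def count_ocean_subdomains(mask, jpni, jpnj):
    """Count subdomains holding at least one ocean point.

    mask is a list of rows (hypothetical layout); True marks an ocean point.
    jpni splits each row (i direction), jpnj splits the rows (j direction).
    """
    nj, ni = len(mask), len(mask[0])
    ib = split_bounds(ni, jpni)
    jb = split_bounds(nj, jpnj)
    count = 0
    for b in range(jpnj):
        rows = mask[jb[b]:jb[b + 1]]
        for a in range(jpni):
            if any(any(row[ib[a]:ib[a + 1]]) for row in rows):
                count += 1
    return count
```

For example, `choose_decomposition(128)` returns `(8, 16)`, the closest-to-square option that satisfies jpni < jpnj, while `choose_decomposition(256)` returns the square `(16, 16)`.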

NetCDF 4.0, HDF5 1.8.1, zlib 1.2.3 and szip have been installed and tested
as part of this project. Initially, beta releases were used until the final
release versions became available in June 2008. NetCDF 4.0 is found to give
a considerable reduction in both the amount of I/O produced and the time
taken in I/O when using the NOCSCOMBINE tool. In addition, the version of
netCDF 4.0 installed as part of this project is found to be between 8% and
20% faster than that installed centrally (via modules) on the system.

NEMO has been converted to use netCDF 4.0 for its main output files, resulting
in output files up to 3.55 times smaller than those produced by the original
netCDF 3.X code. For the test model no significant runtime improvement is
observed. A real research-type run is expected to benefit more because of
the different output frequency involved. The restart files have not
been converted to use netCDF 4.0.
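
As a rough illustration of why compressed output can be so much smaller, the sketch below applies zlib deflate, the lossless compression that netCDF 4.0 can use through HDF5, to a synthetic field. Everything in it is hypothetical: the grid size, the land mask, the fill value and the ocean values are invented for the example and are not NEMO data, so the ratio it produces says nothing about the 3.55 figure above. It only shows the mechanism: long runs of identical fill values over land compress very well.

```python
import math
import struct
import zlib


def packed_field(nx, ny, is_land, fill=0.0):
    """Pack a synthetic 2D field as little-endian float32 bytes.

    Land points take a constant fill value; ocean points take a smooth
    made-up function of position (an assumption, not model output).
    """
    values = []
    for j in range(ny):
        for i in range(nx):
            if is_land(i, j):
                values.append(fill)
            else:
                values.append(math.sin(0.1 * i) * math.cos(0.1 * j))
    return struct.pack(f"<{len(values)}f", *values)


def compression_ratio(raw, level=4):
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, level))


# Hypothetical 64 x 64 grid in which the left half is land.
raw = packed_field(64, 64, lambda i, j: i < 32)
ratio = compression_ratio(raw)
```

Deflate is lossless, so `zlib.decompress(zlib.compress(raw)) == raw` holds; the saving comes purely from redundancy in the stored bytes.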

The BASIC nested model has been compiled and tested, and problems with the
time step interval were identified and rectified. The performance of the BASIC
nested model has been investigated, with the optimal processor count (in
terms of AU usage per time step) found to be 32. The more complex MERGED
nested model has not yet run successfully on HECToR. The code compiles but
crashes because the velocity becomes extremely large and NaN values
occur. Various compiler and debugger problems were experienced, which made
identifying the cause of this crash very difficult. These issues have
been reported to Cray (HECToR queries Q29941 and Q22386, both described
within the report) and are currently awaiting resolution.