nocscombine -f O25-TST_CU30_19580101_19580101_grid_T_0000.nc -d votemper -o outputfile.nc
Each run was carried out in batch, and the timings reported in table 9 are the best of three runs. The runs were performed consecutively, ensuring that the same processing core was used for each. Despite this, considerable variation in runtimes was observed, as much as 100% in some cases. Because I/O is a shared resource on the system and we have no control over other users' activities, these variations are perhaps not surprising.
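The best-of-three methodology can be sketched as follows. This is an illustrative sketch only, not the script used for table 9: the helper names (time_run, best_of, variation_percent) are hypothetical, the command is a stand-in for the nocscombine invocation, and it does not pin the process to a particular core. It also assumes that "variation" means the spread between the slowest and fastest run relative to the fastest, so a 100% variation would mean the slowest run took twice as long.

```python
import subprocess
import sys
import time


def time_run(cmd):
    """Return the wall-clock time in seconds for one run of cmd."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def best_of(cmd, repeats=3):
    """Run cmd `repeats` times back to back; return (best, worst) timings."""
    timings = [time_run(cmd) for _ in range(repeats)]
    return min(timings), max(timings)


def variation_percent(best, worst):
    """Spread between slowest and fastest run, as a percentage of the fastest."""
    return 100.0 * (worst - best) / best


if __name__ == "__main__":
    # Hypothetical stand-in command; on HECToR this would be the nocscombine line.
    cmd = [sys.executable, "-c", "pass"]
    best, worst = best_of(cmd)
    print(f"best {best:.3f} s, worst {worst:.3f} s, "
          f"variation {variation_percent(best, worst):.0f}%")
```

Reporting the minimum rather than the mean is a deliberate choice on a shared I/O system: interference from other users can only make a run slower, so the fastest run is the closest estimate of the uncontended cost.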
In table 9, 3.6.2 denotes the release version of netCDF 3.6.2 available via the package account on HECToR, i.e. the version accessed via the module load netcdf command. 4.0-unopt denotes the Snapshot release dated 29th April compiled with the default optimisation (i.e. -O1). 4.0-opt is the same Snapshot release compiled with -O2. 4.0-beta2 denotes the final beta2 version compiled with -O2. 4.0-release denotes the final release version compiled with -O2. 4.0-Cray denotes the version supplied by Cray, which became available on HECToR during March 2009. 4.0-release-classic is the netCDF 4.0 release run in classic (i.e. netCDF 3.6.2 style) mode. An asterisk (*) denotes versions compiled against the system version of zlib (1.2.1) rather than version 1.2.3.
Examining the results in table 9, we see that netCDF 4.0 clearly outperforms netCDF 3.6.2, both in runtime and in the disk space required. The file written by netCDF 4.0 is 731/221 = 3.31 times smaller than that written by netCDF 3.6.2. The runtime ratio between the versions (cf. 3.6.2 with 4.0-release) is 343.563/85.188 = 4.03. Because this speedup exceeds the reduction in file size, the runtime savings cannot be explained by the smaller file alone. There may be algorithmic differences between the versions, or the dataset may now fit better into cache, reducing memory latency. The compression and chunking used by netCDF 4.0 may also be improving performance. Interestingly, the Cray version of netCDF 4.0 is slower (92.203 seconds versus 85.188 or 78.188 seconds for the two zlib versions) than any of the versions compiled as part of the dCSE project. The difference of 8.23% or 17.92%, depending on which zlib version was used, is significant enough to warrant compiling a local version if your code spends a substantial amount of time in netCDF routines.
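The ratios quoted above can be reproduced directly from the table 9 figures given in the text. The short check below only re-derives those numbers (file sizes in whatever units table 9 uses, runtimes in seconds); the helper name percent_slower is hypothetical.

```python
def percent_slower(t_slow, t_fast):
    """How much slower t_slow is than t_fast, as a percentage of t_fast."""
    return 100.0 * (t_slow - t_fast) / t_fast


size_ratio = 731 / 221        # netCDF 3.6.2 file size / netCDF 4.0 file size
speedup = 343.563 / 85.188    # 3.6.2 runtime / 4.0-release runtime (s)

# Cray build (92.203 s) against the two locally built zlib variants.
cray_vs_85 = percent_slower(92.203, 85.188)
cray_vs_78 = percent_slower(92.203, 78.188)

print(f"file size ratio: {size_ratio:.2f}")   # 3.31
print(f"runtime speedup: {speedup:.2f}")      # 4.03
print(f"Cray slower by {cray_vs_85:.2f}% or {cray_vs_78:.2f}%")  # 8.23% or 17.92%
print("speedup exceeds size reduction:", speedup > size_ratio)   # True
```

Note that each percentage is taken relative to the faster (local) build, which is why the two figures are 8.23% and 17.92% rather than being computed against the Cray time.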
The level of optimisation used to compile the netCDF library appears to have minimal effect. The system version of zlib (1.2.1) outperforms version 1.2.3. However, as netCDF 4.0 explicitly states that zlib 1.2.3 or later is required, using the older version is potentially risky, since functionality required by netCDF 4.0 may be missing.
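A build script might guard against the older zlib with a simple version comparison. The sketch below is a hypothetical illustration (parse_version and meets_minimum are not part of netCDF or its build system). It assumes dotted version strings, and note that Python's zlib.ZLIB_RUNTIME_VERSION reports the zlib that Python itself is linked against, which is not necessarily the one a netCDF build picks up.

```python
import re
import zlib

MIN_ZLIB = (1, 2, 3)  # minimum zlib version stated by netCDF 4.0


def parse_version(text):
    """Turn a version string such as '1.2.1' into a tuple of ints.

    Any non-numeric suffix in a field (e.g. '1-motley') is ignored.
    """
    fields = []
    for field in text.split("."):
        m = re.match(r"\d+", field)
        fields.append(int(m.group()) if m else 0)
    return tuple(fields)


def meets_minimum(version_text, minimum=MIN_ZLIB):
    """True if version_text is at least the given minimum version."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    print(meets_minimum("1.2.1"))  # False: older than netCDF 4.0 requires
    print(meets_minimum("1.2.3"))  # True
    runtime = zlib.ZLIB_RUNTIME_VERSION  # zlib linked into Python, not netCDF
    print(runtime, meets_minimum(runtime))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.2.13" would sort before "1.2.3".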
To compare the performance of netCDF 3.6.2 and 4.0 directly, we also tested netCDF 4.0 in classic mode. To write classic-format output with netCDF 4.0, the following changes must be made to the make_global_file4.F90 code:
Comparing the results, we see that netCDF 4.0-release in classic mode is approximately 5.8% faster than netCDF 3.6.2. Since classic mode writes the same uncompressed netCDF 3 format, this suggests that some algorithmic improvements have been made between the versions.
In summary, the results obtained with the NOCSCOMBINE code suggest that using netCDF 4.0 instead of netCDF 3.6.2 will likely give significant performance improvements for NEMO. The disk space used could be reduced by a factor of about 3, and the time taken to write the data to disk by a factor of about 4. The time taken to compress and decompress the data at the post-processing stages still needs to be quantified, but the early results are promising. Section 9.4 discusses the implementation of netCDF 4.0 in NEMO.