Next: Compiler optimisations Up: NEMO performance Previous: Scaling plot   Contents


Removing the land only grid cells

So far we have considered decompositions in which all the grid cells are used, i.e. those where the code has jpnij = jpni x jpnj. However, many decompositions give rise to grid cells which contain only land. These land only cells are essentially redundant in an ocean model and can be removed. In the code this means that the value of jpnij can be reduced such that jpnij <= jpni x jpnj. Removing the land only cells is expected to improve the performance of the code, as branches into land only regions will no longer take place and any I/O associated with the land cells will also be eliminated. Furthermore, the number of AU's required will be reduced, as fewer processors are needed once the land cells are removed.
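As an illustrative sketch only (this is not the NEMO source, and the mask, grid sizes and function name are invented for the example), the reduced jpnij can be thought of as the number of subdomains in the jpni x jpnj decomposition that contain at least one ocean point:

```python
# Illustrative sketch (not NEMO code): count the subdomains of a
# jpni x jpnj decomposition that contain at least one ocean cell.
# The mask and grid sizes below are invented for the example.

def count_active_subdomains(ocean_mask, jpni, jpnj):
    """Return jpnij, the number of subdomains with at least one ocean cell.

    ocean_mask: 2D list of bools, True where the global grid cell is ocean.
    """
    ni = len(ocean_mask)        # global grid size, first dimension
    nj = len(ocean_mask[0])     # global grid size, second dimension
    active = 0
    for pi in range(jpni):
        for pj in range(jpnj):
            # index range owned by subdomain (pi, pj)
            i0, i1 = pi * ni // jpni, (pi + 1) * ni // jpni
            j0, j1 = pj * nj // jpnj, (pj + 1) * nj // jpnj
            if any(ocean_mask[i][j]
                   for i in range(i0, i1) for j in range(j0, j1)):
                active += 1
    return active

# Toy 4x4 grid: left half ocean, right half land.
mask = [[j < 2 for j in range(4)] for _ in range(4)]
print(count_active_subdomains(mask, 2, 2))  # 2 of the 4 subdomains touch ocean
```

Subdomains for which the count finds no ocean point are the "dead" cells that can be dropped, reducing jpnij below jpni x jpnj.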

The NEMO code does not remove the land cells automatically, so the user must take the chosen decomposition and separately determine how many cells contain only land. A tool written by Andrew Coward can be used to determine the number of active (ocean containing) and dead (land only) cells.

Table 3 gives the number of land only cells for a variety of grid dimension configurations. The reduction in the number of processors required is generally around 10%. For very large processor counts (greater than 256) the reduction can be considerably larger, as much as 25%.


Table 3: Number of land only squares for a variety of processor grids. The percentage saved gives the percentage of cells saved by removing the land only cells and will correspond to the reduction in the number of AU's required for the computation.
jpni jpnj Total cells Land only cells Percentage saved
6 6 36 0 0.00%
7 7 49 1 2.04%
8 8 64 2 3.13%
9 9 81 6 7.41%
10 10 100 10 10.00%
11 11 121 13 10.74%
12 12 144 14 9.72%
13 13 169 21 12.43%
14 14 196 22 11.22%
15 15 225 29 12.89%
16 16 256 35 13.67%
20 20 400 65 16.25%
30 30 900 193 21.44%
32 32 1024 230 22.46%
40 40 1600 398 24.88%
16 8 128 11 8.59%
32 16 512 92 17.97%
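The "percentage saved" column of table 3 is simply the number of land only cells divided by the total number of cells. A few rows can be reproduced directly (the row values are taken from the table above):

```python
# Reproduce the "percentage saved" column of table 3:
# saved = land_only / (jpni * jpnj) * 100.
rows = [(10, 10, 10), (16, 16, 35), (40, 40, 398)]  # (jpni, jpnj, land only)
for jpni, jpnj, land in rows:
    total = jpni * jpnj
    print(f"{jpni} x {jpnj}: {100.0 * land / total:.2f}% saved")
```

Since AU charging scales with the number of processors, this percentage is also the fraction of AU's saved for the run.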


We now investigate whether removing the land only cells has any impact on the runtime of the NEMO code. We hope that, by avoiding branches into land only regions and the associated I/O, the runtime will be reduced. For this test we have considered only the 128, 256, 512 and 1024 processor grids. The results are given in table 4.


Table 4: Runtime comparison for 60 time steps for models with/without land squares included on 128, 256, 512 and 1024 processor grids.
jpni jpnj jpnij Time for 60 steps (seconds)
32 32 1024 110.795
32 32 794 100.011
16 32 512 117.642
16 32 420 111.282
16 16 256 146.607
16 16 221 136.180
8 16 128 236.182
8 16 117 240.951


From table 4 we can see that for 256 processors and above removing the land only squares reduces the total runtime by up to about 10 seconds, which corresponds to a reduction of around 5-10%. For the 128 processor run, removal of the land only cells actually gives a small increase in the total runtime. This difference is within normal repeatability errors and could be a result of heavy load on the system when the test was run. As the runtime does not improve greatly with the removal of the land only cells, the main motivation for removing them is to reduce the number of AU's used for each calculation. Assuming the runtime is not affected detrimentally, the reduction in AU usage will be as given in table 3.
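The percentage changes quoted above follow directly from the timings in table 4 (positive values mean the run with land cells removed was faster):

```python
# Percentage runtime change when land only cells are removed,
# computed from the 60-step timings in table 4.
# Keys are the full jpni x jpnj cell counts; values are
# (time with all cells, time with land cells removed) in seconds.
runs = {1024: (110.795, 100.011),
         512: (117.642, 111.282),
         256: (146.607, 136.180),
         128: (236.182, 240.951)}
for cells, (full, reduced) in runs.items():
    change = 100.0 * (full - reduced) / full
    print(f"{cells:5d} cells: {change:+.1f}%")
```

The 128 cell case comes out slightly negative, consistent with the small slowdown noted above being within run-to-run variability.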

The times given in table 4 are the times that the NEMO code reports when it writes the information from time step 60 to disk. This, however, is not the whole story. At the end of the run, NEMO also dumps out the restart files required to restart the computation from the final time step. These restart files are significantly larger than the files output at each individual time step and thus take a considerable amount of time to write to disk. Unfortunately the code does not output any timings which include the writing of these restart files. One way to estimate the time taken to write the restart files is to look at the actual time taken by the parallel run as reported by the batch system. The PBS output file gives the walltime in hh:mm:ss. By subtracting the time taken for 60 steps from the walltime we obtain an estimate of the time taken over and above the step by step output, i.e. an estimate of the time taken to read in the input data and output the final restart files. To get accurate timings, timers should be inserted into the code, but as a first pass this method lets us find out whether there is any variation with processor count. The amount of time that NEMO spends in I/O and initialisation will be discussed in Section 6.9.
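The walltime-subtraction estimate described above can be sketched as follows. The hh:mm:ss string is a made-up example, not a real PBS record, and the helper function name is our own:

```python
# Sketch of the walltime-subtraction estimate: the PBS walltime minus the
# reported 60-step time approximates initialisation plus restart-file I/O.
# The walltime string below is a hypothetical example, not measured data.

def hms_to_seconds(hms):
    """Convert a PBS-style hh:mm:ss walltime string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

walltime = hms_to_seconds("00:04:05")   # hypothetical PBS walltime
step_time = 136.180                     # 60-step time from table 4 (16x16, jpnij=221)
overhead = walltime - step_time         # initialisation + restart-file output
print(f"estimated I/O + initialisation time: {overhead:.1f} s")
```

Because the PBS walltime is only reported to the nearest second and includes job launch overheads, this gives a rough estimate only, as the text notes.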

