

Optimal processor count

The results presented in Section 6 suggest that all future work on NEMO should be carried out using code compiled with the PGI compiler suite as it gives the lowest runtimes.

The NOCS researchers ideally want to run an entire model year, i.e. 365 model days, in a single 12 hour run on HECToR. This lets them make optimal use of the machine and its queues, and also lets them keep up with the post-processing and data transfer of the results as the run progresses. They can currently achieve 300 model days in a 12 hour run using 221 processors. In this section we investigate whether a processor count can be found that allows a model year to be completed within a 12 hour time slot. To do this, NEMO is executed over a range of processor counts and the number of model days which can be computed in 12 hours, $ndays$, is obtained from:

\begin{displaymath}
ndays = \frac{43200}{t_{60}} \qquad (1)
\end{displaymath}

where $43200$ is the number of seconds in 12 hours and $t_{60}$ is the time in seconds taken to complete a 60 step (i.e. 1 model day) run of NEMO. To fit 365 model days into 12 hours we therefore need $t_{60} \leq \frac{43200}{365} \approx 118.36$ seconds. The processor counts investigated range from 159 to 430. All runs were performed with the land cells removed. The results are summarised in Table 7. Figure 6 shows the same results in graphical form, with the 365 day threshold marked by the dashed line.
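The arithmetic behind equation (1) and the 118.36 second threshold can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of NEMO or of the benchmarking scripts, and the sample timing is the 221 processor entry from Table 7.

```python
SECONDS_IN_12_HOURS = 12 * 60 * 60  # 43200


def model_days_in_12_hours(t60: float) -> float:
    """Equation (1): model days per 12 hour run, given the
    time in seconds for one 60 step (1 model day) run."""
    if t60 <= 0:
        raise ValueError("t60 must be positive")
    return SECONDS_IN_12_HOURS / t60


def max_t60_for(target_days: int = 365) -> float:
    """Largest t60 (seconds) that still fits target_days into 12 hours."""
    if target_days <= 0:
        raise ValueError("target_days must be positive")
    return SECONDS_IN_12_HOURS / target_days


if __name__ == "__main__":
    # Threshold for a full model year
    print(f"t60 threshold: {max_t60_for(365):.2f} s")
    # 221 processors, t60 = 145.078 s (Table 7)
    print(f"221 procs: {model_days_in_12_hours(145.078):.1f} model days")
```

With the 221 processor timing this gives just under 298 model days, in line with the roughly 300 model days quoted above.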


Table 7: Runtime for 60 time steps for various processor configurations ranging from 159 to 430 processors.
jpni  jpnj  No. of procs  Time for 60 steps (seconds)
13    14    159           177.583
14    14    174           163.633
14    15    187           172.191
15    15    196           157.858
15    16    209           153.450
16    16    221           145.078
16    17    232           137.507
17    17    244           127.705
17    18    260           135.688
18    18    274           127.103
18    19    286           122.639
19    19    304           125.880
19    20    321           118.081
20    20    335           117.830
20    21    349           107.464
21    21    364           113.491
21    22    379(380)      114.175
22    22    398(396)      107.051
22    23    413           123.939
23    23    430(429)      110.871


Figure 6: Investigation of optimal processor count for NEMO subject to completing a model year within a 12 hour compute run. The dashed line shows the cut-off point.
[Image: optimalproc_count]
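As a cross-check on Table 7 and Figure 6, the following sketch filters the measured timings against the 365 day threshold. The data are copied from Table 7 using the correct processor counts; the variable names are illustrative only.

```python
# (jpni, jpnj, processors, seconds for 60 steps), copied from Table 7
TABLE_7 = [
    (13, 14, 159, 177.583), (14, 14, 174, 163.633),
    (14, 15, 187, 172.191), (15, 15, 196, 157.858),
    (15, 16, 209, 153.450), (16, 16, 221, 145.078),
    (16, 17, 232, 137.507), (17, 17, 244, 127.705),
    (17, 18, 260, 135.688), (18, 18, 274, 127.103),
    (18, 19, 286, 122.639), (19, 19, 304, 125.880),
    (19, 20, 321, 118.081), (20, 20, 335, 117.830),
    (20, 21, 349, 107.464), (21, 21, 364, 113.491),
    (21, 22, 379, 114.175), (22, 22, 398, 107.051),
    (22, 23, 413, 123.939), (23, 23, 430, 110.871),
]

THRESHOLD = 43200 / 365  # maximum t60 for 365 model days in 12 hours

# Processor counts whose t60 meets the threshold
meeting = [(procs, t60) for _, _, procs, t60 in TABLE_7 if t60 <= THRESHOLD]

# Smallest processor count that completes a model year in 12 hours
smallest_procs, smallest_t60 = min(meeting)

# Configuration with the lowest runtime overall
fastest = min(TABLE_7, key=lambda row: row[3])

if __name__ == "__main__":
    print("meets threshold:", [procs for procs, _ in meeting])
    print("smallest count meeting threshold:", smallest_procs)
    print("fastest configuration (jpni, jpnj, procs, t60):", fastest)
```

On these data the smallest processor count that meets the threshold is 321, and the lowest runtime is obtained on 398 processors. The runtime does not fall monotonically with processor count: the 413 processor run lies above the threshold again.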

In performing this investigation some problems were discovered relating to the computation of land only cells performed by the nocspmap_r25 code. Several processor configurations yielded incorrect numbers of land cells. These are highlighted in Table 7, where the incorrectly computed value is given in parentheses after the correct value in the ``No. of procs'' column. If the wrong number of land cells is specified, the code fails with an error of the form:

 ===>>> : E R R O R
         ===========

  Eliminate land processors algorithm
 
   jpni =           21  jpnj =           22

   jpnij =          380 < jpni x jpnj
                                                                                       
 
   ***********, mpp_init2 finds jpnij=          379
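The error above arises because the requested processor count jpnij does not match the number of subdomains that the model itself finds to contain ocean points. The sketch below illustrates that kind of consistency check on a toy land/sea mask. It is a simplified assumption and not the algorithm used by mpp_init2 or nocspmap_r25: it ignores halos and the real NEMO partitioning, and all function names are hypothetical.

```python
def count_ocean_subdomains(mask, jpni, jpnj):
    """Split a 2D mask (rows of booleans, True = ocean) into
    jpni x jpnj blocks and count the blocks with any ocean point."""
    nj, ni = len(mask), len(mask[0])

    def bounds(n, parts):
        # Near-equal contiguous index ranges covering 0..n
        return [(k * n // parts, (k + 1) * n // parts) for k in range(parts)]

    count = 0
    for j0, j1 in bounds(nj, jpnj):
        for i0, i1 in bounds(ni, jpni):
            if any(mask[j][i] for j in range(j0, j1) for i in range(i0, i1)):
                count += 1
    return count


def check_jpnij(requested, mask, jpni, jpnj):
    """Raise if the requested processor count differs from the
    number of ocean subdomains found in the mask."""
    found = count_ocean_subdomains(mask, jpni, jpnj)
    if found != requested:
        raise ValueError(
            f"jpnij = {requested} requested, but {found} ocean subdomains found"
        )
    return found


if __name__ == "__main__":
    # Toy 4 x 4 mask: the top-left 2 x 2 block is land only
    toy_mask = [
        [False, False, True, True],
        [False, False, True, True],
        [True, True, True, True],
        [True, True, True, True],
    ]
    print(check_jpnij(3, toy_mask, jpni=2, jpnj=2))
```

Running such a check before submitting a job would catch a mismatch like the 380 versus 379 case shown above without spending any queue time.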

