
SS3F

Figure 4 shows parallel scaling for a planned simulation using SS3F. The original code is included for reference (open square), although it was necessary to reduce the number of cores used per node from 32 to 6 in order to run this case. Node count rather than core count is therefore used for the $x$-axis, since it reflects the resources actually occupied by the simulation in a way that core count does not.
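As a rough illustration of why node count is the fairer axis, the sketch below uses a hypothetical helper, `nodes_occupied`, to count the nodes a job occupies. It assumes nodes are allocated whole, so that a partly filled node still counts as occupied. At the minimum of 192 nodes, the two packings in Figure 4 correspond to very different core counts.

```python
import math

def nodes_occupied(cores: int, cores_per_node: int) -> int:
    """Hypothetical helper: nodes reserved for a job, assuming whole-node
    allocation (a partly filled node still counts as occupied)."""
    return math.ceil(cores / cores_per_node)

# Packings from the text: all 32 cores per node vs the reduced 6 cores per node
for cores_per_node in (32, 6):
    cores = 192 * cores_per_node          # cores needed to fill 192 nodes
    nodes = nodes_occupied(cores, cores_per_node)
    print(f"{cores_per_node:2d} cores/node: {cores:5d} cores on {nodes} nodes")
```

Both runs occupy the same 192 nodes, even though one uses 6144 cores and the other 1152; plotted against core count, the original code would appear to use far fewer resources than it actually ties up.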

Figure 4:
\includegraphics[width=4.0in]{ss3f_scaling2.eps}

The improvement in efficiency, i.e. the performance at the minimum node count (192) relative to the original code, appears at first glance to be due entirely to the use of all 32 cores per node: it is a factor of approximately 5, not far off $32/6 \approx 5.3$. If this were true, the contribution expected from replacing the original FFT routines, which was confirmed for smaller test cases, would be absent. However, the 32 AMD Interlagos cores on each node share many resources, notably L3 cache and interconnect bandwidth. With only 6 cores per node, each core running the original code has a larger share of these resources, so the factor of 32/6 overstates the gain attributable to core count alone and this may not be a fair comparison.
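The arithmetic behind this argument can be made explicit. At fixed node count, the whole-node gain is the product of the core-count ratio (32/6) and a per-core throughput ratio, and that per-core ratio itself combines any FFT speed-up with the penalty from sharing L3 cache and bandwidth among more cores. The sketch below backs out the per-core ratio; the function name `per_core_ratio` is hypothetical, and the input of 5.0 is only the approximate factor quoted above.

```python
def per_core_ratio(node_gain: float, cores_new: int, cores_old: int) -> float:
    """Per-core throughput of the new code (cores_new per node) relative to
    the original (cores_old per node), given the gain at fixed node count."""
    return node_gain / (cores_new / cores_old)

core_ratio = 32 / 6                      # gain if per-core throughput were unchanged
ratio = per_core_ratio(5.0, 32, 6)       # 5.0 is the approximate gain quoted in the text
print(f"core-count ratio {core_ratio:.2f}, implied per-core ratio {ratio:.2f}")
# prints: core-count ratio 5.33, implied per-core ratio 0.94
```

A per-core ratio near 1 does not by itself show that the new FFT routines contribute nothing: an FFT speed-up above 1 offset by a contention penalty below 1 would produce the same figure.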

Figure 5:
\includegraphics[width=3.0in]{ss3f_eff.eps}

Scaling to over 12000 cores is efficient, but in this case the same good efficiency does not extend to more than 18000 cores.
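For reference, one common definition of strong-scaling efficiency relative to a baseline run is $E(N) = N_0\,T(N_0) / (N\,T(N))$, where $T(N)$ is the wall-clock time on $N$ nodes and $N_0$ is the baseline, presumably the 192-node minimum here. Whether Figure 5 uses exactly this normalisation is an assumption. A minimal sketch, with a hypothetical function name and made-up timings rather than measured data:

```python
def parallel_efficiency(n0: float, t0: float, n: float, t: float) -> float:
    """Strong-scaling efficiency at n nodes relative to a baseline of n0 nodes.

    t0 and t are wall-clock times; 1.0 means perfect scaling."""
    return (n0 * t0) / (n * t)

# Illustrative (made-up) timings: doubling nodes from 192 to 384
print(parallel_efficiency(192, 100.0, 384, 50.0))   # runtime halves: 1.0
print(parallel_efficiency(192, 100.0, 384, 62.5))   # runtime falls by only 37.5%: 0.8
```

Since the new code always uses 32 cores per node, the factor of 32 cancels and the efficiency is the same whether $N$ is measured in nodes or cores.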


R.Johnstone 2012-07-31