
Detailed Timings for Figures

This appendix contains the measured runtimes of the various benchmarks used to generate the figures throughout the report.


Table 5: Runtimes of 125^3 FFT on HECToR Phase 2a (Figure 2)
Cores        32    64   128   256   512  1024  2048  4096
MPI Only   37.7  21.9  11.0   8.1   7.8   8.5  12.4     -
2 threads  40.0  18.6  11.5   6.7   4.9   4.6   4.8     -
4 threads  44.7  20.3  10.7   6.0   4.4   3.1   2.8   3.3
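
The timings in these tables can be turned into speedup figures with a few lines of code. The following is a minimal sketch using the Table 5 data; the helper names best_point and speedup are hypothetical and not part of the report, and the choice of the smallest core count as the speedup baseline is an assumption made for illustration.

```python
# Table 5: runtimes of the 125^3 FFT on HECToR Phase 2a.
# None marks a core count for which no runtime is listed.
cores = [32, 64, 128, 256, 512, 1024, 2048, 4096]
runtimes = {
    "MPI Only":  [37.7, 21.9, 11.0, 8.1, 7.8, 8.5, 12.4, None],
    "2 threads": [40.0, 18.6, 11.5, 6.7, 4.9, 4.6, 4.8, None],
    "4 threads": [44.7, 20.3, 10.7, 6.0, 4.4, 3.1, 2.8, 3.3],
}


def best_point(times):
    """Return (core count, runtime) of the fastest listed run (hypothetical helper)."""
    measured = [(t, c) for c, t in zip(cores, times) if t is not None]
    t, c = min(measured)
    return c, t


def speedup(times):
    """Speedup relative to the first (smallest) core count; an assumed baseline."""
    base = times[0]
    return [None if t is None else base / t for t in times]


if __name__ == "__main__":
    for label, times in runtimes.items():
        c, t = best_point(times)
        print(f"{label}: fastest run at {c} cores, runtime {t}")
```

The same two helpers apply unchanged to the other tables once their rows are entered in the same form.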


Table 6: Runtimes of 125^3 FFT on Rosa (Figure 3)
Cores         36    72   144   288   576  1152  2304  4608
MPI Only    40.4  29.4  14.7  10.8  12.5  19.0  26.1     -
2 threads   39.7  22.9  17.8   8.1   6.9   7.2  11.0     -
6 threads   47.3  26.2  13.3   8.3   7.0   4.0   3.5   3.8
12 threads  57.6  32.3  19.5   8.7   5.3   5.1   3.4   3.7


Table 7: Runtimes of RS2PW on Rosa (Figure 4)
Cores         36    72   144   288   576  1152  2304  4608
MPI         21.2  13.1   9.9   7.7   9.7  14.7     -     -
New MPI     20.3  13.3  11.3   8.2   9.0   6.7   9.6     -
2 threads   14.2  11.1   8.5   6.2   7.1   6.3   5.8   5.7
6 threads   12.1  10.2   6.7   5.4   5.2   5.3   5.5   3.9
12 threads  19.5  12.6   9.5   6.6   7.2   5.5   5.1   4.0


Table 8: Runtimes of bench_64 on HECToR Phase 2a (Figure 8)
Cores             16    32    64   128   256   512  1024  2048
MPI (Original)   293   166    99    78    60    82    90     -
MPI Only         326   185   110    77    63    65    66     -
2 threads        376   202   120    75    58    48    60     -
4 threads        424   251   144    86    58    58    51    80


Table 9: Runtimes of bench_64 on Rosa (Figure 9)
Cores             36    72   144   288   576  1152  2304  4608
MPI (Original)   151   129    84   124   116   217     -     -
MPI Only         172   133    85   102    92   128     -     -
2 threads        198   122    96    72    74    91     -     -
6 threads        350   207   125    91    66    63    66     -
12 threads       527   279   172   112    84    67    63    71


Table 10: Runtimes of bench_64 on HECToR Phase 2b (Figure 10)
Cores             24    48    72   144   288   576  1152  2304  4608  9216
MPI (Original)   190   147   156   127   172   198   330     -     -     -
MPI Only         211   169   173   133   155   161   232     -     -     -
2 threads        259   162   142   131    93   116   111   153     -     -
6 threads        335   219   203   146   109    95    82    89     -     -
12 threads         -   335   281   194   137   109    89   100     -     -
24 threads         -     -     -   366   263   191   155   141   129   140


Table 11: Runtimes of W216 on Rosa (Figure 11)
Cores               72    144    288    576   1152   2304   4608   9216
MPI (Original)    5728   3041   2214   1662   3897      -      -      -
MPI Only          5694   2964   2137   1623   2335      -      -      -
2 threads         5810   3419   1881   1383   1086   1047      -      -
6 threads         8230   4439   2842   1907   1323    914    816    854
12 threads       12113   6477   3515   2356   1487    950    701    665


Iain Bethune
2010-09-14