Results

The table below compares runs of testcase TEST8 on different numbers of cores. Bak XT4 and Bak XE6 correspond to ``vanilla'' DL_POLY_3 compiled and run on the XT4 and XE6 respectively, while Opt link is DL_POLY with an optimised implementation of the link_cell_pairs algorithm, run on the XE6. Except on 16, 256 and 512 cores, the vanilla code is noticeably faster on the XE6 than on the XT4. Opt link is also faster than the original code on the XE6 at every core count, with a 23% improvement in performance on 512 cores.


Table: Timing comparison of different runs with Bak on XT4, XE6 and optimised link_cell_pairs on XE6

Nb. Procs   Bak XT4   Bak XE6   Opt link
       16   199.154   218.722    211.924
       32   106.790    98.113     95.955
       64    63.129    57.494     51.436
      128    42.036    39.360     34.150
      256    27.471    29.492     23.760
      512    22.137    25.951     19.961


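The table below shows the percentage change in run time of the optimised code relative to the vanilla code on the XT4 and on the XE6; negative values mean that Opt link is faster. On the XE6 the improvement due to Opt link generally increases with the number of cores, reaching 23.08% at 512 processes, as noted above. Relative to the vanilla code on the XT4, Opt link is 6.41% slower on 16 cores but faster at every other core count. On average, DL_POLY with the optimised link_cell_pairs routine is 10.73% faster than the vanilla code on the XT4, and 11.93% faster than the vanilla code on the XE6.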


Table: Variation rate (%) of different runs of optimised link_cell_pairs on XE6 with Bak on XT4 and XE6

            Opt link comp. to
Nb. Procs   Bak XT4   Bak XE6
       16      6.41     -3.11
       32    -10.15     -2.20
       64    -18.52    -10.54
      128    -18.76    -13.24
      256    -13.51    -19.44
      512     -9.83    -23.08
  Average    -10.73    -11.93
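
The variation rates above appear to follow from the timings as (t_opt - t_ref) / t_ref x 100, where t_ref is the vanilla timing on the machine being compared against. The formula is not stated in the text, so it is an assumption here, although it does reproduce the tabulated values. A minimal Python sketch of that calculation follows; the names `timings` and `variation_rate` are illustrative only and are not part of DL_POLY.

```python
# Timings copied from the timing table: cores -> (Bak XT4, Bak XE6, Opt link).
timings = {
    16: (199.154, 218.722, 211.924),
    32: (106.790, 98.113, 95.955),
    64: (63.129, 57.494, 51.436),
    128: (42.036, 39.360, 34.150),
    256: (27.471, 29.492, 23.760),
    512: (22.137, 25.951, 19.961),
}


def variation_rate(t_opt, t_ref):
    """Percentage change in run time; negative means the optimised run is faster.

    Assumed formula: (t_opt - t_ref) / t_ref * 100.
    """
    return (t_opt - t_ref) / t_ref * 100.0


# Variation of Opt link relative to the vanilla code on each machine.
rates = {
    cores: (variation_rate(opt, xt4), variation_rate(opt, xe6))
    for cores, (xt4, xe6, opt) in timings.items()
}

# Unweighted mean over the six core counts, as in the "Average" row.
avg_xt4 = sum(r[0] for r in rates.values()) / len(rates)
avg_xe6 = sum(r[1] for r in rates.values()) / len(rates)

if __name__ == "__main__":
    for cores, (v_xt4, v_xe6) in rates.items():
        print(f"{cores:>9} {v_xt4:9.2f} {v_xe6:9.2f}")
    print(f"{'Average':>9} {avg_xt4:9.2f} {avg_xe6:9.2f}")
```

The averages here are plain means of the per-core-count rates, so they weight the 16-core and 512-core runs equally even though the absolute run times differ by roughly a factor of ten.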


Valène Pellissier 2011-08-24