Results

The table below compares runs of test case TEST8 on different numbers of cores. Bak XT4 and Bak XE6 correspond to "vanilla" DL_POLY_3 compiled and run on the XT4 and XE6 respectively, while Opt link is DL_POLY with an optimised implementation of the link_cell_pairs algorithm, run on the XE6. The vanilla code is noticeably faster on the XE6 than on the XT4 at 32, 64 and 128 cores, but slower at 16, 256 and 512 cores. Opt link is faster than the vanilla code on the XE6 at every core count, with a 23% improvement in performance on 512 cores.

Table: Timing comparison of different runs with Bak on XT4, XE6 and optimised link_cell_pairs on XE6

Nb. Procs	Bak XT4	Bak XE6	Opt link
16	199.154	218.722	211.924
32	106.790	98.113	95.955
64	63.129	57.494	51.436
128	42.036	39.360	34.150
256	27.471	29.492	23.760
512	22.137	25.951	19.961

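The table below shows the percentage change in timing of the optimised code relative to the vanilla code on the XT4 and XE6; negative values mean the optimised code is faster. Relative to the XE6, the improvement due to Opt link broadly increases with the number of cores, reaching 23.08% at 512 processes, as noted above. Relative to the XT4, the optimised code is 6.41% slower at 16 cores and faster at all other core counts. On average, DL_POLY with the optimised link_cell_pairs routine is 10.73% faster than the vanilla code on the XT4, and 11.93% faster than the vanilla code on the XE6.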

Table: Percentage change in timing of optimised link_cell_pairs on XE6 relative to Bak on XT4 and XE6 (negative values indicate a speed-up)

Nb. Procs	Opt link comp. to Bak XT4 (%)	Opt link comp. to Bak XE6 (%)
16	6.41	-3.11
32	-10.15	-2.2
64	-18.52	-10.54
128	-18.76	-13.24
256	-13.51	-19.44
512	-9.83	-23.08
Average	-10.73	-11.93
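
The percentages above can be recomputed from the timing table. The following is a minimal sketch, assuming each entry is the relative change (Opt link - baseline) / baseline expressed in percent, and that the average is the plain mean over the six core counts. The names `timings` and `percent_change` are illustrative only and do not come from DL_POLY. Under these assumptions the script should reproduce the tabulated values to two decimal places.

```python
# Timings copied from the timing table: (Bak XT4, Bak XE6, Opt link) per core count.
timings = {
    16: (199.154, 218.722, 211.924),
    32: (106.790, 98.113, 95.955),
    64: (63.129, 57.494, 51.436),
    128: (42.036, 39.360, 34.150),
    256: (27.471, 29.492, 23.760),
    512: (22.137, 25.951, 19.961),
}


def percent_change(optimised, baseline):
    """Relative change of the optimised timing with respect to the baseline, in percent.

    Negative values mean the optimised run took less time than the baseline.
    """
    return 100.0 * (optimised - baseline) / baseline


# Per core count: (change relative to Bak XT4, change relative to Bak XE6).
rows = {
    procs: (percent_change(opt, xt4), percent_change(opt, xe6))
    for procs, (xt4, xe6, opt) in timings.items()
}

# Assumed averaging: arithmetic mean of the per-core-count percentages.
avg_xt4 = sum(r[0] for r in rows.values()) / len(rows)
avg_xe6 = sum(r[1] for r in rows.values()) / len(rows)

for procs, (vs_xt4, vs_xe6) in rows.items():
    print(f"{procs}\t{vs_xt4:.2f}\t{vs_xe6:.2f}")
print(f"Average\t{avg_xt4:.2f}\t{avg_xe6:.2f}")
```

Averaging the per-core-count percentages weights every core count equally; a ratio of summed timings would instead be dominated by the slower low-core-count runs.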

Valène Pellissier 2011-08-24