One-sided data transfers (MPI-1S, SHMEM)

This is a variation of MPI-2S that tries to avoid the synchronisation delays of the previous algorithm by using the one-sided data transfers provided by the MPI or SHMEM libraries. In this case, the task that reaches the orbital computation sector can access the set of BC it needs directly from the memory of the associated tasks, without the need for a matching call on their side. This algorithm has two drawbacks: i) the amount of data to transfer between two tasks is 64 times larger than in the case of the orbital transfer; ii) the data set to be transferred occupies non-contiguous memory addresses, because it is a $4\times 4\times 4\times 4$ block of the spatial grid.
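To make the mechanism concrete, the sketch below (in C, with assumed grid extents, variable names, and window layout; it is not the code used for the benchmarks) fetches one $4\times 4\times 4\times 4$ block of BC from a neighbouring task's exposed memory window with MPI_Get inside a passive-target epoch, using a subarray datatype to cope with the non-contiguous addresses of the block.

```c
/* Minimal sketch (not the authors' code) of a one-sided BC fetch in the
 * spirit of MPI-1S.  Grid extent N, variable names and the block origin
 * are assumptions; the point is reading a non-contiguous 4x4x4x4 block
 * from a remote task without a matching call on the target side. */
#include <mpi.h>
#include <stdlib.h>

#define N 16   /* assumed extent of the spatial grid per dimension */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each task exposes its local BC grid in an RMA window. */
    double *bc_grid = calloc((size_t)N * N * N * N, sizeof(double));
    MPI_Win win;
    MPI_Win_create(bc_grid, (MPI_Aint)N * N * N * N * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Derived datatype describing one 4x4x4x4 sub-block: the block is
     * non-contiguous in memory, so a subarray type avoids issuing many
     * separate small gets. */
    int sizes[4]    = {N, N, N, N};
    int subsizes[4] = {4, 4, 4, 4};
    int starts[4]   = {0, 0, 0, 0};   /* assumed block origin */
    MPI_Datatype block_t;
    MPI_Type_create_subarray(4, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &block_t);
    MPI_Type_commit(&block_t);

    /* Passive-target epoch: read the block from the neighbouring task's
     * window; no matching call is required on that task. */
    double recv_block[4 * 4 * 4 * 4];
    int target = (rank + 1) % nprocs;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(recv_block, 4 * 4 * 4 * 4, MPI_DOUBLE,
            target, 0, 1, block_t, win);
    MPI_Win_unlock(target, win);   /* completes the transfer */

    MPI_Type_free(&block_t);
    MPI_Win_free(&win);
    free(bc_grid);
    MPI_Finalize();
    return 0;
}
```

The SHMEM variant follows the same pattern, with the window replaced by symmetric memory and the get issued through the strided shmem get routines; the non-contiguous layout is what makes this transfer more expensive than the contiguous orbital transfer.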


Table: Execution times in seconds for the BC sharing algorithms described in Sec. [*]. The columns are organised as follows: 2DC shows data for runs of 2 tasks on one dual-core processor, 4DC for runs of 4 tasks on 2 dual-core processors, and 4QC for 4 tasks on one quad-core processor. The OPO columns show the time spent for one particle orbital computation, while DMC is the time for the whole diffusion Monte Carlo computation.
Algorithm    2DC           4DC           4QC
             OPO    DMC    OPO    DMC    OPO    DMC
SHM          130    921    -      -      139    882
MPI-2S       371    1184   806    1458   546    1249
MPI-1S       562    1380   1669   2565   1430   2210
SHMEM        210    975    771    1759   536    1271