One-sided data transfers (MPI-1S, SHMEM)

This is a variation of MPI-2S that tries to avoid the synchronisation delays of the previous algorithm by using the one-sided data transfers provided by the MPI or SHMEM libraries. In this case, the task that reaches the orbital computation sector can access the set of BC it needs directly from the memory of the associated tasks, without the need for a matching call on their side. This algorithm has two drawbacks: i) the amount of data to transfer between two tasks is 64 times larger than in the case of the orbital transfer; ii) the data set to be transferred occupies non-contiguous memory addresses, because it is a $4\times 4\times 4\times 4$ block of the spatial grid.
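To make the mechanism concrete, the sketch below (in C, with assumed grid extents, variable names, and window layout; it is not the code used for the benchmarks) fetches one $4\times 4\times 4\times 4$ block of BC from a neighbouring task's exposed memory window with MPI_Get inside a passive-target epoch, using a subarray datatype to cope with the non-contiguous addresses of the block.

```c
/* Minimal sketch (not the authors' code) of a one-sided BC fetch in the
 * spirit of MPI-1S.  Grid extent N, variable names and the block origin
 * are assumptions; the point is reading a non-contiguous 4x4x4x4 block
 * from a remote task without a matching call on the target side. */
#include <mpi.h>
#include <stdlib.h>

#define N 16   /* assumed extent of the spatial grid per dimension */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each task exposes its local BC grid in an RMA window. */
    double *bc_grid = calloc((size_t)N * N * N * N, sizeof(double));
    MPI_Win win;
    MPI_Win_create(bc_grid, (MPI_Aint)N * N * N * N * sizeof(double),
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Derived datatype describing one 4x4x4x4 sub-block: the block is
     * non-contiguous in memory, so a subarray type avoids issuing many
     * separate small gets. */
    int sizes[4]    = {N, N, N, N};
    int subsizes[4] = {4, 4, 4, 4};
    int starts[4]   = {0, 0, 0, 0};   /* assumed block origin */
    MPI_Datatype block_t;
    MPI_Type_create_subarray(4, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &block_t);
    MPI_Type_commit(&block_t);

    /* Passive-target epoch: read the block from the neighbouring task's
     * window; no matching call is required on that task. */
    double recv_block[4 * 4 * 4 * 4];
    int target = (rank + 1) % nprocs;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(recv_block, 4 * 4 * 4 * 4, MPI_DOUBLE,
            target, 0, 1, block_t, win);
    MPI_Win_unlock(target, win);   /* completes the transfer */

    MPI_Type_free(&block_t);
    MPI_Win_free(&win);
    free(bc_grid);
    MPI_Finalize();
    return 0;
}
```

The SHMEM variant follows the same pattern, with the window replaced by symmetric memory and the get issued through the strided shmem get routines; the non-contiguous layout is what makes this transfer more expensive than the contiguous orbital transfer.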


Table: Execution times in seconds for the BC sharing algorithms described in Sec. [*]. The columns are organised as follows: 2DC shows data for runs of 2 tasks on one dual-core processor, 4DC for runs of 4 tasks on 2 dual-core processors, and 4QC for 4 tasks on one quad-core processor. The OPO columns show the time spent for one particle orbital computation, while DMC is the time for the whole diffusion Monte Carlo computation.
Algorithm    2DC           4DC           4QC
             OPO    DMC    OPO    DMC    OPO    DMC
SHM          130    921    -      -      139    882
MPI-2S       371    1184   806    1458   546    1249
MPI-1S       562    1380   1669   2565   1430   2210
SHMEM        210    975    771    1759   536    1271