The table presents the execution times of the orbital-computing subroutine for the algorithms discussed. The data were collected inside the DMC section, using the internal timer of the code, during a short run (12 time steps) that nevertheless yields more than 1000 timing points. The total time taken by the DMC computation is also presented. The electronic system has 1024 electrons and the BC size is approximately 2.4 GB.
As expected, the results show that the SHM algorithm is by far the most efficient, since it avoids unnecessary data transfers between tasks. MPI-2S appears to be an acceptable alternative when the amount of data exceeds the available shared memory (which may happen for some disordered models). The weak performance of the MPI one-sided algorithm deserves further comment, provided here by David Tanqueray: ''Although at first sight the one-sided MPI features would seem to be asynchronous, there is no requirement in the MPI specifications that they should be implemented as such, and indeed in the MPICH implementation they are all saved up and performed together at the next collective or sync call, where they are handled internally as part of the underlying 2-sided MPI communications design. The SHMEM calls on the other hand are usually performed asynchronously being built directly on top of the XT Portals library which does provide some degree of asynchronous support, and in fact the results show that SHMEM algorithm performance is slightly better than the MPI-2S algorithm''.
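The deferred behaviour described in the quotation can be illustrated with a small toy model. This is only a sketch in plain Python, not MPI code and not the implementation used in the application: the class `DeferredWindow` and its methods `get` and `fence` are hypothetical names chosen for illustration. It assumes an implementation in which a one-sided request is merely queued when issued and all queued transfers are carried out at the next synchronisation call.

```python
"""Toy model (not MPI): one-sided 'get' requests that complete only at a fence."""


class DeferredWindow:
    """Hypothetical stand-in for a remotely accessible memory window.

    get() only records the request; the data are copied when fence() is
    called, mimicking an implementation that saves up one-sided operations
    until the next synchronisation call.
    """

    def __init__(self, remote_data):
        self.remote_data = list(remote_data)
        self.pending = []  # queued (destination buffer, dest index, remote index)

    def get(self, dest, dest_index, remote_index):
        # Nothing is transferred here; the request is merely queued.
        self.pending.append((dest, dest_index, remote_index))

    def fence(self):
        # All queued transfers are performed together at the sync point.
        completed = len(self.pending)
        for dest, dest_index, remote_index in self.pending:
            dest[dest_index] = self.remote_data[remote_index]
        self.pending.clear()
        return completed


def demo():
    window = DeferredWindow([10.0, 20.0, 30.0])
    local = [None, None]
    window.get(local, 0, 2)
    window.get(local, 1, 0)
    before = list(local)        # snapshot taken before the sync point
    completed = window.fence()  # the transfers actually happen here
    return before, local, completed
```

In this model the local buffer is still empty after both `get` calls and is filled only inside `fence`, so no communication overlaps with work done between the request and the synchronisation point. Under that assumption, a one-sided algorithm gains nothing over explicit two-sided messaging.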