Table presents the execution times of the orbital
computing subroutine for the discussed algorithms. The data were
collected inside the DMC section, using the internal timer of the code,
during a short run (12 time steps) that nevertheless yields more than
1000 timing points. The total time taken by the DMC computation is also
presented. The electronic system has 1024 electrons and the BC size
is approximately 2.4 GB.
As expected, the results show that the SHM algorithm is by far the most efficient, since it avoids unnecessary data transfers between tasks. MPI-2S appears to be an acceptable alternative when the amount of data exceeds the available shared memory (this may happen for some disordered models). The weak performance of the MPI one-sided algorithm deserves further comment, provided here by David Tanqueray: "Although at first sight the one-sided MPI features would seem to be asynchronous, there is no requirement in the MPI specifications that they should be implemented as such, and indeed in the MPICH implementation they are all saved up and performed together at the next collective or sync call, where they are handled internally as part of the underlying 2-sided MPI communications design. The SHMEM calls, on the other hand, are usually performed asynchronously, being built directly on top of the XT Portals library, which does provide some degree of asynchronous support, and in fact the results show that SHMEM algorithm performance is slightly better than the MPI-2S algorithm."
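
The deferred completion described in the quotation can be illustrated with a toy model. The sketch below is not MPI code and does not reproduce the algorithms measured here. `ToyWindow`, `get`, `fence`, and `run_demo` are hypothetical names introduced only for illustration, under the assumption that one-sided requests are queued and completed together at the next synchronisation call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWindow:
    """Toy stand-in for a remotely accessible memory window (not a real MPI object)."""

    remote: list                                  # data owned by the "target" task
    pending: list = field(default_factory=list)   # queued (buffer, dst, src) requests
    transfers_done: int = 0                       # number of completed transfers

    def get(self, local, dst, src):
        """Request remote[src] -> local[dst]; in this model the request is only queued."""
        self.pending.append((local, dst, src))

    def fence(self):
        """Synchronisation point: every queued request is completed here, together."""
        for local, dst, src in self.pending:
            local[dst] = self.remote[src]
            self.transfers_done += 1
        self.pending.clear()


def run_demo():
    win = ToyWindow(remote=[10.0, 20.0, 30.0])
    local = [None, None, None]
    for i in range(3):
        win.get(local, i, i)      # looks asynchronous, but nothing has moved yet
    before = list(local)          # snapshot taken before the synchronisation call
    win.fence()                   # all three transfers happen at this point
    return before, local, win.transfers_done
```

In this model the local buffer is still empty after the three `get` calls and is filled only inside `fence`, so no communication overlaps with the computation between the requests and the synchronisation point. An implementation that progresses requests asynchronously would instead be able to move data before the synchronisation call.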