Sharing large data sets

In this section we present the main features of the three solutions proposed for sharing the orbital data, which are illustrated schematically in Fig [*].

A CASINO computation proceeds by moving one walker at a time; the transition probability at each step depends on the values of the Jastrow factor and of the one-particle orbitals (OPO) at the current positions of all random walkers. The orbitals can be represented in various basis sets, including plane waves, Gaussians, or B-splines. In this report we are concerned with the representation in B-splines [5], which are localised third-order polynomials sitting on a three-dimensional grid in real space that spans the whole physical system. Like plane waves, they are systematically improvable and unbiased, but because they are localised, the cost of evaluating an orbital is lower by a factor of order $ 1/N_e$: at each point in space only 64 B-splines have non-zero values. The evaluation of each orbital therefore requires only 64 B-splines, far fewer than the total number of plane waves for a system with a large number of electrons (the number of plane waves scales with $ N_e$).
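
The count of 64 follows from the tensor-product structure: in each direction only four cubic B-splines overlap a given point, so $ 4^3 = 64$ coefficients contribute. The sketch below illustrates this under stated assumptions: it uses the standard normalised uniform cubic B-spline, a periodic cubic grid, and nested Python lists for storage. The function names are hypothetical, and neither the normalisation nor the data layout is claimed to match the CASINO routines.

```python
import math


def cubic_bspline_weights(t):
    """Values of the four uniform cubic B-splines that overlap a point
    at fractional offset t (0 <= t < 1) inside a grid cell."""
    return (
        (1.0 - t) ** 3 / 6.0,
        (3.0 * t**3 - 6.0 * t**2 + 4.0) / 6.0,
        (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0,
        t**3 / 6.0,
    )


def eval_orbital(coeff, n, x, y, z):
    """Evaluate one orbital at (x, y, z), given in grid units, on a
    periodic n x n x n grid with coefficients coeff[ix][iy][iz].
    Returns (value, number of coefficients used)."""
    ix, iy, iz = math.floor(x), math.floor(y), math.floor(z)
    wx = cubic_bspline_weights(x - ix)
    wy = cubic_bspline_weights(y - iy)
    wz = cubic_bspline_weights(z - iz)
    value = 0.0
    used = 0
    # Only the 4 x 4 x 4 block of grid points around the cell contributes.
    for a in range(4):
        for b in range(4):
            for c in range(4):
                value += (wx[a] * wy[b] * wz[c]
                          * coeff[(ix + a - 1) % n][(iy + b - 1) % n][(iz + c - 1) % n])
                used += 1
    return value, used
```

With all coefficients set to one, the result is one at any point, because the four weights in each direction sum to one. The cost per orbital is fixed at 64 terms regardless of the grid size.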

In the program the B-spline coefficients (BC) are stored in a rank-five array $ a(1:N_b, 0:N_{gx}-1,0:N_{gy}-1,0:N_{gz}-1,N_s)$, where $ N_b$ is the number of orbitals, $ N_{gx}$, $ N_{gy}$, $ N_{gz}$ are the numbers of grid points in the three spatial directions, and $ N_s$ is the number of spins.

The amount of BC needed in a computation is determined by two factors: i) for each spin value, the number of orbitals must equal the number of electrons with that spin; ii) the grid spacing is determined by the precision of the DFT calculation used to obtain the OPO: the higher the precision, the finer the grid must be.

Together, these requirements lead to a large amount of BC. For example, a non-magnetic system with 1000 electrons, half spin up and half spin down, needs at least 500 one-particle orbitals, since in this case the same set of BC can be used for both spins. The spatial grid can reach or exceed 80 points in each direction. For these numbers, storing the BC in double precision takes approximately 2 GB of memory ($ 500 \times 80^3 \times 8$ bytes), which is close to the maximum memory available per core on the processors used on HECToR.
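
The estimate can be repeated for other system sizes with a small helper. This is a minimal sketch: the function name and its parameters are hypothetical, and it counts only the coefficient array itself, assuming 8 bytes per double-precision value.

```python
def bc_memory_bytes(n_orbitals, n_grid, n_spin_sets=1, bytes_per_value=8):
    """Bytes needed for the B-spline coefficients on a cubic grid
    with n_grid points in each direction."""
    return n_orbitals * n_grid**3 * n_spin_sets * bytes_per_value


# Non-magnetic example from the text: 500 orbitals, 80^3 grid, one shared set.
size = bc_memory_bytes(500, 80)
print(size)        # 2048000000
print(size / 1e9)  # 2.048
```

Because the grid enters as a cube, the requirement grows quickly: under the same assumptions a grid of 100 points per direction needs 4 GB.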

In the initial algorithm of CASINO each task holds its own copy of the BC needed to compute the orbital values. Since the BC sets are identical on each task and their values do not change during the computation, the obvious solution to the memory problem is to share the data among groups of tasks, especially when the hardware provides shared memory.
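
The principle of keeping one read-only copy per group of tasks can be illustrated with Python's standard `multiprocessing.shared_memory` module. This is only a sketch of the idea and is not one of the solutions described in this report: the function name is hypothetical, and for brevity the second handle is attached in the same process, whereas in a real run it would be opened by another task on the same node.

```python
import struct
from multiprocessing import shared_memory


def share_coefficients(values):
    """Store `values` once in a named shared-memory block, then read
    them back through a second handle attached by name."""
    n = len(values)
    owner = shared_memory.SharedMemory(create=True, size=8 * n)
    try:
        # The owning task writes the coefficients exactly once.
        struct.pack_into(f"{n}d", owner.buf, 0, *values)
        # Any other task on the node can attach using only the block name.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            return list(struct.unpack_from(f"{n}d", reader.buf, 0))
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # release the block once all users are done


print(share_coefficients([1.0, 2.5, -3.0]))  # [1.0, 2.5, -3.0]
```

Since the coefficients never change after initialisation, the readers need no locking; the only coordination required is that the data are fully written before any task reads them.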