In this section we present the main features of the three solutions
proposed for sharing the orbital data, which are schematically
illustrated in Fig .
A CASINO computation proceeds by moving one walker at a time; the
transition probability at each step depends on the values of
the Jastrow factor and the OPO at the current position of all random
walkers. The orbitals can be represented using various basis sets,
including plane waves, Gaussians, or B-splines. In this report we are
concerned with the representation in B-splines [5],
which are localised third-order polynomials sitting on a
three-dimensional grid in real space that spans the whole physical
system. Like plane waves, they are
systematically improvable and unbiased, but they are localised, and as
such a factor more efficient than plane waves: for each point
in space there are always only 64 B-splines (4 in each of the three
spatial directions) that have non-zero
values. Therefore the evaluation of each orbital requires only the
computation of 64 B-splines, which is far fewer than the
total number of plane wave functions for a system with a large number
of electrons (the number of plane wave functions scales with the
).
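The locality argument can be illustrated with a minimal sketch. It assumes uniform cubic B-splines on a periodic grid with the same spacing in every direction; the function names, the nested-list storage and the periodic wrap-around are illustrative assumptions, not CASINO's actual routines. At any point only 4 basis functions per direction are non-zero, so the triple loop visits exactly 4 x 4 x 4 = 64 coefficients regardless of the grid size.

```python
import math


def cubic_bspline_weights(t):
    """Values of the 4 uniform cubic B-splines that are non-zero at offset t in [0, 1)."""
    return (
        (1.0 - t) ** 3 / 6.0,
        (3.0 * t**3 - 6.0 * t**2 + 4.0) / 6.0,
        (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0,
        t**3 / 6.0,
    )


def eval_orbital(coeffs, r, spacing):
    """Evaluate one orbital at point r from coefficients coeffs[ix][iy][iz].

    Assumes a periodic grid with uniform spacing (illustrative only).
    Returns the orbital value and the number of coefficients that were used.
    """
    dims = (len(coeffs), len(coeffs[0]), len(coeffs[0][0]))
    base, weights = [], []
    for x in r:
        u = x / spacing
        i0 = math.floor(u)          # grid cell containing the point
        base.append(i0)
        weights.append(cubic_bspline_weights(u - i0))

    value, n_terms = 0.0, 0
    for a in range(4):
        ix = (base[0] - 1 + a) % dims[0]
        for b in range(4):
            iy = (base[1] - 1 + b) % dims[1]
            for c in range(4):
                iz = (base[2] - 1 + c) % dims[2]
                w = weights[0][a] * weights[1][b] * weights[2][c]
                value += w * coeffs[ix][iy][iz]
                n_terms += 1
    return value, n_terms


# Constant coefficients: the B-spline weights sum to one, so the value is ~1.0.
grid = [[[1.0] * 8 for _ in range(8)] for _ in range(8)]
value, n_terms = eval_orbital(grid, (1.3, 2.7, 5.1), spacing=0.5)
print(n_terms)  # prints 64
```

The cost per orbital evaluation is therefore fixed at 64 multiply-add terms, whereas the whole coefficient array still has to be resident in memory because the point can fall anywhere on the grid.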
In the program the B-spline coefficients (BC) are stored in a rank-five
array whose dimensions are the number of orbitals, the
number of grid points in each of the three spatial directions and the
number of spins, respectively.
The amount of BC needed in a computation is determined by two factors: i) for each spin value, the number of orbitals must be equal to the number of electrons with that spin; ii) the grid spacing is determined by the precision of the DFT calculation used to obtain the OPO: the higher the precision, the finer the grid must be.
Together, these requirements lead to a large amount of BC. For example, consider a system with 1000 electrons, half spin up and half spin down. For a non-magnetic system the same set of BC can be used for both spins, so at least 500 one-particle orbitals are needed. The spatial grid can reach or exceed 80 points in each direction; with these numbers one needs approximately 2 GB of memory (500 x 80 x 80 x 80 values of 8 bytes each) if the BC are stored in double precision, which is close to the maximum memory available per core on the processors used on HECToR.
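The estimate above can be reproduced with a few lines of arithmetic. The sketch assumes one real double-precision value (8 bytes) per coefficient and a single spin set shared by both spins, as in the non-magnetic example; the function name and its parameters are illustrative.

```python
def bc_memory_bytes(n_orbitals, grid_points, n_spin_sets=1, bytes_per_value=8):
    """Memory needed for the B-spline coefficients, in bytes.

    Assumes one real value per coefficient; complex coefficients would
    double the result.
    """
    nx, ny, nz = grid_points
    return n_orbitals * nx * ny * nz * n_spin_sets * bytes_per_value


size = bc_memory_bytes(n_orbitals=500, grid_points=(80, 80, 80))
print(f"{size / 1e9:.2f} GB")  # prints 2.05 GB
```

Because the size grows with the product of the orbital count and the number of grid points, a larger system on a finer grid quickly exceeds the memory available to a single core.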
In the initial algorithm of CASINO each task has a copy of the BC needed to compute the orbital values. Since the BC sets are identical on each task and their values do not change during the computation, the obvious solution to the memory problem is to share the data among groups of tasks, especially when the hardware provides shared memory.
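The general idea of keeping one read-only copy of the coefficients per node can be sketched as follows. This is a generic illustration using Python's standard `multiprocessing.shared_memory` module, not a description of any of the solutions implemented in CASINO; the helper names are hypothetical, and for brevity the second handle is attached within the same process rather than from a separate task.

```python
from array import array
from multiprocessing import shared_memory

ITEM_SIZE = 8  # bytes per double-precision coefficient


def create_shared_bc(values):
    """Copy the coefficients once into a named shared-memory segment."""
    data = array("d", values)
    nbytes = len(data) * ITEM_SIZE
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = data.tobytes()
    return shm, len(data)


def read_shared_bc(name, count):
    """Attach to an existing segment by name and read the coefficients."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # View the raw bytes as doubles without copying the segment itself.
        with shm.buf[: count * ITEM_SIZE].cast("d") as view:
            return list(view)  # copied out only so the demo can compare
    finally:
        shm.close()


def demo():
    values = [0.5 * i for i in range(16)]
    owner, count = create_shared_bc(values)
    try:
        return read_shared_bc(owner.name, count) == values
    finally:
        owner.close()
        owner.unlink()  # the creating task frees the segment


print(demo())  # prints True
```

Since the coefficients never change during the computation, no locking is needed once the segment has been filled; the tasks only have to agree on when the data is ready and on which task releases it.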