Modifications

For this function the modifications consist of unrolling the loops that evaluate the representation of the charge density on a grid. As discussed in the reference above, this involves evaluating a fairly low-order cardinal B-spline, with orders of 6-16 being common (for TEST8 the order is 8). Unfortunately, the innermost loop in the implementation has an upper bound that is at most equal to the order of the spline, while the outer loops are much longer. The optimised version therefore uses a "Select Case" statement to manually unroll this innermost loop for each of the maximum values that its bound can take (up to 8 in this case). This is especially effective because the case construct can be moved into one of the outer loops, leaving long inner loops with no conditionals and stride-1 memory access, which are therefore good candidates for vectorisation.
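The restructuring described above can be illustrated with a minimal sketch. This is not the actual routine: the program name, the arrays b, w, ref and opt, and the sizes n and mmax are all hypothetical, and only bounds of 1 to 4 are unrolled to keep the example short, whereas the text describes cases up to 8. The reference version has a short inner loop over j; the restructured version selects on the bound m once, outside the long loop over i, so that each case contains a single long, conditional-free loop with stride-1 access.

```fortran
program unroll_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000     ! long loop length (hypothetical)
  integer, parameter :: mmax = 5     ! largest short-loop bound tested (hypothetical)
  real(dp), parameter :: tol = 1.0e-9_dp
  real(dp) :: b(n, mmax), w(mmax), ref(n), opt(n)
  integer :: i, j, m

  ! Fill the inputs with arbitrary values.
  do j = 1, mmax
     w(j) = 1.0_dp / real(j, dp)
     do i = 1, n
        b(i, j) = real(i + j, dp)
     end do
  end do

  do m = 1, mmax
     ! Reference form: the short loop over j is innermost.
     ref = 0.0_dp
     do i = 1, n
        do j = 1, m
           ref(i) = ref(i) + w(j) * b(i, j)
        end do
     end do

     ! Restructured form: select on the short bound once, then run
     ! a long stride-1 loop over i with the j loop written out by hand.
     opt = 0.0_dp
     select case (m)
     case (1)
        do i = 1, n
           opt(i) = w(1)*b(i,1)
        end do
     case (2)
        do i = 1, n
           opt(i) = w(1)*b(i,1) + w(2)*b(i,2)
        end do
     case (3)
        do i = 1, n
           opt(i) = w(1)*b(i,1) + w(2)*b(i,2) + w(3)*b(i,3)
        end do
     case (4)
        do i = 1, n
           opt(i) = w(1)*b(i,1) + w(2)*b(i,2) + w(3)*b(i,3) &
                  + w(4)*b(i,4)
        end do
     case default
        ! Fallback for bounds that are not unrolled: keep i innermost.
        do j = 1, m
           do i = 1, n
              opt(i) = opt(i) + w(j) * b(i, j)
           end do
        end do
     end select

     ! Both forms should agree to within rounding.
     if (maxval(abs(opt - ref)) > tol) error stop 'unrolled result differs'
     print *, 'bound', m, 'max difference', maxval(abs(opt - ref))
  end do
end program unroll_sketch
```

Whether the long loops are actually vectorised depends on the compiler and its options, so it is worth checking the compiler's optimisation report rather than assuming it.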







Valène Pellissier 2011-08-24