next up previous contents
Next: Multi GPU Up: Porting to GPU(s) Previous: Porting to GPU(s)   Contents

Direct density matrix solution

Therefore, the solution was to use the cards for what they were good at, namely large matrix multiplication (dgemm)3, with an algorithm for obtaining the density matrix only through multiplication and no diagonalisation.

The basic idea behind the method is as follows. The Hamiltonian matrix and the eigenvalue diagonal matrix are representations of the same operator but in different bases. Exactly the same transformation links the occupancy function and the density matrix. The occupancy is a function of the eigenvalues and for the simple case of insulators it is a Heaviside step function positioned anywhere within the band gap. If a fast converging polynomial approximation is found for the occupancy function and its position (Fermi level) is known a priori, then evaluating the polynomial directly on the Hamiltonian by only using matrix multiplications and scaling will result in the full density matrix. A suitable polynomial may be generated by recursively applying $ f(X) = X^2 (3I-2X)$[8]. Each iteration brings the resultant polynomial closer to the step function. For water and titania approximately 7 iterations are enough to converge $ \rho$ to within machine precision, from the one computed through diagonalisation. There are two matrix multiplications per iteration and a serial diagonalisation is roughly 5-6 times slower than a matrix multiplication in the same conditions. In the end a full density matrix obtained from a single GPU card was about 1.5-2 times faster than one obtained from the fast Xeon E5-2690 2.90Hz machine. A number of polynomials were found or developed and tested and the most efficient and convenient is the sign function approximation: $ X_{n+1} = \frac{1}{2} X_n\left(3I-X_n^2\right)$ with a subsequent flip and shift of the resulting array. It is desirable to pick the chemical potential closer to the centre because the order of the polynomial will be lower and the number of operations will be reduced.



Subsections
next up previous contents
Next: Multi GPU Up: Porting to GPU(s) Previous: Porting to GPU(s)   Contents
DP 2013-08-01