An attempt was made to build an MPI-parallel matrix multiplication routine on top of cublasDgemm by carefully adapting source code from pdgemm (the PBLAS distributed matrix multiply used by ScaLAPACK) and its associated routines, but the result did not perform well. A custom block-column distributed routine eventually gave an acceptable speedup over a single card. This routine was then extended to duplicate the full matrix on all devices, using the ability of OpenMPI 1.7b to work directly with GPU memory addresses.
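The data flow of a block-column scheme can be illustrated with a small serial simulation. This is a hypothetical sketch, not the routine described above: no MPI or GPU is involved, and the names column_blocks, local_gemm, allgather_columns and distributed_matmul are invented for illustration. Each simulated rank owns a contiguous block of columns of B and computes the matching columns of C = A B (the step a single card would do with cublasDgemm). The column blocks are then concatenated so that every rank ends up with the full C, which is the role an MPI allgather plays when a CUDA-aware MPI can send and receive device buffers directly.

```python
# Serial simulation of a block-column distributed matrix product.
# ASSUMPTION: illustrative only; "ranks" are simulated in a loop, and the
# per-rank product stands in for a cublasDgemm call on one GPU.

def column_blocks(n, nranks):
    """Split column indices 0..n-1 into nranks contiguous, near-equal blocks."""
    base, extra = divmod(n, nranks)
    blocks, start = [], 0
    for r in range(nranks):
        width = base + (1 if r < extra else 0)
        blocks.append(range(start, start + width))
        start += width
    return blocks

def local_gemm(a, b, cols):
    """Compute columns `cols` of A @ B; result is a list of columns."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for i in range(len(a))]
            for j in cols]

def allgather_columns(local_results):
    """Concatenate every rank's column block in rank order (like an allgather)."""
    cols = [col for block in local_results for col in block]
    # Transpose the list of columns back to a row-major list of rows.
    return [list(row) for row in zip(*cols)]

def distributed_matmul(a, b, nranks):
    """Return the full C = A @ B that every simulated rank would hold."""
    blocks = column_blocks(len(b[0]), nranks)
    local = [local_gemm(a, b, blk) for blk in blocks]  # one entry per rank
    return allgather_columns(local)

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6, 7], [8, 9, 10]]
    print(distributed_matmul(A, B, nranks=2))  # [[21, 24, 27], [47, 54, 61]]
```

One reason a block-column layout suits cuBLAS is that cuBLAS stores matrices in column-major order, so a block of consecutive columns occupies one contiguous region of memory and can be passed to a single gemm call or a single collective without packing.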
DP 2013-08-01