
Multi GPU

An attempt was made to carefully reuse source code from pdgemm and its associated routines to build an MPI-parallel matrix multiplication routine on top of cublasDgemm, but this did not perform well. A custom block-column-distributed routine eventually showed an acceptable speedup over a single card. It was then extended to duplicate the full matrix on all devices, using the ability of OpenMPI 1.7b to work with GPU memory addresses directly.
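The block-column idea can be illustrated with a minimal sketch in plain Python. It is not the routine described above: the text does not say which operands were distributed, so the sketch assumes A is available on every device while B and C are split into contiguous column blocks. The names `column_blocks`, `local_gemm` and `multi_device_gemm` are hypothetical; `local_gemm` stands in for one cublasDgemm call on one device, and the final concatenation stands in for the MPI gather that leaves the full product on all devices.

```python
def column_blocks(n, ndev):
    """Split n columns into ndev contiguous half-open ranges, as evenly as possible."""
    base, extra = divmod(n, ndev)
    blocks, start = [], 0
    for d in range(ndev):
        width = base + (1 if d < extra else 0)  # first `extra` devices get one more column
        blocks.append((start, start + width))
        start += width
    return blocks


def local_gemm(a, b, lo, hi):
    """Compute columns lo..hi-1 of A*B (assumed per-device work, one dgemm call)."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(lo, hi)]
            for i in range(len(a))]


def multi_device_gemm(a, b, ndev):
    """Simulate ndev devices, each producing one column block of C = A*B."""
    blocks = column_blocks(len(b[0]), ndev)
    partial = [local_gemm(a, b, lo, hi) for lo, hi in blocks]
    # Stand-in for the gather step: join the column blocks row by row so that
    # the full matrix would be present on every device.
    return [sum((p[i] for p in partial), []) for i in range(len(a))]
```

Because each column block of C depends only on A and the matching column block of B, the per-device multiplications need no communication; data moves only in the final gather, which is the step where direct transfers between GPU memory addresses would apply.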



DP 2013-08-01