Preliminary test calculations were done on both HECToR XT5h (with Cray
LibSci 10.5.0) and a local Sun workstation with two AMD Opteron 2214
processors (dual-core, 2.2 GHz), using ACML for LAPACK and a locally
compiled ScaLAPACK. The test system was bulk aluminium with a 32-atom
unit cell, a k-point mesh, and Fermi-Dirac smearing with a temperature
of 0.001 Ha. In all cases we did a non-self-consistent calculation on
4 nodes. The results are shown in the table below.
| Processor Grid | ScaLAPACK Block | Wall Time | ||
| <#3801#> | 1 | 2318.599 | ||
| 2 | ||||
| 4 | ||||
| <#3814#> | 1 | 8794.051 | ||
| 2 | ||||
| 4 |
The results clearly show that k-point parallelisation significantly
improves calculation speed compared with the original implementation,
given the same amount of resources. This improvement is more apparent
on platforms where the ScaLAPACK libraries are not highly optimised.
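
To illustrate the general idea behind k-point parallelisation, the following is a minimal Python sketch, not the implementation tested above. It assumes that the available processes are split into equally sized groups and that k-points are dealt to the groups round-robin, so that each group diagonalises its own k-points on a smaller processor grid. The function name `assign_kpoint_groups`, the round-robin rule, and the count of 8 k-points are all hypothetical choices made for this example.

```python
def assign_kpoint_groups(n_procs, n_groups, n_kpoints):
    """Split process ranks into groups and deal k-points to the groups.

    Hypothetical helper for illustration only: ranks are split into
    contiguous, equally sized groups, and k-point indices are assigned
    to groups round-robin.
    """
    if n_groups < 1 or n_procs % n_groups != 0:
        raise ValueError("n_groups must be a positive divisor of n_procs")

    procs_per_group = n_procs // n_groups

    # Contiguous blocks of ranks form each process group.
    groups = [
        list(range(g * procs_per_group, (g + 1) * procs_per_group))
        for g in range(n_groups)
    ]

    # Group g handles k-points g, g + n_groups, g + 2 * n_groups, ...
    kpoints = [list(range(g, n_kpoints, n_groups)) for g in range(n_groups)]

    return groups, kpoints


if __name__ == "__main__":
    # 4 processes and 8 k-points are assumed values for this example.
    for n_groups in (1, 2, 4):
        groups, kpoints = assign_kpoint_groups(4, n_groups, 8)
        for ranks, kpts in zip(groups, kpoints):
            print(f"groups={n_groups} ranks={ranks} k-points={kpts}")
```

With a single group, every process takes part in every diagonalisation, which corresponds to the situation without k-point parallelisation. With more groups, each diagonalisation is spread over fewer processes, so less of the work depends on how well the distributed eigensolver scales.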
Lianheng Tong 2011-03-02