With the new distributed inversion and diagonalisation subroutines, the
performance and scaling of Castep improved noticeably. As expected,
the improvement was more significant at larger core counts. Figure 5.2
shows the gain in Castep performance due to the distribution of the
matrix inversion and diagonalisation in this work package.
Figure 5.2:
Graph showing the performance and scaling improvement
achieved by the distributed inversion and diagonalisation work in Work
Package 2, compared to the straight band-parallel work from Work
Package 1. Each calculation uses 8-way band-parallelism and
runs the standard al3x3 benchmark.
Figure 5.3:
Comparison of Castep scaling for Work Packages 1 and 2 and
the original Castep 4.2, for 10 SCF cycles of the al3x3
benchmark. Parallel efficiencies were measured relative to the 16-core
calculation with Castep 4.2.
[ num_proc_in_smp : 1]
![\includegraphics[width=0.9\textwidth]{overall_smp1.eps}](img74.png)
[ num_proc_in_smp : 2]
The distributed diagonalisation, on top of the basic band-parallelism,
enables Castep calculations to scale effectively to between two and
four times as many cores as Castep 4.2 (see figure 5.3). The standard
al3x3 benchmark can now be run on 1024 cores with almost 50%
efficiency, which equates to over three cores per atom, and larger
calculations are expected to scale better. A large demonstration
calculation is in progress and should illustrate the improved Castep
performance more clearly.
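
The efficiencies quoted above are relative to a 16-core baseline rather
than to a single core. The following is a minimal sketch of how such a
relative parallel efficiency could be computed. The function name and
the wall-clock times are hypothetical and chosen only for illustration;
they are not measurements from this work.

```python
def parallel_efficiency(base_cores, base_time, cores, time):
    """Parallel efficiency of a run relative to a baseline run.

    1.0 means ideal scaling from the baseline core count; values are
    wall-clock times in any consistent unit.
    """
    speedup = base_time / time          # how much faster than the baseline
    ideal_speedup = cores / base_cores  # speedup expected from perfect scaling
    return speedup / ideal_speedup


# Hypothetical wall-clock times in seconds (illustrative, NOT measured).
baseline_cores, baseline_time = 16, 640.0
runs = [(64, 180.0), (256, 55.0), (1024, 20.0)]

for cores, t in runs:
    eff = parallel_efficiency(baseline_cores, baseline_time, cores, t)
    print(f"{cores:5d} cores: efficiency {eff:.2f}")
```

With a 16-core baseline, ideal scaling to 1024 cores is a 64-fold
speedup, so an efficiency of about 50% corresponds to a run roughly 32
times faster than the baseline.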
Sarfraz A Nadeem
2008-09-01