Maths Libraries (BLAS)
Much of the time spent in Castep is in the double-precision complex
matrix-matrix multiplication subroutine ZGEMM. The orthogonalisation
and subspace rotation operations both use ZGEMM to apply unitary
transformations to the wavefunctions, and it is also used extensively
when computing and applying the so-called nonlocal
projectors. Although the unitary transformations dominate the
asymptotic cost of large calculations, the requirement that benchmarks
run in a reasonable amount of time means that they are rarely in this
rotation-dominated regime. The orthogonalisation and diagonalisation
subroutines also include an appreciable amount of extra work, including
a memory copy and updating of metadata, which can distort the timings
for small systems. For these reasons we chose to concentrate on the
timings for the nonlocal projector overlaps as a measure of ZGEMM
performance, in particular the subroutine
ion_beta_add_multi_recip_all, which is almost exclusively a ZGEMM
operation.
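
To illustrate why a projector overlap maps so directly onto ZGEMM, the
following is a minimal Python sketch, not Castep's own code. It assumes
the projectors and wavefunctions are stored as dense complex arrays with
one column per projector or band; the function names and the array
layout are hypothetical. The reference routine computes
C := alpha * A^H * B + beta * C, which is the operation ZGEMM performs
when called with TRANSA='C', and the helpers convert a measured time
into a sustained rate using the standard count of 8 real floating-point
operations per complex multiply-add.

```python
def zgemm_conj_ref(alpha, a, b, beta, c):
    """Naive reference for C := alpha * A^H * B + beta * C.

    a is k x m (e.g. k plane waves, m projectors), b is k x n
    (e.g. k plane waves, n bands), c is m x n. All are lists of
    lists of complex numbers. This layout is an assumption made
    for illustration only.
    """
    k, m, n = len(a), len(a[0]), len(b[0])
    # Start from the scaled input matrix beta * C.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0j
            for p in range(k):
                # Conjugate-transpose of A: element (i, p) is conj(a[p][i]).
                acc += a[p][i].conjugate() * b[p][j]
            out[i][j] += alpha * acc
    return out


def zgemm_flops(m, n, k):
    """Real flops for one m x n x k complex GEMM.

    Each complex multiply-add costs 8 real flops
    (6 for the multiply, 2 for the add).
    """
    return 8 * m * n * k


def gflops(m, n, k, calls, seconds):
    """Sustained GFLOP/s for `calls` identical GEMMs taking `seconds` in total."""
    return zgemm_flops(m, n, k) * calls / seconds / 1e9
```

An optimised BLAS performs the same arithmetic as the triple loop but
blocks it for cache and vector units, which is why the choice of
library matters so much for this routine. Given the matrix dimensions
and the total time reported for a set of calls, gflops() gives a
library-independent figure that can be compared across machines.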
For the BLAS tests, the Pathscale compiler (version 3.0) was used
throughout with the compiler options:
-O3 -OPT:Ofast -OPT:recip=ON -OPT:malloc_algorithm=1 -inline
-INLINE:preempt=ON
Figure 3.2: Graph showing the relative performance of the ZGEMM provided
by the four maths libraries available to Castep on HECToR for the TiN
benchmark. This benchmark performs 4980 projector-projector overlaps
using ZGEMM. Castep's internal Trace module was used to report the
timings.

As can be seen from figure 3.2, Cray's LibSci 10.2.1
was by far the fastest BLAS library available on HECToR, at least for
ZGEMM.
Sarfraz A Nadeem
2008-09-01