Maths Libraries (BLAS)
Much of Castep's run time is spent in ZGEMM, the BLAS subroutine for
double-precision complex matrix-matrix multiplication. The
orthogonalisation and subspace rotation operations both use ZGEMM to
apply unitary transformations to the wavefunctions, and it is also used
extensively when computing and applying the so-called non-local
projectors. Although the unitary transformations dominate the
asymptotic cost of large calculations, benchmarks must run in a
reasonable amount of time, so they rarely reach this rotation-dominated
regime. The orthogonalisation and diagonalisation subroutines also do a
fair amount of extra work, including a memory copy and updates to
metadata, which can distort the timings for small systems. For these
reasons we chose to concentrate on the timings for the non-local
projector overlaps as a measure of ZGEMM performance, in particular the
subroutine ion_beta_add_multi_recip_all, which consists almost entirely
of a ZGEMM call.
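To make the operation concrete, the following C sketch is a naive reference implementation of C = alpha * A^H * B + beta * C (ZGEMM with TRANSA='C', TRANSB='N') on column-major data, applied to projector overlaps. It assumes, for illustration only, that each projector and each band is stored as a column of npw plane-wave coefficients and that an overlap is the inner product of a projector with a wavefunction; the names zgemm_cn_ref and projector_overlaps are hypothetical and are not Castep's code. A tuned library such as LibSci returns the same result but uses cache blocking and vectorisation to get there much faster.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference kernel: C = alpha * A^H * B + beta * C.
 * Column-major storage: A is k x m (lda >= k), B is k x n (ldb >= k),
 * C is m x n (ldc >= m). Like reference BLAS, beta == 0 means C is
 * not read, so it may be uninitialised on entry. */
static void zgemm_cn_ref(size_t m, size_t n, size_t k,
                         double complex alpha,
                         const double complex *a, size_t lda,
                         const double complex *b, size_t ldb,
                         double complex beta,
                         double complex *c, size_t ldc)
{
    assert(lda >= k && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double complex sum = 0.0;
            /* Conjugated column i of A dotted with column j of B. */
            for (size_t p = 0; p < k; ++p)
                sum += conj(a[p + i * lda]) * b[p + j * ldb];
            if (beta == 0.0)
                c[i + j * ldc] = alpha * sum;
            else
                c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* Hypothetical wrapper (assumed layout): overlap[i + j*nproj] is the
 * inner product of projector column i with band column j. */
static void projector_overlaps(size_t npw, size_t nproj, size_t nbands,
                               const double complex *proj,
                               const double complex *psi,
                               double complex *overlap)
{
    zgemm_cn_ref(nproj, nbands, npw, 1.0, proj, npw, psi, npw,
                 0.0, overlap, nproj);
}
```

With npw = 2, projector columns (1, i) and (0, 2), and a single band (3, 1), the two overlaps are 3 - i and 2.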
For the BLAS tests, the Pathscale compiler (version 3.0) was used
throughout with the compiler options:
-O3 -OPT:Ofast -OPT:recip=ON -OPT:malloc_algorithm=1 -inline
-INLINE:preempt=ON
Figure 3.2:
Graph showing the relative performance of the ZGEMM provided
by the four maths libraries available to Castep on HECToR for the TiN
benchmark. This benchmark performs 4980 projector-projector overlaps
using ZGEMM. Castep's internal Trace module was used to report the
timings.
As figure 3.2 shows, Cray's LibSci 10.2.1 was by far the fastest of the
BLAS libraries available on HECToR, at least for ZGEMM.
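Comparing libraries as in figure 3.2 amounts to timing the same ZGEMM workload under each one. If an absolute rate is wanted rather than a relative one, a common convention counts 8mnk real floating-point operations for an m x k by k x n complex product (each complex multiply-add is 4 multiplies and 4 adds). The sketch below is a hypothetical helper, not part of Castep's Trace module; the matrix dimensions of the TiN benchmark are not given here, so no figures are implied. It uses the C11 timespec_get wall clock, which, unlike a monotonic clock, can jump if the system time is adjusted.

```c
#include <assert.h>
#include <time.h>

/* Real flops in one complex GEMM of an m x k by k x n product:
 * 8 per complex multiply-add (4 multiplies + 4 adds). */
static double zgemm_flops(long m, long n, long k)
{
    return 8.0 * (double)m * (double)n * (double)k;
}

/* Achieved rate in GFLOP/s for `calls` ZGEMMs of the same shape
 * that together took `seconds` of wall time. */
static double zgemm_gflops(long m, long n, long k, long calls, double seconds)
{
    assert(seconds > 0.0);
    return zgemm_flops(m, n, k) * (double)calls / seconds / 1e9;
}

/* Hypothetical harness: wall-clock seconds for `calls` invocations of
 * kernel(ctx), e.g. a wrapper around a library's ZGEMM. */
static double time_calls(void (*kernel)(void *), void *ctx, long calls)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (long i = 0; i < calls; ++i)
        kernel(ctx);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
}
```

Running the same harness against each library and dividing the times gives the relative speeds directly; the flop count only matters when comparing against a machine's peak rate.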
Sarfraz A Nadeem
2008-09-01