We chose the Pathscale 3.0 binary as our baseline, compiled with the recip_malloc_inline flags (see section 3.2.3) and linked against Cray's Libsci 10.2.13.2 and FFTW3, as this combination appeared to offer the best performance with the Castep 4.2 codebase.

[Execution time]
[Efficiency with respect to 16 cores]

[CPU time for Castep on 256 cores]
[CPU time for Castep on 512 cores]

[CPU time spent applying the Hamiltonian in Castep]
[CPU time spent preconditioning the search direction in Castep]