Next: Analysis Up: Castep Performance on HECToR Previous: Node Usage   Contents


Baseline

For our baseline we chose the Pathscale 3.0 binary, compiled with the recip_malloc_inline flags (see section 3.2.3) and linked against Cray's Libsci 10.2.13.2 and FFTW3, as this combination appeared to offer the best performance with the Castep 4.2 codebase.

Figure 3.6: Execution time for the 33-atom TiN benchmark. This calculation is performed at 8 k-points.
[Images: TiN.eps, TiN_log_time.eps, TiN_efficiency.eps]

Figure 3.7: Scaling of execution time with cores for the 270-atom Al2O3 3x3 benchmark. This calculation is performed at 2 k-points.
(a) Execution time [image: Al2O3_time.eps]
(b) Efficiency with respect to 16 cores [image: Al2O3_efficiency.eps]
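
The efficiency panel above is stated relative to a 16-core run. A minimal sketch of how such a relative efficiency is commonly computed follows. It assumes the usual definition (reference core count times reference time, divided by core count times measured time), which the text does not spell out. The function name, its parameters, and the numbers in the usage note are hypothetical and are not measurements from this report.

```python
def parallel_efficiency(t_ref, t_n, n_cores, ref_cores=16):
    """Parallel efficiency of a run on n_cores relative to a run on ref_cores.

    Assumed definition (not confirmed by the report):
        E(N) = (ref_cores * t_ref) / (n_cores * t_n)
    A value of 1.0 means perfect scaling from the reference run.
    """
    if t_ref <= 0 or t_n <= 0:
        raise ValueError("execution times must be positive")
    if n_cores <= 0 or ref_cores <= 0:
        raise ValueError("core counts must be positive")
    return (ref_cores * t_ref) / (n_cores * t_n)
```

With made-up timings, halving the execution time when going from 16 to 32 cores gives parallel_efficiency(100.0, 50.0, 32) == 1.0, while no speed-up at all gives parallel_efficiency(100.0, 100.0, 32) == 0.5.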

Figure 3.8: Breakdown of CPU time on 256 cores (3.8(a)) and 512 cores (3.8(b)), using 2 ppn, for the Castep Al2O3 3x3 benchmark.
(a) CPU time for Castep on 256 cores [image: 256_orig_craypat.eps]
(b) CPU time for Castep on 512 cores [image: 512_orig_craypat.eps]

Figure 3.9: The CPU time spent in the two dominant user-level subroutines and their children, for a 512-core (2 ppn) Castep calculation of the Al2O3 3x3 benchmark.
(a) CPU time spent applying the Hamiltonian in Castep [image: 512_orig_craypat_applyH.eps]
(b) CPU time spent preconditioning the search direction in Castep [image: 512_orig_craypat_nlpot.eps]


Sarfraz A Nadeem 2008-09-01