
Compiler Comparison

An additional objective of the project was to evaluate the performance of the compilers available on the Cray XT for CP2K. At the start of the earlier CP2K dCSE project [1], the Pathscale 3.1 compiler was found to give around 5% greater performance than the PGI (8.0.2) or gfortran (4.3.2) compilers. Around two years after those results, new versions of all three compilers are available, as well as the new Cray Compiler Environment (CCE). The ability of the compilers to handle the mixed-mode OpenMP code was also evaluated.

The H2O-64 benchmark, running on 72 cores (6 nodes) of the Cray XT5 'Rosa', was used for this comparison. For this configuration, less than 30% of the runtime is spent in communication, so the performance of the compiled code is strongly dependent on the compiler's ability to generate a well-optimised binary. The results for the MPI-only code are shown in Table 4. In contrast to the previous results, the gfortran compiler now produces slightly faster code than either Pathscale or PGI. Further details of each of the compilers are given below:

Table 4: Comparison of compilers on Rosa, using bench_64

Compiler	Optimisation flags	Time (s)
PGI 10.6.0	-fastsse	143.7
Pathscale 3.2.99	-O3 -OPT:Ofast -OPT:early_intrinsics=ON -LNO:simd=2	139.8
gfortran 4.4.4	-O3 -ffast-math -funroll-loops -ftree-vectorize	136.1
crayftn 7.2.4	-O 2 -O ipa1	184.7
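
The relative differences between the compilers can be read off Table 4 with a little arithmetic. The following is a minimal sketch, not part of the original benchmarking scripts: the timings are copied from the table, while the function name `relative_slowdown` and the choice of gfortran as the baseline are assumptions made for illustration.

```python
# Wall-clock times in seconds, copied from Table 4 (MPI-only code, 72 cores).
times = {
    "PGI 10.6.0": 143.7,
    "Pathscale 3.2.99": 139.8,
    "gfortran 4.4.4": 136.1,
    "crayftn 7.2.4": 184.7,
}


def relative_slowdown(times, baseline):
    """Return each compiler's extra runtime relative to the baseline, in percent.

    Hypothetical helper for illustration only.
    """
    base = times[baseline]
    return {name: 100.0 * (t / base - 1.0) for name, t in times.items()}


slowdown = relative_slowdown(times, "gfortran 4.4.4")

# Print the compilers from fastest to slowest.
for name, pct in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {times[name]:6.1f} s  {pct:+5.1f}%")
```

On these figures the three established compilers lie within about 6% of each other, while the Cray compiler is roughly a third slower than gfortran.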

PGI
Although the PGI compiler gives a reasonably well-performing executable, it does suffer from some drawbacks. In particular, it appears to have some difficulty compiling several parts of the code, and in order to achieve correctness for this benchmark 15 separate source files had to be compiled without optimisation. Even then, around 25% of the regression test suite still failed to give correct results. The mixed-mode OpenMP build also failed, generating segmentation faults at runtime.
Pathscale
The Pathscale compiler is fairly robust for compiling the MPI-only code, and still gives good performance. However, it was not possible to build a working mixed-mode executable. It is hoped that the Pathscale 3.3 compiler, when released, will resolve some of the OpenMP issues, as it contains a new, OpenMP 3.0 compliant implementation.
Gfortran
Gfortran is now the compiler of choice for CP2K. It is well tested by the developer and user community, and now gives performance on a par with, or exceeding, that of the commercial compilers tested. Furthermore, it was the only compiler capable of producing a working mixed-mode executable. The centrally installed CP2K executables on HECToR are now compiled with gfortran.
CCE
The Cray Fortran compiler has been able to compile CP2K successfully since version 7.2.4. However, the performance is much poorer (around 35% slower than gfortran), mostly due to poor optimisation of the collocate and integrate kernel routines. At the time of writing, it is not possible to run mixed-mode code successfully without disabling some features of DBCSR. However, Cray have been responsive to these issues and it is expected that they will be resolved in a future release of the compiler.


Iain Bethune
2010-09-14