
Testing

The changes to many of the Castep modules were trivial, and needed little testing. The basic splitting of the MPI communicator to create a band-group was tested with a modified version of Castep's comms_test program.
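As an illustration of the communicator split, the sketch below computes the `color` and `key` arguments that each rank would pass to `MPI_Comm_split`. It is a minimal sketch only: the function names and the assumed layout (consecutive ranks forming one band-group) are hypothetical and are not taken from Castep or from its comms_test program.

```python
# Hypothetical layout: consecutive world ranks form one band-group.
# Neither the names nor the layout are taken from Castep.

def band_group_split(world_rank, world_size, band_group_size):
    """Return (color, key) as they would be passed to MPI_Comm_split."""
    if world_size % band_group_size != 0:
        raise ValueError("world size must be a multiple of the band-group size")
    color = world_rank // band_group_size  # which band-group this rank joins
    key = world_rank % band_group_size     # rank order inside that band-group
    return color, key


def check_split(world_size, band_group_size):
    """Check that every band-group holds exactly the keys 0..band_group_size-1."""
    groups = {}
    for rank in range(world_size):
        color, key = band_group_split(rank, world_size, band_group_size)
        groups.setdefault(color, []).append(key)
    return all(sorted(keys) == list(range(band_group_size))
               for keys in groups.values())


if __name__ == "__main__":
    for rank in range(4):
        print(rank, band_group_split(rank, 4, 2))
```

A check of this kind (every group has the expected size and a complete set of ranks) is the sort of property a communicator test can verify before any physics is involved.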

The changes to the wave and electronic modules were rather more extensive, and difficult to test in isolation. To test and debug these modules, a short Castep calculation was run in serial to create a wavefunction and density in a Castep .check file, and the calculation was then restarted from that checkpoint file in both serial and band-parallel modes. Because both jobs started from the same known point, the calculations could be compared in detail, right down to individual wavefunction coefficients where necessary. This, coupled with Castep's built-in trace functionality, enabled bugs to be found and fixed quickly.

The ability to compare data in detail, even mid-calculation, proved necessary: several of the bugs found and fixed did not affect the final, converged result, but either hindered convergence or could give rise to incorrect answers in unusual circumstances.
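The coefficient-level comparison described above can be sketched as follows. This is a hedged illustration, not the procedure actually used: the function name, the tolerance, and the idea of holding the coefficients in plain Python sequences are all assumptions, and reading the data out of a .check file is not shown.

```python
def first_mismatch(reference, candidate, rel_tol=1e-12, abs_tol=0.0):
    """Return (index, ref, cand) for the first differing pair, or None.

    Works for real or complex coefficients; the tolerances are illustrative.
    """
    if len(reference) != len(candidate):
        raise ValueError("coefficient arrays differ in length")
    for i, (a, b) in enumerate(zip(reference, candidate)):
        # Compare against a relative tolerance scaled by the larger magnitude.
        if abs(a - b) > max(abs_tol, rel_tol * max(abs(a), abs(b))):
            return i, a, b
    return None


if __name__ == "__main__":
    serial = [0.5 + 0.1j, -0.25 + 0.0j, 0.001j]
    parallel = [0.5 + 0.1j, -0.25 + 0.0j, 0.002j]  # deliberately wrong entry
    print(first_mismatch(serial, serial))
    print(first_mismatch(serial, parallel))
```

Reporting the first mismatching index, rather than a single pass/fail flag, is what makes it possible to trace a discrepancy back to a particular band or plane-wave coefficient.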

For a small 8-atom silicon test calculation performed using the density mixing (DM) method, the serial calculation produces

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy           Fermi           Energy gain       Timer   <-- SCF
                               energy          per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial   5.10476316E+002  4.62264099E+001                        15.74  <-- SCF
      1  -7.76802126E+002  2.64224391E+000   1.60909805E+002      25.28  <-- SCF
      2  -8.50574887E+002  2.02770490E-001   9.22159500E+000      33.17  <-- SCF
      3  -8.54801574E+002  3.79693598E-001   5.28335886E-001      44.40  <-- SCF
      4  -8.52981743E+002  7.44988320E-001  -2.27478843E-001      52.26  <-- SCF
      5  -8.52884167E+002  9.08590414E-001  -1.21969434E-002      60.13  <-- SCF
      6  -8.52886334E+002  8.98636611E-001   2.70796284E-004      68.36  <-- SCF
      7  -8.52887081E+002  9.06719344E-001   9.34638588E-005      76.22  <-- SCF
      8  -8.52887250E+002  9.10591664E-001   2.11356795E-005      84.07  <-- SCF
      9  -8.52887250E+002  9.11100143E-001  -3.08962712E-008      88.90  <-- SCF
     10  -8.52887250E+002  9.11105407E-001  -4.65249702E-008      93.76  <-- SCF
     11  -8.52887250E+002  9.11110563E-001  -1.77844141E-008      98.98  <-- SCF
------------------------------------------------------------------------ <-- SCF

and a two-core band-parallel calculation produces

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy           Fermi           Energy gain       Timer   <-- SCF
                               energy          per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial   5.10476316E+002  4.62264099E+001                        16.67  <-- SCF
      1  -7.76802126E+002  2.64224391E+000   1.60909805E+002      28.13  <-- SCF
      2  -8.50574887E+002  2.02770490E-001   9.22159500E+000      38.62  <-- SCF
      3  -8.54801574E+002  3.79693598E-001   5.28335886E-001      52.16  <-- SCF
      4  -8.52981743E+002  7.44988320E-001  -2.27478843E-001      60.98  <-- SCF
      5  -8.52884167E+002  9.08590414E-001  -1.21969434E-002      70.63  <-- SCF
      6  -8.52886334E+002  8.98636611E-001   2.70796284E-004      79.96  <-- SCF
      7  -8.52887081E+002  9.06719344E-001   9.34638588E-005      88.78  <-- SCF
      8  -8.52887250E+002  9.10591664E-001   2.11356795E-005      97.65  <-- SCF
      9  -8.52887250E+002  9.11100143E-001  -3.08962712E-008     102.84  <-- SCF
     10  -8.52887250E+002  9.11105407E-001  -4.65248977E-008     108.05  <-- SCF
     11  -8.52887250E+002  9.11110563E-001  -1.77844503E-008     113.57  <-- SCF
------------------------------------------------------------------------ <-- SCF

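Note that the results as reported are identical up to and including SCF cycle 9, and differ only by $O(10^{-14})$ eV/atom in the energy gain for the last two cycles. Double-precision arithmetic has $\epsilon \approx 2.2 \times 10^{-16}$, so rounding a total energy of about 853 eV gives an absolute error of order $10^{-13}$ eV, or $10^{-14}$ eV/atom for 8 atoms; the differences may therefore be attributed to different rounding errors in the serial and band-parallel calculations.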
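The size of these differences can be checked directly from the reported rows. The sketch below copies the last three rows of each table, extracts the energy gain per atom, and compares the differences with a rough rounding estimate of $\epsilon |E| / N_{\rm atoms}$. The parsing assumes the seven-field row layout of the DM tables shown here; the function names are illustrative.

```python
import sys

SERIAL = """\
      9  -8.52887250E+002  9.11100143E-001  -3.08962712E-008      88.90  <-- SCF
     10  -8.52887250E+002  9.11105407E-001  -4.65249702E-008      93.76  <-- SCF
     11  -8.52887250E+002  9.11110563E-001  -1.77844141E-008      98.98  <-- SCF
"""

PARALLEL = """\
      9  -8.52887250E+002  9.11100143E-001  -3.08962712E-008     102.84  <-- SCF
     10  -8.52887250E+002  9.11105407E-001  -4.65248977E-008     108.05  <-- SCF
     11  -8.52887250E+002  9.11110563E-001  -1.77844503E-008     113.57  <-- SCF
"""


def energy_gains(text):
    """Map SCF cycle number to energy gain per atom for DM-style rows."""
    gains = {}
    for line in text.splitlines():
        fields = line.split()
        # Numbered rows: cycle, energy, Fermi energy, gain, timer, "<--", "SCF"
        if len(fields) == 7 and fields[0].isdigit() and fields[-1] == "SCF":
            gains[int(fields[0])] = float(fields[3])
    return gains


def gain_differences(serial_text, parallel_text):
    """Absolute difference in energy gain per atom for each common cycle."""
    serial = energy_gains(serial_text)
    parallel = energy_gains(parallel_text)
    return {c: abs(serial[c] - parallel[c]) for c in sorted(serial) if c in parallel}


def rounding_estimate(total_energy_ev, n_atoms):
    """Rough double-precision rounding scale for an energy, per atom."""
    return sys.float_info.epsilon * abs(total_energy_ev) / n_atoms


if __name__ == "__main__":
    for cycle, diff in gain_differences(SERIAL, PARALLEL).items():
        print(f"cycle {cycle:2d}: |difference| = {diff:.3e} eV/atom")
    print(f"rounding estimate: {rounding_estimate(-852.887250, 8):.3e} eV/atom")
```

The differences in cycles 10 and 11 come out at a few times the rounding estimate, which is what would be expected if the two runs merely accumulate sums in a different order.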

This calculation takes longer when run band-parallel than in serial (113.57 s against 98.98 s), but this is not a cause for alarm. The test system is very small, containing only 16 valence bands, so it is not surprising that the communication overhead outweighs the gains.
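To make the scale concrete, the sketch below divides 16 bands among the members of a band-group. It assumes a contiguous block distribution, which is an illustrative choice and not necessarily the one Castep uses; the function name is hypothetical.

```python
def band_block(n_bands, n_ranks, rank):
    """Return the range of band indices held by `rank` in a block distribution."""
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(n_bands, n_ranks)
    # The first `extra` ranks each hold one additional band.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    for rank in range(2):
        block = band_block(16, 2, rank)
        print(f"rank {rank}: {len(block)} bands, indices {block.start}..{block.stop - 1}")
```

With two band-group members each holds only 8 bands, so the per-band work saved on each core is small compared with the fixed cost of communicating between them.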

The same calculation run using the 'all-bands' self-consistent code path yields

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy                           Energy gain       Timer   <-- SCF
                                               per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial   6.83465549E+002                                         10.35  <-- SCF
      1  -7.97053977E+002                    1.85064941E+002      20.90  <-- SCF
      2  -8.48247959E+002                    6.39924773E+000      31.57  <-- SCF
      3  -8.50914193E+002                    3.33279207E-001      42.16  <-- SCF
      4  -8.51618587E+002                    8.80493249E-002      54.31  <-- SCF
      5  -8.52080365E+002                    5.77221874E-002      64.82  <-- SCF
      6  -8.52436527E+002                    4.45203123E-002      75.48  <-- SCF
      7  -8.52663071E+002                    2.83179709E-002      85.99  <-- SCF
      8  -8.52769350E+002                    1.32848145E-002      96.64  <-- SCF
      9  -8.52812636E+002                    5.41075552E-003     107.15  <-- SCF
     10  -8.52829576E+002                    2.11747553E-003     117.69  <-- SCF
     11  -8.52836183E+002                    8.25924796E-004     128.48  <-- SCF
------------------------------------------------------------------------ <-- SCF

in serial, and

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy                           Energy gain       Timer   <-- SCF
                                               per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial   6.83465549E+002                                          7.04  <-- SCF
      1  -7.97053977E+002                    1.85064941E+002      15.95  <-- SCF
      2  -8.48247959E+002                    6.39924773E+000      24.90  <-- SCF
      3  -8.50914193E+002                    3.33279207E-001      33.79  <-- SCF
      4  -8.51618587E+002                    8.80493249E-002      43.01  <-- SCF
      5  -8.52080365E+002                    5.77221874E-002      51.94  <-- SCF
      6  -8.52436527E+002                    4.45203123E-002      61.04  <-- SCF
      7  -8.52663071E+002                    2.83179709E-002      69.93  <-- SCF
      8  -8.52769350E+002                    1.32848145E-002      79.12  <-- SCF
      9  -8.52812636E+002                    5.41075552E-003      87.98  <-- SCF
     10  -8.52829576E+002                    2.11747553E-003      96.76  <-- SCF
     11  -8.52836183E+002                    8.25924796E-004     107.40  <-- SCF
------------------------------------------------------------------------ <-- SCF

in two-core band-parallel. Note that this time there is a small speed improvement for the band-parallel run (107.40 s against 128.48 s). This is because the 'all-bands' path does more FFTs per SCF cycle than the DM path, and the FFTs distribute trivially among the band-group.
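The timings in the four tables can be summarised as speed-up ratios. The sketch below uses only the timer values reported above (the "Initial" row and the cycle 11 row of each table); the dictionary layout and function name are illustrative, and treating the "Initial" timer as start-up cost is an assumption.

```python
# (timer at "Initial" row, timer at SCF cycle 11), in seconds, from the tables.
TIMERS = {
    "DM": {"serial": (15.74, 98.98), "band-parallel": (16.67, 113.57)},
    "all-bands": {"serial": (10.35, 128.48), "band-parallel": (7.04, 107.40)},
}


def speedup(path, n_cores=2):
    """Return (overall speed-up, SCF-only speed-up, parallel efficiency)."""
    s_start, s_end = TIMERS[path]["serial"]
    p_start, p_end = TIMERS[path]["band-parallel"]
    overall = s_end / p_end
    # Exclude the time before the first SCF cycle (assumed to be start-up cost).
    scf_only = (s_end - s_start) / (p_end - p_start)
    return overall, scf_only, overall / n_cores


if __name__ == "__main__":
    for path in TIMERS:
        overall, scf_only, efficiency = speedup(path)
        print(f"{path:10s} overall {overall:.2f}  SCF-only {scf_only:.2f}  "
              f"efficiency {efficiency:.2f}")
```

A ratio below 1 for the DM path and above 1 for the all-bands path reflects the observation in the text; on a system this small neither figure says much about scaling on realistic problems.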

With the basic band-parallelism tested and complete, Castep has been demonstrated to work in band-parallel mode for the EDFT and DM algorithms.

The only known outstanding problem is with the EDFT mode. In the EDFT algorithm the empty bands are optimised non-self-consistently after the full bands have been updated; at present this step does not use the same algorithm as the DM code path, and so is not band-parallel.


Sarfraz A Nadeem 2008-09-01