

5. Independent Band Optimisation
(Work Package 3)

5.1 Introduction

The bottleneck in large Castep calculations is the explicit S-orthonormalisation of the eigenstates. This orthonormalisation involves the calculation and inversion of the band-overlap matrix, operations which scale as $N_pN_b^2$ and $N_b^3$ respectively, where $N_p$ is the number of plane-wave basis states and $N_b$ is the number of bands (eigenstates). Furthermore, when operating in band-parallel mode the former operation is also a communication bottleneck, as the individual eigenstates reside on different processing elements.
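As a sketch of where these two costs come from, collect the $N_b$ trial eigenstates as the columns of an $N_p \times N_b$ matrix $\Psi$. The symbols $\Psi$, $M$ and $L$ are introduced here for illustration only, and the Cholesky factorisation is an assumed choice of one common formulation, not necessarily the scheme Castep itself uses:

```latex
\begin{align}
M &= \Psi^\dagger S \Psi, \qquad M_{ij} = \langle \psi_i | S | \psi_j \rangle
  && \text{($N_b^2$ inner products of length $N_p$: } \mathcal{O}(N_p N_b^2)\text{)} \\
M &= L L^\dagger
  && \text{(factorisation and inversion of an $N_b \times N_b$ matrix: } \mathcal{O}(N_b^3)\text{)} \\
\Psi' &= \Psi L^{-\dagger}
  \quad\Rightarrow\quad
  \Psi'^\dagger S \Psi' = L^{-1} M L^{-\dagger} = I
\end{align}
```

Each element $M_{ij}$ couples two different bands, which is why forming $M$ requires communication when bands $i$ and $j$ are stored on different processing elements.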

Clearly it is desirable to implement an optimisation scheme which will allow the approximate bands to be optimised without the need for an explicit S-orthonormalisation.

5.2 Performance

Unfortunately neither the RMM-DIIS optimiser nor our variation proved to be either robust or quick; the reduction in orthonormalisations reduced the SCF cycle time considerably, but vastly more SCF cycles were needed for convergence. The RMM-DIIS scheme in particular suffered from severe numerical instabilities near convergence, since the residual matrix becomes more and more singular as the trial eigenstates approach the true eigenstates.
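The loss of conditioning can be illustrated with a toy model. The sketch below is not Castep code: the two-component residual vectors, the parameter `delta` and all function names are hypothetical, chosen only to show how the overlap matrix of successive residuals approaches singularity as those residuals become nearly linearly dependent.

```python
import math


def gram_matrix(vectors):
    """Overlap (Gram) matrix B[i][j] = <r_i | r_j> of real residual vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def condition_number_2x2(b):
    """Ratio of largest to smallest eigenvalue of a symmetric positive-definite 2x2 matrix."""
    trace = b[0][0] + b[1][1]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    lam_max = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
    lam_min = det / lam_max  # uses lam_min * lam_max = det to limit cancellation
    return lam_max / lam_min


def residual_history(delta, scale=1.0):
    """Two made-up residuals that differ only by a component of relative size delta."""
    return [[scale, 0.0], [scale, scale * delta]]


def residual_condition(delta):
    """Condition number of the residual overlap matrix for a given delta."""
    return condition_number_2x2(gram_matrix(residual_history(delta)))


if __name__ == "__main__":
    # Smaller delta mimics successive residuals that are closer to parallel.
    for delta in (1e-1, 1e-2, 1e-3):
        print(f"delta = {delta:.0e}  condition number ~ {residual_condition(delta):.3g}")
```

In this toy model the condition number grows roughly as $4/\delta^2$, so any linear solve involving the residual overlap matrix loses accuracy rapidly as the residuals align.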

To ensure that only the direct effects of changing the optimiser were observed, we ran Castep with a fixed density. Typical convergence for a simple magnesium oxide test case using the usual Castep algorithm is:

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy           Fermi           Energy gain       Timer   <-- SCF
                               energy          per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial  -4.95078326E+003  5.20975146E+001                         2.99  <-- SCF
      1  -5.59753549E+003  7.89244217E+000   8.08440297E+001       3.90  <-- SCF
      2  -5.66226988E+003  7.15740116E+000   8.09179761E+000       4.68  <-- SCF
      3  -5.66301246E+003  7.16625993E+000   9.28225593E-002       5.76  <-- SCF
      4  -5.66306881E+003  7.16423308E+000   7.04427727E-003       7.01  <-- SCF
      5  -5.66306893E+003  7.16423173E+000   1.49140438E-005       8.41  <-- SCF
      6  -5.66306893E+003  7.16423137E+000   4.02077714E-007       9.87  <-- SCF
      7  -5.66306893E+003  7.16423137E+000   5.27220802E-008      11.01  <-- SCF
      8  -5.66306893E+003  7.16423137E+000   1.76063159E-009      11.94  <-- SCF
      9  -5.66306893E+003  7.16423137E+000   3.90757352E-010      12.61  <-- SCF
     10  -5.66306893E+003  7.16423137E+000   1.33410476E-011      13.10  <-- SCF
     11  -5.66306893E+003  7.16423137E+000   5.99380402E-012      13.53  <-- SCF
------------------------------------------------------------------------ <-- SCF

Switching to the RMM-DIIS optimiser gave:

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy           Fermi           Energy gain       Timer   <-- SCF
                               energy          per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial  -4.95078326E+003  5.20975146E+001                         2.85  <-- SCF
      1  -5.59753549E+003  7.89244217E+000   8.08440297E+001       3.69  <-- SCF
      2  -5.66226988E+003  7.15740116E+000   8.09179761E+000       4.41  <-- SCF
      3  -5.66301246E+003  7.16625993E+000   9.28225593E-002       5.39  <-- SCF
      4  -5.66306881E+003  7.16423308E+000   7.04427727E-003       6.55  <-- SCF
      5  -5.66306892E+003  7.16423445E+000   1.34939453E-005       7.66  <-- SCF
      6  -5.66306891E+003  7.16423645E+000  -1.18066010E-006       8.76  <-- SCF
      7  -5.66306893E+003  7.16423715E+000   2.90858466E-006      10.06  <-- SCF
      8  -5.66306893E+003  7.16424036E+000   5.70679748E-008      11.11  <-- SCF
      9  -5.66306893E+003  7.16424784E+000  -9.96659399E-008      12.21  <-- SCF
     10  -5.66306890E+003  7.16426887E+000  -3.48059116E-006      12.98  <-- SCF
     11  -5.66306893E+003  7.16499317E+000   3.20356934E-006      13.78  <-- SCF
     12  -5.66306893E+003  7.16435180E+000  -1.03932948E-007      14.46  <-- SCF
     13  -5.66306893E+003  7.16439686E+000  -1.62527990E-007      15.22  <-- SCF
     14  -5.66306892E+003  7.16467568E+000  -2.95401151E-007      15.88  <-- SCF
     15  -5.66306891E+003  7.16448445E+000  -1.60189845E-006      16.58  <-- SCF
     16  -5.66306891E+003  7.16566473E+000   5.03725090E-008      17.30  <-- SCF
     17  -5.66305809E+003  7.16892722E+000  -1.35318489E-003      17.94  <-- SCF
     18  -5.66289950E+003  7.17878051E+000  -1.98232364E-002      18.59  <-- SCF
     19  -5.66295014E+003  7.20703280E+000   6.33023123E-003      19.23  <-- SCF
     20  -5.65353849E+003  7.27226034E+000  -1.17645706E+000      19.82  <-- SCF
------------------------------------------------------------------------ <-- SCF

Even with this small test case there was a slight improvement in the SCF cycle time, but the numerical instabilities eventually caused the solution to diverge. Our modified algorithm proved slightly more stable for this test case, but it was slower and also showed signs of diverging:

------------------------------------------------------------------------ <-- SCF
SCF loop      Energy           Fermi           Energy gain       Timer   <-- SCF
                               energy          per atom          (sec)   <-- SCF
------------------------------------------------------------------------ <-- SCF
Initial  -4.95078326E+003  5.20975146E+001                         3.25  <-- SCF
      1  -5.59753549E+003  7.89244217E+000   8.08440297E+001       5.65  <-- SCF
      2  -5.66226988E+003  7.15740116E+000   8.09179761E+000       6.44  <-- SCF
      3  -5.66301246E+003  7.16625993E+000   9.28225593E-002       7.51  <-- SCF
      4  -5.66306881E+003  7.16423308E+000   7.04427727E-003       8.79  <-- SCF
      5  -5.66306892E+003  7.16423445E+000   1.34668025E-005      10.08  <-- SCF
      6  -5.66306891E+003  7.16423645E+000  -1.24884522E-006      11.33  <-- SCF
      7  -5.66306825E+003  7.16423517E+000  -8.20412820E-005      12.79  <-- SCF
      8  -5.66306852E+003  7.16424032E+000   3.31933376E-005      13.98  <-- SCF
      9  -5.66306886E+003  7.16424765E+000   4.29159705E-005      15.26  <-- SCF
     10  -5.66306888E+003  7.16426863E+000   2.20859478E-006      16.17  <-- SCF
     11  -5.66306892E+003  7.16496512E+000   5.82997827E-006      17.23  <-- SCF
     12  -5.66306892E+003  7.16434686E+000  -2.59482603E-007      17.99  <-- SCF
     13  -5.66306892E+003  7.16439357E+000  -5.01087043E-007      18.80  <-- SCF
     14  -5.66306891E+003  7.16466770E+000  -1.12797108E-006      19.56  <-- SCF
     15  -5.66306890E+003  7.16447879E+000  -1.59632306E-006      20.42  <-- SCF
     16  -5.66306886E+003  7.16561534E+000  -4.04882177E-006      21.16  <-- SCF
     17  -5.66306881E+003  7.16881083E+000  -6.32000384E-006      21.89  <-- SCF
     18  -5.66306867E+003  7.17899059E+000  -1.85164167E-005      22.64  <-- SCF
     19  -5.66306845E+003  7.20738379E+000  -2.73264238E-005      23.33  <-- SCF
     20  -5.66306769E+003  7.29973182E+000  -9.40002692E-005      24.05  <-- SCF
------------------------------------------------------------------------ <-- SCF
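The cycle times in the tables above can be compared directly from the Timer column. The following is a minimal sketch, assuming only the `<-- SCF` line layout shown above; the function names `parse_scf_timers` and `mean_cycle_time` are hypothetical helpers, not part of Castep. The embedded excerpts are the first five cycles of the usual algorithm and of RMM-DIIS, a range chosen because the energies of the runs still agree through cycle 4.

```python
BASELINE = """\
Initial  -4.95078326E+003  5.20975146E+001                         2.99  <-- SCF
      1  -5.59753549E+003  7.89244217E+000   8.08440297E+001       3.90  <-- SCF
      2  -5.66226988E+003  7.15740116E+000   8.09179761E+000       4.68  <-- SCF
      3  -5.66301246E+003  7.16625993E+000   9.28225593E-002       5.76  <-- SCF
      4  -5.66306881E+003  7.16423308E+000   7.04427727E-003       7.01  <-- SCF
      5  -5.66306893E+003  7.16423173E+000   1.49140438E-005       8.41  <-- SCF
"""

RMM_DIIS = """\
Initial  -4.95078326E+003  5.20975146E+001                         2.85  <-- SCF
      1  -5.59753549E+003  7.89244217E+000   8.08440297E+001       3.69  <-- SCF
      2  -5.66226988E+003  7.15740116E+000   8.09179761E+000       4.41  <-- SCF
      3  -5.66301246E+003  7.16625993E+000   9.28225593E-002       5.39  <-- SCF
      4  -5.66306881E+003  7.16423308E+000   7.04427727E-003       6.55  <-- SCF
      5  -5.66306892E+003  7.16423445E+000   1.34939453E-005       7.66  <-- SCF
"""


def parse_scf_timers(text):
    """Return {loop_label: timer_seconds} for the data rows of an SCF table."""
    timers = {}
    for line in text.splitlines():
        if not line.rstrip().endswith("<-- SCF"):
            continue
        fields = line.split()[:-2]  # drop the trailing '<--' and 'SCF' tokens
        if not fields:
            continue
        label = fields[0]
        # Skip rule lines and column headers; keep 'Initial' and numbered cycles.
        if label == "Initial" or label.isdigit():
            timers[label] = float(fields[-1])  # Timer is the last data column
    return timers


def mean_cycle_time(timers, n_cycles):
    """Average wall-clock seconds per SCF cycle over the first n_cycles cycles."""
    return (timers[str(n_cycles)] - timers["Initial"]) / n_cycles


if __name__ == "__main__":
    for name, table in (("usual", BASELINE), ("RMM-DIIS", RMM_DIIS)):
        print(name, round(mean_cycle_time(parse_scf_timers(table), 5), 3))
```

Over these five cycles the Timer column gives about 1.08 s per cycle for the usual algorithm and about 0.96 s for RMM-DIIS; the same arithmetic on the third table, (10.08 - 3.25) / 5, gives about 1.37 s for the modified algorithm.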

These results were fairly typical of the performance of these optimisers: it was relatively straightforward to get them close to the ground state, but difficult to reach the accuracy we require. Imposing orthonormality on the updates enabled both methods to converge quickly and robustly, indicating that the poor performance was not a bug but inherent in the algorithms. We also investigated restricted orthonormalisation, whereby only certain directions are projected out; although this improved matters, neither algorithm converged reliably.


Sarfraz A Nadeem 2008-09-03