Introduction

The understanding to many problems in material science and nanotechnology involving metallic systems attracting interests today requires calculations to be done on a large scale. While significant advancements has been made in highly scalable and efficient computer codes for insulators and semi-conductors there is comparatively less work done for metallic systems. Although most of the algorithms that make up scalable computer codes for insulators and semiconductors are transferable to metallic systems, such as the efficient evaluation of the Hamiltonian and the use of compact and flexible basis sets, it is not possible to achieve the same scalability for calculations on metals because the density matrix for these systems are generally long ranged, and the conventional linear scaling techniques for insulators that rely on the short range nature of the density matrix no longer applies. The main goal of the project is to take an existing linear scaling Density Functional Theory (DFT) code designed for insulators and semiconductors (CONQUEST[4]) and modify it so that it can be efficiently applied to metals.

CONQUEST is a suitable basis for this modification because a metallic density matrix solver that involves direct diagonalisation of the Hamiltonian using SCaLAPACK v1.7 is already in place. It is also a mature open source linear scaling code with spectacular scaling with respect to number of processors. CONQUEST uses either B-splines or Pseudo-atomic orbitals to generate a flexible set of functions (support functions) in which to expand the wavefunctions. With B-Spline basis it can achieve plane-wave accuracy with a small number of functions. This is important for efficient metallic calculations, because the cost of diagonalisation depends directly on the number of spanning functions. Further more, CONQUEST has already been used successfully for simulating insulators on HECToR with 4096 processors (supported under the UKCP grant, project e89).

Following changes must be applied to the code for efficient calculation on metallic systems:

The density matrix solver cannot be based on the existing linear scaling techniques for insulators and semiconductors, hence the proposal is to use a simple matrix diagonalisation on the Hamiltonian using tools from a mature parallel processing linear algebra library, such as ScaLAPACK. This is already implemented in CONQUEST.
Introduce efficient Brillouin zone integration (Methfessel-Paxton method[14]) to reduce the number of diagonalisations need for achieving self-consistency. This is particularly important for metals because of the partially occupied bands which produces a discontinuity in occupation function at Fermi energy--problematic for accurate numerical integrations in Brillouin zone and hence effecting self-consistency.
Introduce a more efficient technique for reaching self-consistency in large systems. As the simulated system gets larger, a phenomenon called charge-sloshing which causes the inout and output electron density to oscillate and never reach self-consistency becomes a frequent occurrence for metals. This has been successfully tackled by introducing Kerker preconditioning and Wave-dependent metric in the ab initio code VASP[12] and the same techniques are to be added to CONQUEST.
Metallic systems in general have sharp and rapidly varying band-structure, and hence requires a high density of $\vec{k}$ points for accurate Brillouin zone integration. And each $\vec{k}$ point requires a separate diagonalisation. Further efficiency in calculation can be achieved if diagonalisation on several $\vec{k}$ points can be processed in parallel.
The metallic calculations on CONQUEST need to be profiled to spot potential bottlenecks, and optimal input parameters need to be found.

With these improvements and the already excellent scaling properties, CONQUEST can become a valuable tool for allowing large scale ab initio electronic structure simulation on metallic materials on HECToR.

This report describes the work done for implementing the above mentioned modifications and the profiling of the code. Section 2 describes the work done on implementation of Kerker preconditioning and wave-dependent metric; section 3 describes the implementation of Methfessel-Paxton Brillouin zone integration technique, in which we also discovered and solved a technical complication that was previously unknown in the field; section 4 gives the results from profiling CONQUEST, and confirms diagonalisation is indeed the bottleneck for the calculation; section 5 describes the work done on $\vec{k}$ parallelisation; and finally a prove of a theorem central to the solution for reliable implementation of the Methfessel-Paxton approximation is included in appendix A.

Before we move onto the first topic, we will briefly describe the physical system used for the various tests performed in this report. Bulk aluminium will be used through out (except in cases where a vacancy is introduced to break symmetry). Aluminium was chosen because it is near the top of the periodic table and therefore one does not have to worry about relativistic or semi-core corrections to the pseudopotential. At the same time it still has enough screening from core electrons so the pseudopotential will not be very complicated. The purpose of the test is to demonstrate the efficiency of the code, complications in the pseudopotential add an unnecessary overhead. The exchange-correlation functional used for our tests will be the local density approximation (LDA). This is justified because firstly it was reported by Gaudoin et al.[6,7] that the LDA approximation is adequate for bulk aluminium calculations, and secondly the focus of our calculations again is on the efficiency of the code, not on the accuracy of the band structure. The aluminium pseudo-potential is calculated using OPIUM[1] according to the method of Rappe et. al.[17,18]. This method generally gives a softer pseudopotential than a standard Troullier-Martins method[19]. Two and one electrons are taken as valence electrons for aluminium and partial-core (also known as non-linear core) or relativistic corrections are not included. The basis set used was double-zeta-polarisation (DZP) numerical pseudo-atomic-orbital (PAO) basis. The basis was generated from SIESTA[2] with energy shift of 100 meV. For the the current study we used just one support function per basis. The optimum lattice constant was found by calculating the ground-state energies for the system (using 4 Al atoms unit cell) with the lattice parameter varying from 3.50-4.40Å, and then finding the minimum by interpolating the results. We found for the above mentioned set-up the optimum lattice parameter for the LDA calculation to be Å. This can be compared to the result of Gaudoin et al. 3.960Å and experimental value of 4.022Å. It was also found that a real-space mesh with 32 grid points in each direction, and a reciprocal space mesh with 25 $\vec{k}$ -points in each direction (generated using Monkhorst-Park[15] method) is sufficient for accuracy up to $10^{-6}$ Ha with smearing temperature of 0.001Ha ( $\approx 300$ K). For unit cells of different sizes, we scaled the grid points accordingly, this means we need 64 real space points and 13 $\vec{k}$ points in each direction for a 32 atoms cell, and 128 real space points and 7 $\vec{k}$ points in each direction for 108 atoms cell. The calculations mentioned in the report will assume these parameters unless otherwise stated.

Lianheng Tong 2011-03-02