Optimization of the MPI parallel RMT code for HECToR and likely successors
The objective of this Distributed Computational Science and Engineering (dCSE) project has been to develop a series of optimizations for efficient load balancing of the RMT code on HECToR, and to implement improvements in its propagation (numerical integration) algorithm. The RMT method (R-Matrix with time-dependence) is a new ab initio method for solving the time-dependent Schrödinger equation for multi-electron atomic systems in intense short laser pulses. RMT merges an outer region finite-difference model (the 2-electron HELIUM code) with a classic B-Spline R-Matrix basis set for the multi-electron inner region.
R-matrix methods successfully model multi-electron atom-laser interactions and molecule-laser interactions, but only in the time-independent limit. HELIUM successfully models time-dependent atom-laser interactions, but is limited to 2-electron atoms. RMT removes both of these limitations. This is important because in the high-frequency XUV limit (now accessible with free-electron lasers), inner-shell excitations (modelled well by R-matrix methods) can be expected to influence or dominate the interaction. Equally important to the success of RMT is the high efficiency and scalability of the HELIUM approach on parallel computers.
RMT enables theoretical analysis of recent experimental advances with a degree of reliability that would be impossible with competing methods. These experimental domains include time-resolved studies of ionization events on attosecond time-scales, studies of time-delays between the ejection of electrons in double-ionization, inner-shell excitations and decays in complex atoms, intense-field atom-laser interactions in the XUV limit using the new free-electron x-ray lasers, and harmonic generation in atoms and molecules.
The key goals of the project are:
- To develop and test a series of optimizations for efficient load balancing between the RMT inner region and the RMT outer region, for execution on HECToR in the 100-10,000 core range.
- To implement algorithmic enhancements to the propagator that improve the efficiency of the numerical integration in the limit of small spatial grid-point spacings.
The individual achievements of the project are summarised below:
- The major load-balancing optimization was a redesign of the inner/outer communications software, which divided the inner region (IR) into two independent sets of cores that could communicate independently with the outer region (OR). A case is demonstrated in which this increases the integration speed by a factor of 1.7 over the original method. Two additional optimizations proved successful. The first moved significant computation from the OR to the IR (thereby reducing the information exchanged between the regions at each time-step), and the second assigned a single dedicated core to inter-region communication. Together these two optimizations produced a 15-30% speed-up.
- One of the more welcome results of the project has been the success of a new form of finite-differencing (developed during this project for RMT) based on least-squares polynomial fits to the finite-difference wavefunction. The least-squares differentiation operators were shown to suppress the high-frequency eigenvalues of the finite-difference (FD) Hamiltonian by up to a factor of 4. A case is demonstrated in which this results in an 80% improvement in integration speed, while preserving the accuracy of the integration.
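
The idea behind least-squares differentiation can be illustrated with a minimal Python sketch. It is not taken from the RMT code: the stencil width (7 points), the polynomial degrees (6 and 4), and the function names `ls_second_derivative_weights` and `max_symbol` are all assumptions chosen for illustration. The sketch builds second-derivative stencils from polynomial fits on a uniform, periodic grid and compares the largest eigenvalue magnitude of the resulting -d2/dx2 operator. When the polynomial degree equals the number of points minus one, the fit interpolates the data and the ordinary FD stencil is recovered; a lower degree gives an overdetermined least-squares fit.

```python
import numpy as np

def ls_second_derivative_weights(half_width, degree):
    """Stencil weights (in units of 1/h**2) for the second derivative at the
    centre of 2*half_width + 1 equally spaced points, obtained from a
    least-squares polynomial fit of the given degree (illustrative helper)."""
    offsets = np.arange(-half_width, half_width + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**degree.
    vander = np.vander(offsets, degree + 1, increasing=True).astype(float)
    # Row k of the pseudo-inverse maps the samples to the fitted coefficient c_k.
    fit = np.linalg.pinv(vander)
    # The second derivative of sum_k c_k x**k at x = 0 is 2 * c_2.
    return 2.0 * fit[2]

def max_symbol(weights, n_theta=2001):
    """Largest magnitude of the Fourier symbol of -d2/dx2 (times h**2).
    On a periodic grid the symbol values are the operator's eigenvalues."""
    half_width = (len(weights) - 1) // 2
    offsets = np.arange(-half_width, half_width + 1)
    theta = np.linspace(0.0, np.pi, n_theta)  # theta = k * h
    # The stencil is symmetric, so the symbol is real and only cosines survive.
    symbol = -(np.cos(np.outer(theta, offsets)) @ weights)
    return np.max(np.abs(symbol))

# Assumed example sizes: 7 points, degree 6 (ordinary FD) versus degree 4.
standard = ls_second_derivative_weights(3, 6)
least_squares = ls_second_derivative_weights(3, 4)
ratio = max_symbol(standard) / max_symbol(least_squares)
print(f"largest-eigenvalue ratio (standard / least-squares): {ratio:.2f}")
```

For explicit propagators, the largest eigenvalue of the Hamiltonian typically limits the usable time-step, which is why a narrower spectrum can translate into faster integration. The sketch examines only the extent of the spectrum; it says nothing about the accuracy of the least-squares operator, which the project report addresses separately.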
A report summarising this project is available in PDF and HTML formats.