Implementation of established algorithms to extend HELIUM
(WP5: Extending HELIUM to treat many-electron atoms)

Michael A. Lysaght and Laura R. Moore
Department of Applied Mathematics and Theoretical Physics
Queen’s University Belfast

March 31, 2011

Abstract

The intrinsic time-scale of correlated electron motion in matter is the attosecond time-scale (1 attosecond = 10-18s) and just as femtosecond (1 femtosecond = 10-15s) laser technology offers us the means of accessing real-time nuclear dynamics, attosecond light pulse technology now promises us the revolutionary ability to measure and control the correlated motion of electrons in atomic systems on their own temporal scale. Recent ground-breaking ultrafast measurements demonstrate that a bottleneck in our understanding of correlated electron dynamics in atoms and molecules is the current state of theory. Correlated electron motion by its very definition is a quantum many-body process and so theoretical methods must go beyond the widely used single-particle picture so far employed to good effect to describe ultrafast single electron processes. Currently, the most sophisticated method that goes beyond the single particle model is that underlying the HELIUM code. Inspired by the success of HELIUM at describing the correlated motion of electrons within two-electron systems, the purpose of this dCSE project is to extend the capability of HELIUM to allow it to accurately describe the correlated electron dynamics occurring within complex atoms exposed to intense short-pulse light. This paper describes the underlying numerical methods and algorithms implemented to extend HELIUM into a powerful new Fortran 90/MPI code called RMT (R-Matrix incorporating Time) which is unique in its capability to describe the many-electron response of a general atom to intense short-pulse laser light. Illustrative results for single ionization of neon exposed to a combination of sub-femtosecond XUV and 800 nm Ti:sapphire laser light fields, along with a discussion of the performance of the new RMT code on HECToR, are included.

Contents

1 Introduction
 1.1 The purpose of the HELIUM dCSE project (WP5)
 1.2 Major outcomes of the project
2 Extending HELIUM to treat many-electron atoms: the RMT method
 2.1 The TDSE for a many-electron atom exposed to a light field
 2.2 The R-matrix division-of-space concept
 2.3 The mathematics underlying TD-OUTER
 2.4 Software description: The TD-OUTER code
 2.5 The mathematics underlying TD-RA
 2.6 Software description: The TD-RA program
 2.7 Interfacing (Integration) of TD-RA and TD-OUTER: the RMT program
 2.8 Demonstration results from the single core RMT code
3 Parallelization of the RMT code
 3.1 Implementation of an MPI version of the RMT code (region II)
 3.2 Implementation of an MPI version of the RMT code (region I)
 3.3 MPI communication between TD-RA and TD-OUTER
4 Porting RMT to HECToR
 4.1 Analysis of performance of RMT on the XT6
 4.2 First results produced by RMT on HECToR
5 Conclusion and discussion
6 Acknowledgements
7 Appendix
References

1 Introduction

Over the past 10 years there has developed a crucial need to solve directly and accurately the Time-Dependent Schrödinger Equation (TDSE) describing the detailed response of many-electron atoms and molecules to short, intense pulses of light. Such calculations are vital to get the best out of Attosecond Science [12] (1 atto-second (atto-s) is 10-18 seconds), the revolutionary goal of which is to track and control the correlated motion of electrons within matter on their own temporal scale. The advent of Attosecond Science is bringing about a revolution in both laser and electron collision physics. Experiments on the Attosecond Time-Scale (ATS) are in progress in leading laboratories in the UK [34]; in wider Europe [567]; in North America [891011]; in China [12] and in Japan [1314]. It is expected that near-future advances in Attosecond Light Pulse (ALP) technology [2] will have profound implications for fields as diverse as bioenergetics [15], molecular electronics [16] and even the treatment of cancer [17].

Many of the technological breakthroughs in Attosecond Science over the last decade have come about as a result of advances in theory [18] which now allow for a sound qualitative understanding of the Single Active Electron (SAE) response of an atom or molecule to intense femtosecond (1 femto-second (fs) is 10-15 seconds) laser pulses at short and long wavelengths. Indeed, in the long wavelength regime, advances in Ti:Sapphire laser technology [19] combined with an improved understanding of SAE phenomena such as High Harmonic Generation (HHG) [20] have resulted in the recent generation of extreme ultraviolet (XUV) light pulses that last a mere ~ 100 attoseconds in duration [21].

While SAE-based theories have been employed to good effect over the last decade, ALP technology is now demonstrating the revolutionary potential to induce and track ultra-fast many-electron dynamics within atoms and molecules [57]. There is therefore huge importance in the development of tractable methods that are able to treat the correlated motion of electrons in complex atoms and molecules exposed to intense (ultra)short light pulses. Currently, the most sophisticated code for describing correlated electron dynamics within an atomic system is the HELIUM code [2425]. HELIUM is a pioneering code in the area of Attosecond Science. Not only has it made possible important scientific discovery (afterwards borne out by laboratory experiment) [25] but it has, in doing so, explored the feasibility of propagating accurate solutions of a multi-dimensional TDSE. The code is specifically constructed to describe both single- and double-electron ionization of a two-electron atom or ion exposed to intense laser light. The code’s continuing success undoubtedly lies in the fact that, from the start, it was designed around MPP architectures [24] (recognized as being absolutely essential to progress in this area of research). The method kernel to the code is a high-order Arnoldi time-propagator [2223] combined with Finite Difference (FD) grid methods. By choosing an appropriate distribution over processors HELIUM involves only nearest neighbour communications and this has enabled it to scale highly effectively to more than 16,000 cores on HECToR.

It is this established efficiency, along with the experimental interest in strongly time-dependent atomic and molecular systems, that has inspired the designers of HELIUM to exploit its underlying numerical methods to treat complex many-electron atoms exposed to intense (ultra)short light pulses. In particular, the underlying FD technology of the HELIUM code appears to be well suited for the accurate and efficient description of one or two electrons ejected from a many-electron atom or molecule during the single or double ionization process. With this capability in mind, the designers of the HELIUM code have recently demonstrated the potential to carry over the FD technology of HELIUM into an accurate description of a general many-electron atom exposed to intense light pulses. This carry over has been shown to be possible by invoking the powerful division-of-space concept [26], central to the highly successful R-matrix theory of atomic and molecular processes [27]. The division-of-space concept, whereby the position space occupied by the atomic electrons is divided into two separate regions, not only makes possible the integration of FD technology into the description of the ultrafast ionization of many-electron atoms, but also allows an efficient way of limiting the many-electron representation of an atom to the only spatial region where it is absolutely necessary, i.e., a small region close to the nucleus. In this finite region it has been well-established that the R-matrix basis set functions offer an accurate and tractable means of representing the many-electron wave function [27]. Indeed, the combination of the R-matrix basis set with FD grid techniques brought together with the powerful Arnoldi propagator of the HELIUM code provides the potential to efficiently and accurately describe the full response of a many-electron atomic system to intense (ultra)short pulse light which has so far proven beyond the capability of all existing theoretical methods.

With this goal in mind, in early 2009 a five-year EPSRC Software Development Grant (the UK-RAMP Project) was awarded to four collaborating institutions: Queen’s University Belfast (QUB), University College London (UCL), the Open University and STFC CSE, to build upon expertise in all the various areas of ab initio theoretical electron- atom and molecule scattering and both time-independent and time-dependent laser-atom (molecule) interactions. A major strand of the UK-RAMP Project is the carry-over of HELIUM code technology to enable the accurate description of the full response of atoms and molecules to intense (ultra)short laser pulses. The first major and vital algorithmic advance required in the pursuit of this goal is the construction of an ab initio code that combines HELIUM code and R-matrix technology to describe the single-electron ionization response of a general many-electron atom exposed to intense (ultra)short light pulses.

1.1 The purpose of the HELIUM dCSE project (WP5)

The purpose of this dCSE project is to construct a unique application code that will allow for the accurate and efficient description of the single ionization response of a general many-electron atom exposed to an intense (ultra)short laser light pulse. This project therefore forms the first vital advance in the pursuit of codes that are capable of describing the multiple ionization of atoms as well as the single ionization of molecules exposed to intense light pulses as set out in the UK-RAMP Project. As a means of making this advance, the project significantly extends on the recent work carried out on interfacing FD methods with basis set methods for the description of the one-electron hydrogen atom [26]. In extending the method into one that can accurately and efficiently describe a general many-electron atom exposed to strong laser fields, three main tasks needed to be accomplished (over a 24 month period) which all required, as a starting point, significant serial code construction:

  1. Firstly, the new code needs to be able to describe the many-electron structure of the complex atom in the finite region close to the nucleus of the atom (often referred to as the “inner-region” in the R-matrix literature). To this end, the project has made use of a well-established and highly accurate R-matrix suite of serial codes known as RMATRX2 [28]. The RMATRX2 codes form part of PRMAT [29] which is one of the application packages required to be provided on HECToR [30]. Using basis set techniques, the RMATRX2 codes provide eigenenergies and eigenvectors for a general many-electron atom within the inner-region for a range of angular momenta, along with reduced dipole matrix elements between all the eigenstates. This time-independent structure data can therefore be used as ’starter’ data input for the new time dependent many-electron code. As an implementation strategy, a stand-alone code (referred to as TD-RA throughout the project) has been constructed which can provide an accurate description of the time-dependent response of a general many-electron atom to an external laser field within the finite inner-region. Nine months were dedicated to this task
  2. Secondly, the new code needs to be able to describe the motion of the ejected (ionized) electron in the much larger “outer-region” beyond the finite inner-region close to the nucleus. This electron moves in the presence of the external laser field and the residual many-electron ion. As already mentioned, FD grid techniques appear to be the most suitable numerical method for representing the wave function in this region. To this end, the project has built on top of a well-established and accurate single-electron version of the time-dependent HELIUM code, known as HYDRO, to create a new stand-alone code (referred to throughout as TD-OUTER). In order to describe the electron moving in the presence of the residual ion, HYDRO needed to be significantly adapted so as to include long-range multipole potential terms. These terms also form part of the starter data calculated by the RMATRX2 codes. The effect of the external laser field is not only felt by the ejected electron, but is also felt by the residual ion and so extra laser potential terms also needed to be added to the HYDRO code to create a complete version of TD-OUTER. Six months were dedicated to this task
  3. Finally, the two codes needed to be integrated into a single code, now known as the RMT (R-matrix Incorporating Time) code. In constructing the full version of the RMT code, the project has focused heavily on implementing Arnoldi time-propagator techniques to enable the code to harness the power of MPP architectures. This work has required significant extensions to the algorithms used to time-propagate the wave function in the prototype single-electron method [26] where in that method a Taylor-series time-propagator was used to propagate the wave function in the inner- and outer-regions. Central to the integration of the TD-RA and TD-OUTER codes has been the implementation of an Arnoldi-based propagator that allows for wave function flux to pass between the R-matrix inner- and outer-regions. As part of this final task, the code was also parallelized using the Message Passing Interface, so that it could be ported to the HECToR Cray XT6 machine. By running the new RMT code on HECToR, the code was able to calculate the response of a neon atom to the combination of 800 nm and attosecond XUV light pulses, where multiple states of the residual ion were included in the description of the laser-atom interaction. Nine months were dedicated to this task.

All of the tasks were completed within the specified time-frames.

1.2 Major outcomes of the project

This dCSE project has resulted in the construction of a unique application code, now known as the RMT code. The RMT code is a general-purpose ab initio computer code for accurately describing the many-electron response leading to single ionization of many-electron atoms and atomic ions exposed to intense laser light pulses. The ab initio nature of the code allows for not only the description and understanding of future experimental measurements, but will also enable the prediction of physical phenomena occurring in intense laser-atom interactions and consequently allow for scientific discovery that will point the way for future experimental investigations at laser labs and large scale facilities over the next decade, at least.

One of the unique features of the RMT code is that it has been designed specifically to be applicable to atoms exposed to wavelengths ranging right across the light spectrum, from the extreme ultraviolet (XUV) photon frequencies being produced at 4th Generation light sources such as FLASH, the future European XFEL facility and the LCLS in the USA, to the infrared (IR) frequencies being produced at Ti:sapphire-based laser labs throughout the UK (e.g., Artemis at the STFC’s Central Laser Facility) and wider Europe (e.g.,‘Laserlab-Europe’). Not only does the RMT code have flexibility over wavelength, but because of its genuinely time-dependent nature it also has the flexibility to describe the response of atomic systems to both short- and long-duration light pulses. Moreover, because of the explicit representation of the light field, the code will also be able to support the wide-range of experiments investigating the effects of temporal pulse-shaping on the response of atoms to laser light, an important step towards future control mechanisms in fields such as photochemistry and molecular electronics.

The ability of the RMT code to describe the dynamics of general many-electron atoms on the attosecond time scale will also be of immense benefit to all UK research groups developing and applying Attosecond Technology (STFC’s CLF, Imperial College London). The near-future promise of high-intensity attosecond light pulses holds out the prospect for the real-time observation and control of charge transfer in molecular systems which is expected to have a strong impact on the future of molecular electronics [15] and possibly cancer research [17]. While the RMT code described here will be capable of providing deep insight into the dynamics of electrons in many-electron atoms exposed to ultrashort pulses, the exciting and genuine potential to adapt the method to describe many-electron dynamics within molecular systems will rely heavily on the advances that were made during this project.

2 Extending HELIUM to treat many-electron atoms: the RMT method

As a necessity, the starting point for describing the ultra-fast dynamics of electrons in an atom exposed to an intense (ultra)short light pulse is the Time-dependent Schrödinger Equation (TDSE).

2.1 The TDSE for a many-electron atom exposed to a light field

Neglecting relativistic effects, the behaviour of the N+1 electron atomic system in the presence of the laser field is governed by the TDSE

i ∂-Ψ(XN +1, t) = HN +1(t)Ψ(XN +1,t),               (1)
 ∂t
where the Hamiltonian is
        N∑ +1(              N∑+1   )         N∑+1
HN +1 =     ( - 1∇2 - Z- +      -1)  + E(t)⋅    ri.        (2)
         i=1     2  i  ri   i>j=1 rij           i=1
We have taken the origin of the coordinates to be in the nucleus, which we assume has infinite mass. We have rij = |ri - rj| where ri and rj are the vector coordinates of the ith and jth electrons and we have written XN+1 x1,x2,...,xN+1 where xi riσi. Z indicates the nuclear charge and E(t) is the time-dependent light field.

2.2 The R-matrix division-of-space concept


PIC

Figure 1: The R-matrix division-of-space concept. In region I an eigenstate expansion representation of the wavefunction is chosen, while in region II a grid-based representation is considered. The boundary of region I is at r = b and the outer boundary of region II is at r = R. The radial variable of the (N+1)-th electron is denoted as r.

In the R-matrix method [27] the position space occupied by the electrons is divided into two regions: An inner region (region I) surrounding the nucleus where a many-electron wave function is constructed and many-electron atom-laser Hamiltonian matrix elements are calculated explicitly, and an outer region (region II), chosen such that only one electron (or at most two) is present and the electron there, besides experiencing the laser field directly, is aware of the remainder of the atomic system only via long-range multi-pole interactions. Traditionally, R-matrix theory is a theory where time is not explicitly involved in the study of the collision or photoionisation processes.

Figure 1 displays the division of many-electron position space which underlies R-matrix theory. In region I the time-dependent wavefunction is expanded over the R-matrix eigenstates of the field-free Hamiltonian. Region I is defined by all N + 1 electrons of the system having a radial co-ordinate rq b, q = 1,...,N + 1. Region II is defined by N electrons having a radial coordinate rq b, q = 1,...,N and one electron having its radial coordinate rN+1 > b. In region II the time-dependent wavefunction is represented over a finite-difference grid by its values at equidistant grid points rN+1(i) = ih,i = ib,ib + 1,...,iR.

2.3 The mathematics underlying TD-OUTER

Much of the power of the division-of-space concept is derived from limiting the many-electron representation of the atom to only the spatial region where it is absolutely necessary, i.e., region I, close to the nucleus of the atom. This means that the number of electrons in region II is limited to just one and hence the TDSE in region II is, for each state of the residual-ion, reduced in dimensionality to at most three, thus simplifying the computational problem considerably.

In region II, the time-dependent N+1 electron wave function can therefore be expanded as Ψ(XN+1,t) = pnpΦp1
rFp(r,t), where the Fp functions are single-electron functions describing the radial motion of the ejected electron (in the pth-channel), and where the radial variable of the (N + 1)-th electron is denoted as rN+1 = r. The Φp are channel functions formed by coupling the target states of the residual ionic system Φp(XN ) with angular and spin parts of the ejected electron wave function. Note that the time dependence of the wavefunction is contained within the radial functions Fp(r,t).

By projecting the known target functions Φp onto the TDSE and integrating over all spatial variables except the radial coordinate of the ejected electron, the following set of coupled homogeneous partial differential equations (PDEs) for the radial channel functions Fp(r,t) is obtained:

                            ∑  [                              ]
i ∂-Fp(r,t) = Hti (r)Fp(r,t) +   WE   ′(r)+ WD   ′(t) + WP  ′(r,t)  Fp′(r,t) (3)
 ∂t            p            p′     pp         pp         pp

Eq.(3) is the evolution equation for the wave function in region II. In eq.(3), WE is referred to as the long range potential in the R-matrix literature and arises from the electron-electron and electron-nuclear potential terms in the Hamiltonian. WD arises from the interaction of the light field with the residual N-electron ion. The WP potential arises from the interaction of the light field with the ejected electron. Hti is the time-independent part of the Hamiltonian in region II.

2.4 Software description: The TD-OUTER code

This section describes a stand-alone code, TD-OUTER, that was constructed to solve eq.(3).

Software requirements:

Design strategy:

The TD-OUTER code was constructed on top of a well-tested one-electron version of the HELIUM code known as HYDRO. The HYDRO program uses the highly accurate and efficient Arnoldi propagator and FD techniques of HELIUM to propagate a single electron wave function forward in time. HYDRO has been written in Fortran 90 and has been parallelized using a similar strategy to that implemented in HELIUM using the Message Passing Interface (MPI). Since TD-OUTER describes the ejected electron in the presence of a many-electron atom it requires many-electron atomic structure data to construct the WE,WD and WP potentials appearing eq. (3). This data is calculated using the RMATRX2 suite of programs [28] which writes the atomic data to two files, H and d. All modifications to the HYDRO program at this stage of implementation were made using the single core version of the HYDRO program.

Definition of tasks:

In order to construct the TD-OUTER program two modifications to HYDRO were required:

Software implementation phase:

2.5 The mathematics underlying TD-RA

In region I, the time-dependent N+1 electron wave function Ψ(XN+1,t) is represented over an R-matrix eigenbasis ψk(XN+1) as Ψ(XN+1,t) = kCk(t)ψk(XN+1) where rN+1 b and where Ck(t) are time-dependent coefficients. The TDSE in the inner region is given by eq.(1). However, in region I, the Hamiltonian HN+1 is not Hermitian owing to the presence of the kinetic energy -1
2i2 term in eq.(2) and the finite value of the wave function on the inner-region boundary. Consequently a Bloch operator LN+1 = 12 i=1N+1δ(ri -b)ddri is introduced which is such that HN+1 + LN+1 is Hermitian in region I. The TDSE in region I may then be written as

 ∂
i--ΨI (XN +1,t) = HI (t)ΨI (XN +1,t)- LN+1 Ψ (XN +1,t),       (5)
 ∂t

where ΨI(XN+1,t) is the wave function defined over region I in figure 1. By projecting eq. (5) onto the eigenstates ψk(XN+1) we obtain the evolution equations for the time-dependent coefficients Ck(t):

 d           ∑
--Ck (t) = - i   HIkk′(t)Ck′(t)+ iSk(t)               (6)
dt           k′
where
        1 ∑     ∂F  (r,t)
Sk (t) = --   ωpk---p----|r=b,                    (7)
        2  p       ∂r
and where the ωpk are surface amplitudes written to the H file by RMATRX2. The solution of eq.(6) can be written in terms of so-called ϕ functions [31] (and in matrix notation) as
                        ∑∞
C(t + δt) = e-iδtHIC (t)+    δtjϕj(- iδtHI )Uj(t),          (8)
                        j=1

where

                        dj-1
U0(t) = C(t),  Uj(t) = i-j-1S (t).                 (9)
                       dt
The ϕj(-iδtHI) functions are related to the exponentiation of the Hamiltonian matrix, HI, and are due to the time-dependent inhomogeneity in eq.(6). For a scalar argument z the ϕj(z) functions are defined by the integral representation
               ∫ 1
ϕj(z) = --1----   e(1-θ)zθj-1dθ,  j ≥ 1.             (10)
        (j - 1)! 0

At this point it should be emphasized that eq.(6) is fundamental to the RMT method in two ways. Firstly, the inhomogeneous Sk term on the right-hand-side compensates for the Bloch term introduced to make HI Hermitian. Note that it makes a contribution only at r = b and brings into play there Ψ(XN+1,t), a wave function form which has been defined throughout both regions. This term is central to any time propagation scheme in region I because it connects the wave function form ΨI(XN+1,t), which is many-electron in nature, with a wave function form that, at r = b, represents a single electron and which, numerically, is obtained from region II. Secondly, eq.(6) allows for two different numerical methods to be brought together, one (basis set) most appropriate to the many-electron finite region I and the other (finite difference), most appropriate to the one-electron region II. In this way eq.(6) is fundamental to future plans to adapt the method to many-electron molecules.

2.6 Software description: The TD-RA program

Software requirements:

Design strategy:

Eq.(6) takes the form of an inhomogeneous TDSE which arises in many areas of quantum dynamics. However, it is only recently that attempts have been made at using high-order explicit methods to solve equations of the type given by eq.(6). One such high-order propagator can be devised by approximating the formal solution explicitly by a Taylor expansion [32]. This is the method used for the single-electron implementation of the current method [26]. The approach taken in TD-RA is to compute the values of the Ck(t + δt) coefficients in eq.(8) using an adapted “inhomogeneous” Arnoldi method that in many ways resembles the standard Arnoldi method used for solving the homogeneous TDSE. The full implementation of the adapted Arnoldi algorithm will be described later in section 2.7. In this section a method is first implemented which focuses on propagating the Ck coefficients without the driving inhomogeneous terms that will be calculated in region II. In this way, the level of accuracy of the standard homogeneous Arnoldi method combined with the R-matrix basis set technology is first established.

Definition of tasks:

The construction of the TD-RA program requires the implementation of a single task:

Implementation phase:

2.7 Interfacing (Integration) of TD-RA and TD-OUTER: the RMT program

The construction of the RMT program requires the implementation of three tasks:

Implementation phase:

By this stage the single core version of the RMT program was fully constructed. The many-electron wave function is known at time t + δt throughout regions I and II and further propagation in time can progress by repeating, for successive time steps δt, the procedures described in the previous sections.

2.8 Demonstration results from the single core RMT code


Table 1: Single-ionisation rates of helium given in atomic units obtained by RMT, by HELIUM and by the three-state approximation in the R-matrix Floquet (RMF) approach at a laser wavelength of 248 nm for given values of the peak laser intensity.





Intensity

(1014 W/cm2)

RMT

HELIUM

RMF





0.5

1.18 × 10-8

1.25 × 10-8

1.13 × 10-8

3.2

1.27 × 10-5

1.21 × 10-5

1.23 × 10-5

4.0

3.23 × 10-5

3.15 × 10-5

3.02 × 10-5






Once the single core version of the RMT program was fully constructed, its accuracy was verified by using it to investigate electron wavepackets ejected from Ne (which has 10 electrons) irradiated by an XUV laser pulse. All calculations at this stage were performed on local machines at QUB. Results were compared to those calculated using the TD-RA program in which there is no division of position space occurring. The parameters used for these first RMT test calculations can be found in the ‘Ne set A’ list in the appendix of the report. The TD-RA calculations used the same parameters except that, because there is no region II included, the boundary of region I was increased to 100 a.u. and the number of B-splines per angular momentum, l, increased to 100. The partial continuum wave functions of Ne calculated after the end of the laser pulse using the RMT method were compared to those calculated using the TD-RA method. The same test calculations were carried out using ’Ne set B’ as input so that the continuum wave functions coupled to an excited residual ion state could be compared. This comparison of time-dependent wave functions obtained using the mixed basis/finite-difference RMT method with those obtained using a pure basis set method represents one of the most stringent tests of accuracy for the new RMT method. The excellent agreement between the results of the two independent methods helps verify the accuracy and reliability of the RMT program.


PIC


Figure 3: Comparison of ionisation rates for Ne irradiated by 248-nm laser light as a function of intensity. Rates from the current RMT approach (solid red circles) are compared to those obtained using the R-matrix Floquet approach (solid black line).

As a second means of demonstrating the accuracy of the RMT method, single-electron ionization rates were calculated for He and Ne irradiated by a laser pulse with a central wavelength of 248 nm (corresponding to the fundamental wavelength of the KrF laser). These rates were compared with rates obtained using the time-independent R-matrix Floquet (RMF) method and, in the case of He, also with those obtained using the HELIUM code. For the He calculations the parameter set ’He’ was used and for the Ne calculations ’Ne set A’ was again used. These parameter sets can be found in the appendix. Table 1 shows a comparison of He single-electron ionization rates for three intensities. Away from resonances, the ionization rates calculated by RMT agree well (to within 10%) with those calculated by HELIUM [35] and by the RMF method [36]. Figure 3 shows the comparison of Ne single-electron ionization rates. Agreement between the two sets of results is very good, typically within 10% of each other away from resonance.

3 Parallelization of the RMT code

This section describes the implementation of a parallel RMT program employing MPI together with tuned BLAS and LAPACK library routines. In the parallel RMT program processes are assigned to two functional groups: 1) the TD-RA group of processes calculates the wave function in region I at each time-step, and 2) the TD-OUTER group of processes calculates the wave function in region II at each time-step. Two-way exchange of data between the two groups occurs via synchronous point-to-point communication between the master cores in each of the groups. The approach scales well provided computational load is adequately balanced across the two groups. Results are stored to disk periodically for analysis and to provide a restart procedure.

3.1 Implementation of an MPI version of the RMT code (region II)

The TD-OUTER group is parallelized by assigning a number of grid points to each core in the group, so that the first core handles the first XLast grid points, the second core the next XLast grid points and so on. Each compute core in the TD-OUTER group is thus responsible for a wavefunction array of dimension np × XLast where np is the number of channels retained in the calculation.

Because the derivative operators in Hti have been replaced with 5-point FD operators, communication between cores in calculating HtiF is limited to nearest neighbour communication only.

3.2 Implementation of an MPI version of the RMT code (region I)

Parallelization within the TD-RA group has focused on the two most CPU-intensive routines of the serial RMT code. These are the Arn-Matrix-Vector-Multiply subroutine of the inhomogeneous Arnoldi algorithm and the WV-Matrix-Vector-Multiply subroutine called during the evaluation of the wave function on the FD points within region I.

Parallel-Arn-Matrix-Vector-Multiply

Unlike the FD Hamiltonian in region II, the elements of the block-tridiagonal Hamiltonian matrix, HI, in region I are not calculated on the fly which means that options for parallelizing the matrix-vector kernel of the inhomogeneous Arnoldi algorithm are more restrictive. Instead all matrix elements are read from files written to disk by RMATRX2. At start-up, the master core in the TD-RA functional group is the only core that reads the files written by RMATRX2. Once all the HI matrix elements have been read, the strategy used here is to map the block rows, along with the corresponding segment of the Uj vectors being multiplied into, onto a group of cores that are from here on referred to as Block Master Cores (BMCs). This already places constraints on the minimum number of cores needed to run the program as at least one core should be available for each LSπ symmetry block. Once distributed, two dense dipole matrix blocks of size mnp1symi × mnp1symj and a 1D array of eigenvalues of length mnp1symi sit on each of the BMCs. At this coarse level of distribution, the Parallel-Arn-Parallel-Matrix-Vector subroutine can already be implemented using BLAS library routines. Load-balancing within the TD-RA group is accomplished by ensuring that each block row is assigned the same number of cores and by forcing mnp1symi = mnp1symj by padding the symmetry blocks with zeros (for most input data sets mnp1symi mnp1symj). Increasing the number of cores within the TD-RA group results in multiple cores being assigned to each block row of HI. Due to the off-diagonal block structure of HI two-way exchange of arrays of length mnp1symi occurs between each BMC during each call to Parallel-Arn-Matrix-Vector-Multiply.

Parallel-WV-Matrix-Vector-Multiply

At start-up the eigenvectors, V k, associated with each eigenstate, k, of HI are distributed to the BMCs. As soon as the compute cores in the TD-RA group evaluate their updated segment of Ck(t + δt), an MPI GATHERV is called so that the Ck(t + δt) are collected onto the BMCs. The BMCs subsequently call the Parallel-WV-Matrix-Vector-Multiply subroutine to calculate the Fp value required at each FD point within region I. The wave function values are then gathered onto the master core in the TD-RA group to be sent to the master core in the TD-OUTER group.

3.3 MPI communication between TD-RA and TD-OUTER

During propagation, communication between TD-RA and TD-OUTER occurs between the master core in TD-RA (MCR1) and the master core in TD-OUTER (MCR2). The communication is based on synchronous MPI SEND/RECV calls. The communication calls occur in the following sequence:

The above sequence of calls is repeated for each time-step of the propagation.

Load-balance between TD-RA and TD-OUTER

For efficient scaling of the RMT code, computational load needs to be well balanced across the two functional groups TD-RA and TD-OUTER.

To minimize idle time in TD-RA, MCR2 needs to send the Uj vectors to MCR1 in synchronization with the calculation of the ϕj vectors – As soon as ϕj-1 is calculated, MCR1 is waiting to receive Uj from MCR2 in order to proceed with the calculation of ϕj.

For a given problem size, there are only two adjustable parameters in the RMT code that can influence load balance between the TD-RA and TD-OUTER groups. The first is the number of cores assigned to the TDRA group, nctdra. The second is the number of FD points allocated to each core in TD-OUTER, XLast. If the values of nctdra and XLast are chosen too large the amount of time spent idle on each core within the TD-RA functional group will increase. If nctdra or XLast are chosen too small the amount of idle time within the TD-OUTER functional group will increase. The optimal values for nctdra and XLast can be found by using timing routines in TD-RA and TD-OUTER along with using the CRAYPAT profiling tool.

4 Porting RMT to HECToR

The RMT code was ported to HECToR in January 2011. An RMATRX2 suite of codes was also ported to HECToR at the same time so that relevant ‘starter’ data could be generated. The results shown here were produced by compiling the RMT code with the PGI Fortan 90 compiler. The CRAYPAT profiler was used to analyze the initial performance of the RMT code on HECToR.

4.1 Analysis of performance of RMT on the XT6

Comparison with single core version of RMT:

The first runs of the parallel RMT code used the ’Ne set A’ and ’Ne set B’ parameter lists as input. Results were shown to be in excellent agreement with the results produced employing the serial version of the RMT code, run both on the QUB machines and a single core of the XT6. The output of the parallel RMT code was found to agree with the output of the serial version of the code to the level of machine precision.

Scalability:

To test the scalability of the RMT code, several test runs were performed on various numbers of cores on the HECToR XT6. The maximum number of cores allocated to the functional group, TD-OUTER, during this initial test phase was 384. The maximum number of cores allocated to the functional group, TD-RA, was 48, giving 432 as the maximum total number of cores allocated to a run of the RMT code. All of the scaling tests were carried out using the ’Ne set C’ parameter list in the appendix which to date represents the largest problem size used as input to the RMT program. With this set of parameters, there are 17 residual ion states (resulting in 687 coupled channels) included in the calculation with a HI dimension of ~35000 × 35000.

Load-balancing:

By carrying out a load-balancing analysis using timing routines and CRAYPAT, an optimum value of XLast can be found for a given number of cores allocated to the TD-RA functional group. Using ’Ne set B’ input data, the optimal value of XLast for 24 (48) cores allocated to TD-RA was found to be 144 (89).

4.2 First results produced by RMT on HECToR


PIC

Figure 5: 2D Momentum distributions of electrons ejected from Ne irradiated by an attosecond XUV pulse in the presence of a time-delayed Ti:Sapphire 800 nm laser pulse. The XUV field is at its peak in (a) at 1.25 cycles into the IR field, and in (b) at 1.75 cycles into the IR field.

In order to demonstrate the capability of the RMT code, calculations were carried out to study the interaction of Ne with an XUV attosecond light pulse in the presence of a 800 nm Ti:Sapphire laser field. Such an interaction has recently been studied experimentally where complex many-electron effects within Ne are suspected to be the reason for an unexpected time delay between the escape of a 2p electron and a 2s electron. Using “attosecond streaking” methods, the experiment has measured this time delay to be 21 attoseconds. So far, no theoretical method has been able to provide a value for the time-delay that even comes close to matching the experimental value. Initial calculations of the time delay demonstrate that such precise phase-sensitive information can be calculated with the new RMT program.

In the RMT calculation the 800nm field is a 3 cycle pulse of peak intensity 1013 W/cm2. The 91 eV XUV pulse is a 14 cycle pulse also of peak intensity 1013 W/cm2. Figure 5 shows the 2D momentum distributions of ejected electron wave-packets from Ne for two different delays between the XUV and the IR fields. The XUV field is at its peak in figure 5(a) at 1.25 cycles into the IR field, and in figure5(b) at 1.75 cycles into the IR field. In (a) the IR field imparts, to the ionizing electron wavepacket, positive momentum in the z-direction, while in (b) it imparts a negative momentum in the z-direction. Consequently the kxkz momentum distribution is shifted upwards in (a) and downwards in (b) as compared to the distribution symmetric about the kz=0 axis obtained when there is no IR field present.

Both (a) and (b) figures show contributions to the momentum distribution from electrons liberated from the 2p subshell (outer circle) and from the 2s subshell (inner circle). It is worth noting that the due to the RMT code’s ability to explicitly include multiple residual ion states in the calculation, it is one of the few methods worldwide that is capable of describing the momentum distributions of the ejected 2p and 2s electrons in a single calculation. By performing further calculations with various delays between the fields it is observed that the inner and outer circles oscillate between their positions in (a) and (b). Analysis of these oscillations reveals an ultrashort time delay in photoemission from the two valence states (2p and 2s). It is the intention of the authors to present full details of these exciting results at the XXVII International Conference on Photonic Electronic and Atomic Collisions (ICPEAC) 2011 conference and subsequently to submit results for publication in a peer reviewed journal.

5 Conclusion and discussion

This report has described the construction of a powerful new ab initio RMT code that has the unique capability to accurately and efficiently describe the single ionization response of a general many-electron atom exposed to intense laser light pulses. The construction of the RMT code has been made possible by implementing the numerical methods and algorithms underlying two well-established ab initio codes, namely HELIUM and the RMATRX2 suite of codes. The RMT code has immense flexibility since it can not only describe the response of a general many-electron atom to intense laser light fields with wavelengths ranging throughout the spectrum, from the extreme ultraviolet to the infrared, but also to both short- and long-duration pulses of such light.

The construction of the RMT code constitutes a major algorithmic advance that represents a vital first step in the development of a method that is capable of accurately describing the single ionization response of a many-electron molecule to intense laser light and later, the double-electron ionization response of a many-electron atom exposed to such light.

The accuracy of the new code has been verified by using it to calculate single ionization rates for He and Ne irradiated by a KrF laser pulse. In the limit of many-cycle laser pulses, a regime in which R-matrix Floquet (RMF) theory is valid, results from the RMT code and RMF code have shown excellent agreement. In the case of He, RMT code results were also shown to be in excellent agreement with those produced by the HELIUM code.

A description of the parallelization of the RMT code and its porting to the HECToR Cray XT6 has been provided. Running the parallel RMT code on the XT6 machine has already revealed the code’s potential to accurately describe the effects of highly complex correlated electron dynamics on the ultrafast photoemission of electrons from many-electron atoms when exposed to a combination of XUV and 800 nm laser light.

The report also provides a performance and load-balancing analysis of the new code on the XT6. The RMT code has been shown to scale well, provided computational load is balanced between the TD-RA and TD-OUTER functional groups working on region I and region II of the electronic position space.

However, modifications to the code need to be made in order to maximize performance on the current generation of high-end computing machines such as the HECToR Cray XT6, typically containing many thousands of multi-core processors. Improvements to parallel scaling performance, single node efficiency and memory usage will enable very large laser-atom and laser-ion calculations to be addressed on these machines. Such large-scale calculations are essential to complement the sophisticated laboratory-based experiments being carried out around the world in the field of Attosecond Science. In particular, effective load balancing between the TD-RA and TD-OUTER functional groups will be paramount to the future use of the RMT method in the investigations of the double ionization of atoms to intense laser fields. The optimization of the RMT code for use on HECToR will be the focus of an upcoming 6-month dCSE project.

6 Acknowledgements

The authors would like to thank the following people for their contributions to this work:

Prof. K. T. Taylor, Queen’s University Belfast
Dr. J. S. Parker, Queen’s University Belfast
Dr. H. W. van der Hart, Queen’s University Belfast
Dr. L. A. A. Nikolopoulos, Dublin City University
Dr. M. Plummer, STFC Daresbury Laboratory

This project was funded for 24 months under the HECToR distributed CSE (dCSE) programme, which is provided through The Numerical Algorithms Group (NAG) Ltd. LRM was funded for the first 6 months of the project to work on the FD-OUTER program. MAL was funded for the remaining 18 months of the project.

7 Appendix







Input parameters
Atomic system




He Ne (set A)Ne (set B) Ne (set C)










region I





No. of B-splines per l 60 60 60 60
Radius, b, (a.u) 70 20 20 20
No. of ionic states 1 1 2 17
No. of total angular momentum (L) 16 16 10 24
Resulting no. of channels 16 31 29 687
Resulting Hamiltonian order ~900 ~2000 ~2000 ~35000





region II





R (a.u.) 1000 1000 1000 7200
FD grid spacing (a.u.) 0.2 0.2 0.2 0.2





Laser pulse





Wavelength (nm) 248 248 248 800+atto/XUV
No. of cycles 20 20 20 3





Time





δt (a.u.) 0.01 0.01 0.01 0.01






Table 3: Input parameters for RMT code. The He orbitals and Configuration Interaction (CI) basis were taken from [36]. The Ne orbitals and CI basis were taken from [37].

References

[1]   P. Corkum and F. Krausz Nat. Phys. 3, 381 (2007)

[2]   F. Krausz and M. Ivanov Rev. Mod. Phys. 81, 163 (2009)

[3]   S. Baker et al Science 312, 424 (2006)

[4]   L. E. Chipperfield et al Phys. Rev. Lett. 102, 063003 (2009)

[5]   M. Uiberacker et al Nature 446, 627 (2007)

[6]   E. Goulielmakis et al Nature 466, 739 (2010)

[7]   M. Schultze et al Science 328, 1658 (2010)

[8]   N. Dudovich et al Nature Physics 2, 781 (2006)

[9]   H. J. Wörner et al Nature 446, 604 (2010)

[10]   K. P. Singh et al Phys. Rev. Lett. 104, 023001 (2010)

[11]   H. Wang et al Phys. Rev. Lett. 105, 143002 (2010)

[12]   Y. Zeng et al Phys. Rev. Lett.103, 043904 (2010)

[13]   Y. Nabekawa et al Phys. Rev. Lett.102, 213904 (2009)

[14]   E. Takahashi et al Phys. Rev. Lett.104, 233901 (2010)

[15]   J. Vura-Weis et al Science 328, 1547 (2010)

[16]   M. Stockman et al Nature Photonics 1, 539 (2007)

[17]   H. B. Gray and J. Halpern Proc. Nat. Ac. Sci. 102, 3533 (2005)

[18]   P. Corkum Phys. Rev. Lett. 71, 1994 (1993)

[19]   T. Brabec and F. Krausz Rev. Mod. Phys 72, 545 (2000)

[20]   M. Lewenstein et al Phys. Rev. A 49, 2117 (1994)

[21]   G. Sansone et al Science 314, 443 (2006)

[22]   W.E. Arnoldi Quart. Appl. Math. 9, 17 (1951)

[23]   T.J. Park, J.C. Light J. Chem. Phys. 85, 5870 (1986)

[24]   E.S. Smyth, J.S. Parker, K.T. Taylor Comput. Phys. Commun. 114, 1 (1998)

[25]   J.S. Parker et al Phys. Rev. Lett. 96, 133001 (2006)

[26]   L.A.A. Nikolopoulos, J.S. Parker, K.T. Taylor Phys. Rev. A 78, 063420 (2008)

[27]   P.G. Burke, K.A. Berrington Atomic and Molecular Processes: An R-matrix Approach, (IOP Publishing, Bristol) (1993)

[28]   P.G. Burke, V.M. Burke, K.Dunseath J. Phys. B: At. Mol. Opt. Phys. 27, 5341 (1994)

[29]   A. G. Sunderland et al Comp. Phys. Commun. 145, 311 (2002)

[30]   A. G. Sunderland, C. J. Noble and M. Plummer dCSE Technical Report (2010)

[31]   B. Skaflestad, W.M. Wright Appl. Numer. Math. 59, 783 (2009)

[32]   M. Ndong, H. Tal-Ezer, R. Kosloff. C.P. Koch J. Chem. Phys. 130, 124108 (2009)

[33]   R. Sidje ACM Trans. on Math. Soft. 24, 130 (1998)

[34]   Y. Saad SIAM J. Numer. Anal. 29, 209 (1992)

[35]   J.S. Parker et al J. Phys. B: At. Mol. Opt. Phys. 33, L239 (2000)

[36]   D.H. Glass, P.G. Burke J. Phys. B: At. Mol. Opt. Phys. 32, 407 (1999)

[37]   P.G. Burke and K. T. Taylor J. Phys. B: At. Mol. Opt. Phys. 8, 2620 (1975)