An overview of the modifications made to SCATCI

The construction and diagonalization of both the $ N$- and ($ N$+1)-electron Hamiltonian in the R-matrix inner region occurs within the SCATCI program. Upgrade work on SCATCI has focused on parallelizing subroutines involved in both the construction and diagonalization of the ($ N$+1)-electron Hamiltonian, $ H^{N+1}$. For problems in which the order of $ H^{N+1}$ is large ($ >$ 100,000), over 90$ \%$ of UKRmol-in compute time is spent within subroutines associated with the diagonalization of $ H^{N+1}$. In choosing numerical methods to diagonalize $ H^{N+1}$, it is noted that the partitioned R-matrix method requires only $ \sim$ 5$ \%$ of eigenpairs from the diagonalization of $ H^{N+1}$ and also that, in the majority of cases, $ H^{N+1}$ is extremely sparse ($ \sim$ 99$ \%$ sparsity). For these reasons, iterative Krylov-subspace-based eigensolvers have been chosen to compute eigenpairs, due to their efficiency over direct methods for such problem types. Since the kernel method of such eigensolvers is sparse Matrix-Vector (spMV) multiplication, these eigensolvers also lend themselves to parallelization on both shared memory and distributed memory architectures. The serial version of SCATCI that has been modified as part of this dCSE project is hereafter referred to as scatci_serial.
Hereafter, the current development version of SCATCI will be referred to as scatci_slepc. The following sections describe in detail the modifications that have been made to integrate MPI and PETCs/SLEPc into UKRMol-in and specifically SCATCI.

Significant effort has been spent on making this parallel version of SCATCI backwards compatible with scatci_serial. To this end a 'slepc switch' has been implemented, where if slepc_switch == .true., a parallel build of the Hamiltonian matrix is invoked, along with subsequent calls to the SLEPc Library during the diagonalization stage. Correspondingly, if slepc_switch == .false., scatci_slepc will ``emulate'' scatci_serial. This was achived through a combination of if statements and having all calls to MPI and SLEPc/PETSc be made via dummy subroutine calls which can be switched for serial versions at compile time via Make.

For example, rather than calling SlepcInitialize() directly we use a call to a dummy subroutine Slepc_Initialize() which looks like this when compiled for use in scatci_serial:

 subroutine Slepc_initialize(slepc_switch,srank,ssize,scomm)
    implicit none
    logical, intent(out)    :: slepc_switch
    integer, intent(out)    :: srank,ssize,scomm

    slepc_switch = .false.
    srank = 0
    ssize = 1
    scomm = 0
    print*, 'SCATCI run based on serial workflows'

  end subroutine Slepc_initialize

and like this when compiled for use in scatci_slepc:

  subroutine Slepc_initialize(slepc_switch,srank,ssize,scomm)
    implicit none
    logical, intent(out)    :: slepc_switch
    integer, intent(out)    :: srank,ssize,scomm
    integer, parameter      :: master=0
    integer (kind=shortint) :: ierr,srank_32,ssize_32

    slepc_switch = .true.
    call SlepcInitialize(PETSC_NULL_CHARACTER,ierr)
    call MPI_COMM_RANK(PETSC_COMM_WORLD,srank_32,ierr)
    call MPI_COMM_SIZE(PETSC_COMM_WORLD,ssize_32,ierr)
    scomm = int(PETSC_COMM_WORLD)
    srank = int(srank_32)
    ssize = int(ssize_32)
    if (srank == master) then
       print*, 'SLEPc has been initialized'
    end if

  end subroutine Slepc_initialize

Rather than writing the Hamiltonian to file, as is the case in the serial version of SCATCI, in this MPI-based version of SCATCI the Hamiltonian is built in parallel on separate compute cores with each core subsequently inserting its local sparse Hamiltonian matrix elements directly into a PETSc matrix (Mat) object which is subsequently passed to EPS as the operator that defines the eigenvalue problem. Various options are then set for a customized solution, the problem is then solved and finally the the solution is retrieved.

Paul Roberts 2012-06-01