Parallel Matrix/Vector Assembly

To make efficient use of the PETSc library, correct use of its parallel data structures for matrices and vectors is crucial for performance. Most of the work required to use PETSc lay in constructing and populating these data structures. The MicroMag code reads in the METIS mesh decomposition and uses this, together with information about the mineral grain (also an input), to create a series of sparse matrices which are then used to update the magnetic field, represented in this context as a vector. MicroMag constructs the matrix values and stores them in a storage format called SLAP column format.

This is similar to the more generic compressed sparse column (CSC) format, but the order of the stored values is changed so that the diagonal element is stored first in each column. The default parallel decomposition in PETSc is to decompose the matrix by row, rather than by column, across the processing elements, which could potentially have made the translation rather more complicated. However, with two notable exceptions, all the matrices in MicroMag are symmetric, so the column-major order of the micromagnetic data structures can easily be translated into a row-major order and then decomposed. In some sense this is both a natural and a helpful translation: MicroMag is a Fortran 90 code, in which multi-dimensional arrays are naturally column-major (the left-most index is the fastest moving), whereas PETSc is written in C, which has the opposite convention. The exceptions to the symmetric matrices can be dealt with as special cases.
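
As an illustration of the format, the fragment below is a minimal sketch and is not taken from the MicroMag source: the small matrix and the program wrapper are invented for the example, although the array names YA (values), RNR (row indices) and CNR (column pointers) follow the code fragments later in this section.

program slap_column_example
  implicit none
  ! A 3x3 symmetric matrix
  !     | 4  1  0 |
  ! A = | 1  5  2 |
  !     | 0  2  6 |
  ! in SLAP column format: CSC, but with the diagonal entry of each
  ! column stored first. CNR(j) is the start of column j in YA/RNR,
  ! and CNR(4) = nnz+1.
  real    :: YA(7)  = (/ 4.0, 1.0,   5.0, 1.0, 2.0,   6.0, 2.0 /)
  integer :: RNR(7) = (/ 1,   2,     2,   1,   3,     3,   2   /)
  integer :: CNR(4) = (/ 1, 3, 6, 8 /)
  integer :: j, k

  ! print each column; the first entry listed is always the diagonal
  do j = 1, 3
     do k = CNR(j), CNR(j+1)-1
        write(*,'(A,I0,A,I0,A,F4.1)') 'col ', j, ' row ', RNR(k), ' value ', YA(k)
     end do
  end do
end program slap_column_example

Because the matrix is symmetric, the same three arrays can equally be read as a row-major (CSR/AIJ) description of the matrix, which is the translation exploited in the assembly code below.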

The PETSc library has specific data structures for sparse matrices; the CSR format is referred to as AIJ. To use this format in parallel, a routine named MatMPIAIJSetPreallocation is called to allocate storage for the data structure. PETSc's default parallel decomposition for matrices is by row. The preallocation routine distinguishes between the diagonal block of matrix elements (where the column indices fall within the rows ``owned'' by the MPI task) and the off-diagonal block of elements (where the column indices fall outside the rows ``owned'' by the MPI task). These counts are passed as arguments of the routine, either as the scalars d_nz and o_nz (the number of non-zeroes per row in the diagonal and off-diagonal blocks) or as the per-row arrays d_nnz and o_nnz. In particular, the documentation states that ``by setting these parameters accurately, performance can be increased by more than a factor of 50''. Determining and correctly setting these parameters is key to getting good performance from the PETSc library.
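
To make the distinction concrete, consider a hypothetical example (not taken from MicroMag): a 4x4 matrix distributed across two MPI tasks, each owning two rows. For task 0, which owns rows 0 and 1, columns 0 and 1 form its diagonal block and columns 2 and 3 its off-diagonal block. If row 1 has non-zeroes in columns 0, 1 and 3, that row contributes d_nnz = 2 and o_nnz = 1. When these counts are exact, PETSc can allocate all the required storage once, and no reallocation or copying of matrix data occurs while values are inserted.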

For example, in the new MicroMag code which uses PETSc, the SLAP column format data is used to determine how many non-zeroes fall in the diagonal and off-diagonal blocks, as shown in the code fragment below.

d_nnz = 0
o_nnz = 0
nze_l = 0
nze_d = 0
count = 1
do i = nmax_off, nmax_off+nmax_l-1
   ! total number of non-zeroes in this locally owned column
   nze_l = nze_l + (CNR(i+1)-CNR(i))
   do j = CNR(i), CNR(i+1)-1
      if( (RNR(j).ge.nmax_off) .and. &
          (RNR(j).lt.(nmax_off+nmax_l)) )then
         ! we are in the diagonal block
         d_nnz(count) = d_nnz(count)+1
         nze_d = nze_d+1
      else
         ! non-zero lies in the off-diagonal block
         o_nnz(count) = o_nnz(count)+1
      end if
   end do
   count = count+1
end do

In the code fragment above, every processor holds the whole matrix in SLAP column format, so the data are replicated; each process therefore loops only over the columns that it owns. For each of its columns it loops from the column start (CNR(i)) to the column end (CNR(i+1)-1), counting the total number of local non-zero elements (nze_l) and how many of them are diagonal (d_nnz) and off-diagonal (o_nnz). The PETSc matrix data structures are then allocated as shown in the following code fragment.

CALL MatCreate(comm,PA,mpierr)
CALL MatSetSizes(PA,nmax_l,nmax_l,NMAX,NMAX,mpierr)
CALL MatSetType(PA,MATMPIAIJ,mpierr)
CALL MatMPIAIJSetPreallocation(PA,0,d_nnz,0,o_nnz,mpierr)

and the local part of the PETSc matrix is populated from the replicated SLAP column format matrix, as shown in the code fragment below.

call MatGetOwnershipRange(PA,istart,iend,mpierr)
do i = istart, iend-1
   do j = CNR(i+1), CNR(i+2)-1
      ! RNR holds 1-based indices; PETSc expects 0-based global indices
      call MatSetValue(PA, i, RNR(j)-1, YA(j), &
                       INSERT_VALUES, mpierr)
   end do
end do
write(messageText, &
      '("Matrix PA B/C/D allocated and assigned")')
CALL writeMessage('nonzerostiff', messageText, 0)
CALL MatAssemblyBegin(PA, MAT_FINAL_ASSEMBLY, mpierr)
CALL MatAssemblyEnd(PA, MAT_FINAL_ASSEMBLY, mpierr)
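
The vector side of the assembly can be treated in the same way. The fragment below is a minimal sketch rather than MicroMag source: the vector Pb and the local array val are assumed names introduced for illustration, while comm, nmax_l, NMAX, istart, iend and mpierr follow the matrix fragments above. Creating the vector with the same local size nmax_l and global size NMAX ensures that its distribution matches the row decomposition of PA.

! Illustrative sketch only: assemble a parallel vector distributed
! like the rows of PA. Pb and val are assumed (hypothetical) names.
call VecCreateMPI(comm, nmax_l, NMAX, Pb, mpierr)
call VecGetOwnershipRange(Pb, istart, iend, mpierr)
do i = istart, iend-1
   ! val holds the locally computed field values, indexed from 1
   call VecSetValue(Pb, i, val(i-istart+1), INSERT_VALUES, mpierr)
end do
call VecAssemblyBegin(Pb, mpierr)
call VecAssemblyEnd(Pb, mpierr)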