The current BLACS processor grid assumes that all nodes in the band- and gvector-groups are available to participate in any ScaLAPACK operation. Although this is true for the matrices we have discussed so far, it would be better to define several BLACS grids and to add optional arguments to algor_invert and algor_diagonalise that specify the distribution. This would also aid optimisation: at present the matrix passed into the algor subroutines must be the global, non-distributed matrix, which often requires the calling routine to perform an otherwise unnecessary reduction over the band- and/or gvector-groups.
Finally, at the time of writing there are two limitations inherent in the BLACS and ScaLAPACK libraries. Firstly, ScaLAPACK lacks a Hermitian matrix inverter, so we have to use the general complex matrix routines (PZGETRF followed by PZGETRI), which cannot exploit the Hermitian symmetry; for this reason it may be better to use the serial version of the subroutine when parallelising over only two PEs. Secondly, many of the diagonalisation and inversion routines require a so-called `square' distribution of the matrices, i.e. one in which the row and column block sizes are the same. This is unfortunate, as the natural distribution in Castep is to distribute one matrix dimension over the band-group and the other over the gvector-group, and these groups will not usually contain the same number of nodes.
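The block-size constraint can be illustrated with a short Python sketch. This is not Castep code: the function names, the matrix size and the group sizes are hypothetical, and the `natural' layout is assumed here to mean one contiguous block per process. The helper numroc mirrors the arithmetic of the ScaLAPACK tool routine NUMROC, which counts the rows or columns a process owns under a block-cyclic distribution.

```python
def numroc(n, nb, iproc, nprocs, isrcproc=0):
    """Rows (or columns) owned by process iproc when n items are dealt
    out block-cyclically in blocks of nb over nprocs processes."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # full rounds every process gets
    extra = nblocks % nprocs               # whole blocks left over
    if mydist < extra:
        count += nb                        # one extra whole block
    elif mydist == extra:
        count += n % nb                    # the final partial block
    return count


def natural_block_sizes(n, nprow, npcol):
    """Block sizes if each process holds one contiguous block
    (assumed meaning of the 'natural' layout): ceil(n / group size)."""
    return -(-n // nprow), -(-n // npcol)


def local_shapes(n, block, nprow, npcol):
    """Local array shape on every process of an nprow x npcol grid
    when the same block size is used for rows and columns."""
    return [[(numroc(n, block, r, nprow), numroc(n, block, c, npcol))
             for c in range(npcol)] for r in range(nprow)]


# Hypothetical sizes: a 10 x 10 matrix, 2 processes in one group, 3 in the other.
n, nprow, npcol = 10, 2, 3
mb, nb = natural_block_sizes(n, nprow, npcol)
print("natural block sizes:", mb, nb)      # unequal, so not a 'square' distribution
for row in local_shapes(n, 2, nprow, npcol):
    print(row)                             # shapes with a common block size of 2
```

With unequal group sizes the natural row and column block sizes differ, which is exactly what the `square' requirement forbids; using a common block size keeps the routines applicable but spreads each dimension cyclically rather than in one contiguous piece per process.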