## Parallelised Uncertainty Quantification using Differential Geometric Population MCMC

Uncertainty quantification is vital in many scientific areas. In particular, the Bayesian approach provides a consistent inductive logic for reasoning about systems where measurements and models contain often high degrees of uncertainty. A number of competing hypotheses regarding the underlying structure of a system of interest can be encoded as mathematical models, and we may use this Bayesian statistical framework to decide systematically which model is best supported by the observed data and may therefore be best employed for predictive purposes.

This project will develop an efficient C implementation of a parallel Differential Geometric Markov Chain Monte Carlo (MCMC) sampler within a general population MCMC framework using MPI. This state of the art statistical methodology will allow key Bayesian quantities to be accurately estimated by drawing samples from a sequence of tempered probability distributions bridging from the prior to the posterior. The posterior probability distribution gives estimates of unknown parameter values in the statistical model given the available data, and samples from the additional tempered distributions may be used to estimate the marginal likelihood for comparing models.

The original Differential Geometric MCMC code ran on a cluster using the MATLAB parallel toolbox to execute loops over different compute nodes. To achieve a scalable implementation, the original MATLAB code will be extended iteratively by replacing parts with equivalent C code using MATLAB's MEX interface, until there is a fully functional framework written entirely in C. This will involve:

- Developing a unit test suite in MATLAB with high performance MCMC update functions written in C.
- Implementing a fully parallel MCMC framework.

On completion of this project the main achievements may be summarised as follows:

- A parallel differential geometric population MCMC framework has been implemented by using C with OpenMP and MPI.
- The code was benchmarked on the Statistical Science Computer Cluster at UCL, which consists of 52 nodes, each with 2 quad-core 2.7GHz Opteron AMD processors with 8GB of memory.
- Performance of the C only implementations of the MCMC update functions was compared against the corresponding update functions in Matlab for each benchmark model using 1000 datapoints. These models included a 10 Parameter Ion Channel, a 5 Parameter (Fitzhugh Nagumo) ODE and a 28 Parameter (Circadian) ODE. For each model respectively, the observed speed ups were 68 times, 33 times and 6 times.
- Performance was demonstrated for the Ion Channel model. Firstly with OpenMP, on a single node with 8 threads, a 5.4 times speed up over the C code with a single thread was shown. Secondly, with the addition of MPI, scaling effciency was demonstrated up to 100 cores, showing that a relative speed increase of over 61 times is now possible.
- Performance was also demonstrated for the 5 Parameter ODE. On a single node with 8 OpenMP threads, a 4.9 times speed up over the C code with a single thread was shown. With MPI, the scaling effciency for up to 100 cores, showed a relative speed increase of nearly 62 times.
- For the 28 Parameter ODE, the scaling effciency for up to 100 cores showed a relative speed increase of nearly 63 times. These result will have a real and immediate impact on increasing the rate at which many motivating scientific questions may be answered; e.g. 100,000 samples can now be generated for the Circadian model with 100 Markov chains in around 1 hour using 100 cores, whereas the previous Matlab version would take around 2 days with 8 cores. This offers a significant advantage when there are large numbers of structural hypotheses to be tested and compared.
- Scalable overall performance, which is up to 2 orders of magnitude faster than the original Matlab (maximum 8 core) implementation is now possible.
- The modular design of the new code also allows it to be easily extended to enable statistical inference over additional classes of models, thus ensuring it widespread utility to practising scientists and engineers.
- The software developed enables systematic comparison of model hypotheses for systems described using coupled systems of nonlinear differential equations, such as gene regulatory networks, and systems described using aggregated Markov processes, such as ligand-gated ion channel mechanisms in the biological sciences. This software can be deployed on any modelling problem where nonlinear differential equations or Markov processes are used and as such has the potential to have a widespread impact especially as the size and complexity of such models is growing.
- The software is currently being used in collaboration with projects between UCL, Glasgow and Warwick universities. Plans are also in progress to make the code available on the web following further testing.

Please see PDF or HTML for a report which summarises this project.