Aims and Objectives

The purpose of this project was to re-engineer the parallel anisotropic mesh adaptivity and load balancing algorithms of Fluidity [Piggott2009, Gorman2006] by incorporating Zoltan, a collection of data management services for unstructured, adaptive and dynamic applications. The aim was to improve the scaling behaviour of Fluidity and to allow the adaptive remeshing algorithms [Pain2001] to be used in parallel on any element pair, rather than being restricted to a single pair.

Zoltan includes a suite of parallel partitioning algorithms, data migration tools, parallel graph colouring tools, distributed data directories, unstructured communication services, and dynamic memory management tools. As well as potentially improving scaling, the inclusion of Zoltan has improved software sustainability and added new functionality. This delivers performance capabilities that will prepare Fluidity for petaflop systems, such as those proposed by the PRACE project, and which, combined with the new functionality, will enable new science to be carried out.
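
To make the role of these services concrete, the sketch below shows in outline how an application typically drives Zoltan's partitioning from C: a Zoltan_Struct is created, the load-balancing method and graph package are selected, query callbacks describing the local objects and their connectivity are registered, and Zoltan_LB_Partition returns import/export lists describing the rebalanced decomposition. This is a minimal illustrative sketch, not Fluidity code; the one-vertex-per-rank ring graph and the callback names are invented for illustration.

```c
/*
 * Minimal sketch (not Fluidity code): driving Zoltan's graph partitioning
 * through its C interface, with ParMETIS as the back-end package.
 * The one-vertex-per-rank ring graph and callback names are invented for
 * illustration. Run on three or more MPI ranks so the toy ring graph has
 * no self- or duplicate edges. Build with mpicc and link against Zoltan.
 */
#include <mpi.h>
#include <stdio.h>
#include "zoltan.h"

static int rank, nprocs;

/* Number of objects (graph vertices) owned by this rank. */
static int num_obj_fn(void *data, int *ierr) {
  *ierr = ZOLTAN_OK;
  return 1;
}

/* Global/local identifiers of the locally owned objects. */
static void obj_list_fn(void *data, int num_gid, int num_lid,
                        ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids,
                        int wgt_dim, float *wgts, int *ierr) {
  gids[0] = (ZOLTAN_ID_TYPE)rank;   /* global id = owning rank */
  lids[0] = 0;
  *ierr = ZOLTAN_OK;
}

/* Number of graph edges per local object (two neighbours in a ring). */
static void num_edges_fn(void *data, int num_gid, int num_lid, int num_obj,
                         ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids,
                         int *num_edges, int *ierr) {
  num_edges[0] = 2;
  *ierr = ZOLTAN_OK;
}

/* Neighbour ids and the ranks that own them. */
static void edge_list_fn(void *data, int num_gid, int num_lid, int num_obj,
                         ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids,
                         int *num_edges, ZOLTAN_ID_PTR nbor_gids,
                         int *nbor_procs, int wgt_dim, float *ewgts, int *ierr) {
  nbor_gids[0] = (ZOLTAN_ID_TYPE)((rank + nprocs - 1) % nprocs);
  nbor_gids[1] = (ZOLTAN_ID_TYPE)((rank + 1) % nprocs);
  nbor_procs[0] = (int)nbor_gids[0];
  nbor_procs[1] = (int)nbor_gids[1];
  *ierr = ZOLTAN_OK;
}

int main(int argc, char **argv) {
  float ver;
  int changes, num_gid, num_lid, num_import, num_export;
  ZOLTAN_ID_PTR imp_gids, imp_lids, exp_gids, exp_lids;
  int *imp_procs, *imp_parts, *exp_procs, *exp_parts;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  Zoltan_Initialize(argc, argv, &ver);

  struct Zoltan_Struct *zz = Zoltan_Create(MPI_COMM_WORLD);
  Zoltan_Set_Param(zz, "LB_METHOD", "GRAPH");        /* graph partitioning */
  Zoltan_Set_Param(zz, "GRAPH_PACKAGE", "ParMETIS"); /* ParMETIS back-end  */
  Zoltan_Set_Num_Obj_Fn(zz, num_obj_fn, NULL);
  Zoltan_Set_Obj_List_Fn(zz, obj_list_fn, NULL);
  Zoltan_Set_Num_Edges_Multi_Fn(zz, num_edges_fn, NULL);
  Zoltan_Set_Edge_List_Multi_Fn(zz, edge_list_fn, NULL);

  /* Ask for a new decomposition; the import/export lists describe which
     objects would have to move to realise it. */
  Zoltan_LB_Partition(zz, &changes, &num_gid, &num_lid,
                      &num_import, &imp_gids, &imp_lids, &imp_procs, &imp_parts,
                      &num_export, &exp_gids, &exp_lids, &exp_procs, &exp_parts);
  if (rank == 0)
    printf("decomposition changed: %s\n", changes ? "yes" : "no");

  Zoltan_LB_Free_Part(&imp_gids, &imp_lids, &imp_procs, &imp_parts);
  Zoltan_LB_Free_Part(&exp_gids, &exp_lids, &exp_procs, &exp_parts);
  Zoltan_Destroy(&zz);
  MPI_Finalize();
  return 0;
}
```

In Fluidity the same pattern would apply, but with the query callbacks describing the distributed adapted mesh rather than a toy graph.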

In addition to allowing new features to be added to the adaptivity capabilities of Fluidity, the inclusion of Zoltan has allowed Fluidity to be coupled with other models, such as atmospheric and ice-sheet models, a key part of future research, as these applications require non-standard discretisations. Finally, a recent development in Fluidity was the addition of Lagrangian particles, which are free-moving particles in the flow used either as detectors or within agent-based modelling. These particles now need to be parallelised, which is a non-trivial task given the possibly contradictory load-balancing objectives of the mesh and the particles. Load-balancing and parallel data migration algorithms for such purposes already exist in the Zoltan library (e.g. Rendezvous), and hence the library has aided in this objective.
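
As a rough illustration of the data migration services referred to above, the sketch below moves a toy "particle" object between ranks using Zoltan's migration interface: the application registers size/pack/unpack callbacks, Zoltan_Invert_Lists converts an export list into the matching import list, and Zoltan_Migrate performs the communication. The particle type, the hand-built export list and the callback names are invented for illustration and are not Fluidity's particle implementation.

```c
/*
 * Minimal sketch (not Fluidity's particle code): using Zoltan's data
 * migration services to move a toy "particle" object between ranks.
 * The particle type, hand-built export list and callback names are
 * invented for illustration. Build with mpicc and link against Zoltan.
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include "zoltan.h"

typedef struct { double x[3]; int id; } particle_t;

static int rank, nprocs;
static particle_t my_particle;   /* one particle per rank, for simplicity */

/* Bytes needed to ship one object. */
static int obj_size_fn(void *data, int num_gid, int num_lid,
                       ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, int *ierr) {
  *ierr = ZOLTAN_OK;
  return (int)sizeof(particle_t);
}

/* Copy the outgoing object into Zoltan's communication buffer. */
static void pack_fn(void *data, int num_gid, int num_lid,
                    ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid,
                    int dest, int size, char *buf, int *ierr) {
  memcpy(buf, &my_particle, sizeof(particle_t));
  *ierr = ZOLTAN_OK;
}

/* Copy the received object out of the buffer on its new owner. */
static void unpack_fn(void *data, int num_gid, ZOLTAN_ID_PTR gid,
                      int size, char *buf, int *ierr) {
  memcpy(&my_particle, buf, sizeof(particle_t));
  *ierr = ZOLTAN_OK;
}

int main(int argc, char **argv) {
  float ver;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  Zoltan_Initialize(argc, argv, &ver);

  struct Zoltan_Struct *zz = Zoltan_Create(MPI_COMM_WORLD);
  Zoltan_Set_Obj_Size_Fn(zz, obj_size_fn, NULL);
  Zoltan_Set_Pack_Obj_Fn(zz, pack_fn, NULL);
  Zoltan_Set_Unpack_Obj_Fn(zz, unpack_fn, NULL);

  my_particle.id = rank;
  my_particle.x[0] = my_particle.x[1] = my_particle.x[2] = (double)rank;

  /* Hand-built export list: each rank sends its particle to the next rank.
     In a real run this list would come from the load balancer. */
  ZOLTAN_ID_TYPE exp_gid = (ZOLTAN_ID_TYPE)rank, exp_lid = 0;
  int exp_proc = (rank + 1) % nprocs;
  int exp_part = exp_proc;

  /* Derive the matching import list, then perform the communication. */
  int num_import;
  ZOLTAN_ID_PTR imp_gids, imp_lids;
  int *imp_procs, *imp_parts;
  Zoltan_Invert_Lists(zz, 1, &exp_gid, &exp_lid, &exp_proc, &exp_part,
                      &num_import, &imp_gids, &imp_lids, &imp_procs, &imp_parts);
  Zoltan_Migrate(zz, num_import, imp_gids, imp_lids, imp_procs, imp_parts,
                 1, &exp_gid, &exp_lid, &exp_proc, &exp_part);

  printf("rank %d now holds particle %d\n", rank, my_particle.id);

  Zoltan_LB_Free_Part(&imp_gids, &imp_lids, &imp_procs, &imp_parts);
  Zoltan_Destroy(&zz);
  MPI_Finalize();
  return 0;
}
```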

The project was organised into six distinct packages of work with measurable deliverables. These were as follows:

  1. To become familiar with the Fluidity code, build process and benchmark suite. A short report detailing the times for computation, communication, data migration and adaptivity for one of the benchmarks would be produced.

  2. Adding the Zoltan solution for parallel adaptivity as a compile-time option for Fluidity. Using the Zoltan interface to ParMETIS, the functionality of the previous solution should be replicated, and the Zoltan solution should pass all tests in the Fluidity test suite (unit, short, medium and long). Additional tests comparing Zoltan functionality against the previous solution, as well as unit tests for the communications functionality of Zoltan, should be completed. A short report detailing the implementation to be produced as a dCSE Technical Report.

  3. Zoltan solution used as the default in all Fluidity development branches and future releases of Fluidity. All Zoltan development branches merged back into the trunk and all tests passing using the Zoltan solution. Documentation for all Zoltan options provided in the Fluidity manual and Fluidity options system, Diamond [Ham2009].

  4. Investigate performance improvements for Fluidity using the various Zoltan options made available through the Fluidity options system, Diamond. The aim is for a reduction of 15% or more in communication time for all HPC benchmarks when using the Zoltan solution compared against the previous solution, and for the speed-up from 64 to 2048 processes to be increased by 15% with the Zoltan solution compared to the previous solution. All work to be documented in a short report. Code extensibility is as important as code performance improvements, but there are no deliverables related to this.

  5. A final report, including profiling results for the four HPC benchmarks, to be completed.

  6. Participate in and present updates on progress at the Fluidity HPC Workshops.

This report will begin by giving some background on Fluidity, along with the adaptivity and parallel adaptivity algorithms implemented in it. The Zoltan library will then be discussed. This will be followed by a description of the implementation, and the profiling results from the HPC benchmarks run on HECToR will then be presented. Finally, the conclusion will give details of the outcomes of this project and how it has benefited HECToR users.
