Distributed Data Parallelism in CABARET

This project began during the transition of HECToR from Phase 1 (when each compute node was a dual-core 2.8GHz AMD Opteron chip with 6GB of RAM) to Phase 2a (with each chip a quad-core 2.3GHz Opteron processor with 8GB of RAM). Data parallelism in CABARET, for handling a partitioned Gambit unstructured mesh, was initially designed for use on similar architectures, with parallelisation based on MPI.

In this single program, multiple data (SPMD) approach, the data parallelism relies on partitioning the Gambit-generated computational grid into sub-grids, each of which is used by an individual instance of the CABARET code. Because of the unstructured decomposition used within the core CABARET algorithm, an indirect referencing scheme manages access to halo data and the associated communications for the updates needed after each time step. Asynchronous MPI calls were implemented within PHASE1, PHASE2 and PHASE3 and placed so that computation could be performed while communication was in progress, which ensured efficient performance.
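The indirect referencing idea can be illustrated with a small sketch. This is not the CABARET implementation: it is a single-process Python emulation under assumed names (`SubGrid`, `link`, `post_sends`, `complete_receives` and `halo_value` are all hypothetical), with a made-up eight-cell mesh split over two ranks. Each sub-grid stores its owned cells first and halo copies afterwards, and index lists map between global cell ids and local storage. In an MPI code the packing step would typically sit alongside nonblocking calls such as MPI_Isend and MPI_Irecv, with MPI_Waitall before the unpacking step, so that interior work can proceed in between.

```python
class SubGrid:
    """One partition of the mesh: owned cells first, then halo copies."""

    def __init__(self, rank, owned, halo_from):
        # owned: global ids of cells this rank updates.
        # halo_from: {neighbour rank: global ids owned by that neighbour}.
        self.rank = rank
        self.halo_from = halo_from
        halo = [g for ids in halo_from.values() for g in ids]
        self.local_to_global = list(owned) + halo
        self.global_to_local = {g: i for i, g in enumerate(self.local_to_global)}
        self.n_owned = len(owned)
        self.values = [0.0] * len(self.local_to_global)
        # Local halo slots to fill, grouped by the neighbour that owns them.
        self.recv_index = {
            nbr: [self.global_to_local[g] for g in ids]
            for nbr, ids in halo_from.items()
        }
        self.send_index = {}  # filled in by link()


def link(grids):
    """Derive each rank's send lists from its neighbours' halo requests."""
    for g in grids.values():
        for nbr, ids in g.halo_from.items():
            owner = grids[nbr]
            owner.send_index[g.rank] = [owner.global_to_local[i] for i in ids]


def post_sends(grids):
    """Pack send buffers (stands in for posting nonblocking sends)."""
    in_flight = {}
    for g in grids.values():
        for nbr, idx in g.send_index.items():
            in_flight[(g.rank, nbr)] = [g.values[i] for i in idx]
    return in_flight


def complete_receives(grids, in_flight):
    """Unpack buffers into halo slots (stands in for waiting on receives)."""
    for (src, dst), buf in in_flight.items():
        for slot, v in zip(grids[dst].recv_index[src], buf):
            grids[dst].values[slot] = v


def halo_value(grid, global_id):
    """Look up a cell value through the global-to-local index map."""
    return grid.values[grid.global_to_local[global_id]]


# Hypothetical partition: cell ownership is deliberately non-contiguous.
grids = {
    0: SubGrid(0, owned=[0, 2, 5, 7], halo_from={1: [3, 4]}),
    1: SubGrid(1, owned=[1, 3, 4, 6], halo_from={0: [2, 7]}),
}
link(grids)

# Give every owned cell a value derived from its global id.
for g in grids.values():
    for i in range(g.n_owned):
        g.values[i] = 10.0 * g.local_to_global[i]

in_flight = post_sends(grids)
# Interior updates that need no halo data could run here.
complete_receives(grids, in_flight)
```

After the exchange, `halo_value(grids[0], 3)` returns the value that rank 1 holds for global cell 3, without rank 0 needing to know where that cell sits in rank 1's storage.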

At the beginning of this dCSE project, the hardware configuration for Phase 2b and beyond was unknown. During the latter half of the work, however, it became possible to assume that this hardware would be increasingly multi-core focussed. When Phase 2b of HECToR arrived (XT6, with 24 cores per node from 2.1GHz Opteron 6172 processors, arranged as two dual hex-core sockets in a non-uniform memory architecture), it was also clear that the existing MPI parallel CABARET code would need further development for multi-core architectures.

Phil Ridley 2011-02-01