The HECToR Service is now closed and has been superseded by ARCHER.

Expressive and scalable finite element simulation beyond 1000 cores

The FEniCS Project is a widely used environment for solving partial differential equations (PDEs). It allows users to specify equations in mathematical symbolic form via a domain-specific language and to solve them using the finite element method. The FEniCS problem-solving environment/library DOLFIN provides C++ and Python interfaces, and relies on automated code generation to reconcile expressive input with high performance. Because of the generic nature of the software, many different scientific problems are addressed using FEniCS/DOLFIN, including geodynamics, heat flow, elasticity, electromagnetics, flow through porous media, Navier-Stokes equations and acoustics.
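
As an illustration of what "mathematical symbolic form" means here, consider a Poisson problem of the kind commonly solved with finite elements. The homogeneous Dirichlet boundary condition and the symbols f, Ω and V_h are assumptions made for this sketch; they are not taken from the project report.

```latex
% Strong form (boundary condition assumed for illustration)
-\nabla^2 u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega

% Weak form: find u_h \in V_h such that, for all v \in V_h,
\int_\Omega \nabla u_h \cdot \nabla v \, \mathrm{d}x = \int_\Omega f \, v \, \mathrm{d}x
```

In the domain-specific language the user writes the two integrals essentially as they appear in the weak form, and the code generator produces the element-level routines used during assembly.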

DOLFIN is increasingly used in HPC research projects, which has placed new demands on a number of parallelisation aspects. The objectives of this project were to meet some of the most pressing needs, enabling HPC applications and allowing DOLFIN to exploit modern architectures. The three identified objectives were:

  • Hybrid OpenMP/MPI finite element assembly of vectors and sparse matrices.
  • Scalable and portable I/O on parallel file systems.
  • Scalable distributed mesh refinement.

On completion of this project the main achievements may be summarised as follows:

  • Hybrid OpenMP/MPI finite element assembly was developed by implementing thread-safe calls to PETSc for the linear solvers in DOLFIN. Performance was demonstrated for a 3-D Poisson problem (linear Lagrange elements) with 2,097,152 points, and for two 3-D Navier-Stokes-like (N-S) problems (linear and quadratic Lagrange elements), both with 824,000 points.
  • Scaling for the N-S problems was good up to 16 threads. However, scaling of the Poisson problem fell away sharply above 8 threads, and performance in both cases fell short of a pure MPI approach. The advocated cell colouring approach, which was the one implemented, is not cache-friendly for problems that otherwise have good data locality. Some alternative thread-safe schemes that should improve data locality were identified and may be implemented in future work.
  • A well-defined interface to the HDF5 library was implemented, enabling all the required parallel I/O functionality. Support for the metadata format XDMF was also implemented. This allows many of the HDF5 output formats to be read by widely available visualisation packages, including ParaView and VisIt. The time to read a representative-sized mesh was benchmarked, and the general observation was that, for a fixed-size problem, I/O time did not increase with increasing process count. A speed-up factor of over 100 is now possible, so that I/O time is negligible relative to the solution runtime on such a mesh.
  • Two new interfaces were developed for parallel refinement of 2-D and 3-D meshes. An adaptive mesh-refinement algorithm was implemented. Integrating this with the interfaces to ParMETIS and Zoltan makes simple load balancing possible.
  • The time required for refinement is now negligible in the context of runtime for a typical application.
  • The outcomes of the project have either already appeared in a release version of the DOLFIN library or will be ready for the next software release.
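
The cell colouring idea mentioned above can be sketched in a few lines. This is a minimal, illustrative greedy colouring written for this summary, not DOLFIN's implementation: the function names and the tiny four-triangle mesh are hypothetical. Cells that share a vertex receive different colours, so all cells of one colour can be assembled by different threads without two threads writing to the same matrix or vector entry.

```python
from collections import defaultdict


def colour_cells(cells):
    """Greedy colouring: cells that share a vertex get different colours."""
    # Map each vertex to the cells that touch it.
    vertex_to_cells = defaultdict(list)
    for cell_index, vertices in enumerate(cells):
        for v in vertices:
            vertex_to_cells[v].append(cell_index)

    colours = [-1] * len(cells)  # -1 means "not yet coloured"
    for cell_index, vertices in enumerate(cells):
        # Colours already taken by cells sharing a vertex with this one.
        used = {
            colours[n]
            for v in vertices
            for n in vertex_to_cells[v]
            if colours[n] >= 0
        }
        colour = 0
        while colour in used:
            colour += 1
        colours[cell_index] = colour
    return colours


def group_by_colour(colours):
    """Return a list of cell-index lists, one list per colour."""
    groups = defaultdict(list)
    for cell_index, colour in enumerate(colours):
        groups[colour].append(cell_index)
    return [groups[c] for c in sorted(groups)]


# Hypothetical mesh: a 2x1 strip of squares, each split into two triangles.
# Vertex layout:   3 4 5
#                  0 1 2
cells = [(0, 1, 3), (1, 4, 3), (1, 2, 4), (2, 5, 4)]
colours = colour_cells(cells)
groups = group_by_colour(colours)

# Cells within one group share no vertices, so a threaded assembler could
# process each group in parallel, one group after another.
for colour, group in enumerate(groups):
    print(f"colour {colour}: cells {group}")
```

The sketch also hints at why the approach is not cache-friendly: cells of the same colour are, by construction, not neighbours, so visiting the mesh colour by colour jumps between distant parts of the mesh data rather than walking through adjacent cells.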

Please see the PDF or HTML version of the report, which summarises this project.