Load Balancing

When making a call to Zoltan_LB_Balance or Zoltan_LB_Part Zoltan is only concerned with the nodes making up the graph as well as how those nodes are connected, the edges. Four callback functions must be provided for these library calls:

  1. Counting the number of local nodes.
  2. Listing those local nodes.
  3. Counting the number of edges associated with each local node.
  4. Listing edges for each local node (detailing the neighbouring node the edge connects to).

When providing the lists of nodes and edges a weight can also be provided. In this implementation non-uniform node weights are only applied when using an extruded mesh (an extruded mesh is one derived from a 2D surface mesh and extruded in the direction of gravity to produce a 3D mesh). Weighting is used more heavily for edges as it is key in the parallel adaptivity approach taken in Fluidity as described in section 2.3.

Edge weighting is applied in the callback function zoltan_cb_get_edge_list. This callback must provide for each edge of every local node the neighbouring node the edge connects to and which process currently owns the neighbour node. We also choose to provide an edge weight for each edge. The pseudo code (Listing 1) below details the implementation.


\begin{lstlisting}[label=edge_weighting,caption=Edge weighting pseudo-code]
for ...
...edge_weight > ninety_weight)
edge_weight = num_total_edges + 1
\end{lstlisting}

The implementation loops over all the local nodes and for each node records the global ID and owning process of all its neighbour nodes. The next step is to calculate the edge weight to apply to the edge between the local node and each neighbour node. The aim is to apply a high edge weight to poor quality elements. This is done by first determining the poorest quality element associated with the local node. Then for each neighbour node the poorest quality element associated with the neighbour node is determined. The minimum element quality for either the local node or the neighbour node is then used to calculate the weight applied to the edge between the local node and that neighbour node.

Once all of the edge weights have been calculated the nodes associated with the poorest quality elements then have their edge weights adjusted to a value one higher than the total number of edges in the simulation. This makes them uncuttable. Those edges with an edge weight calculated to be within 10% of the maximum edge weight on any process are given such a weight.

Edge weighting was implemented in such a way as to replicate the behaviour of the previous mesh repartitioning solution used in Fluidity. However, when using Zoltan, despite the edge weights being applied correctly, the mesh repartitioning sometimes failed to move the partition boundary away from the poor quality elements. This was found to be because Zoltan prioritises load balanced partitions over edge weighting and would sometimes ignore edge weighting to meet load balance criteria. The solution was to loosen the Zoltan parameter, IMBALANCE_TOL.

Within Zoltan priority is given to having well load balanced partitions. The amount of load imbalance tolerated in the system is controlled through the IMBALANCE_TOL. To determine the imbalance on each processor the weights of all the objects it is assigned are summed together to get its total load. The average load is calculated and from this the imbalance is computed as the maximum load divided by the average load. For example a value of 1.2 for IMBALANCE_TOL means that 20% imbalance is acceptable: that is no process should have more load than 1.2 times the average load.

The default IMBALANCE_TOL is 1.075 but for Fluidity this is changed to 1.5. The option has been made available through the Fluidity options system as load_imbalance_tolerance. This allows users to modify the value should they need to for their problem. This solved the problem with poor elements still being in a halo region after a repartition but added a further issue of empty partitions.

By loosening the IMBALANCE_TOL Zoltan would occasionally repartition the mesh in such a way that a process had no owned nodes. Fluidity assumes that no process will have an empty partition and hence this causes numerous problems throughout the code. It was not feasible to modify Fluidity to deal with empty partitions so instead a solution to prevent empty partitions was implemented. This was possible due to the implementation being split into distinct steps, first a load balance and then data migration.

The solution was to make a load balance call and then check for empty partitions before doing any data migration. The check for empty partitions was done by checking:

\begin{displaymath}
N_o + N_i - N_e \not= 0
\end{displaymath} (2)

where $N_o$ is the number of nodes owned, $N_i$ is the number of nodes to import, and $N_e$ is the number of nodes to export. This was possible as the Zoltan_LB_Balance returns both the number of nodes being imported to this process and the number of nodes it will be exporting.

If an empty partition would be created following the load balance then the load balance is attempted again but with the IMBALANCE_TOL tightened. This process continues until a partitioning with no empty partitions is found or the IMBALANCE_TOL can be tightened no further. In this situation a final load balance attempt is made with the edge weighting switched off and the IMBALANCE_TOL at 1.075. This solution prevents empty partitions being created and by going through this process before migrating the computational cost is reduced.

Jon Hill 2012-03-02