3D Lid-driven Cavity Flow

This benchmark is a simple extension of the 2D lid-driven cavity flow, which is the first OpenFOAM tutorial [9].

As with the 2D case, the top wall moves in the x-direction at a speed of $1\,\mathrm{m\,s^{-1}}$ whilst the remaining walls are stationary. The flow is assumed to be laminar and is solved on a uniform mesh using icoFoam, the OpenFOAM solver for laminar, isothermal, incompressible flow.

Unlike the 2D case, all output was switched off. Further, two cases were created for cubic meshes with different discretisations, namely $100\times100\times100$ and $200\times200\times200$.

The $100\times100\times100$ case is of particular interest because it has been used elsewhere [10][11] for benchmarking, so its results can be compared directly with other work.

To create the two 3D cases, the hex block definition in the blockMeshDict file was changed to

    hex (0 1 2 3 4 5 6 7) (100 100 100) simpleGrading (1 1 1)

and

    hex (0 1 2 3 4 5 6 7) (200 200 200) simpleGrading (1 1 1)
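As a minimal sketch, the edit above can be scripted. It assumes a single `hex` entry per line in the dictionary text, and the helper names `set_hex_cells` and `cell_count` are hypothetical. Only the cell counts are rewritten: vertices and grading are left untouched, so the block's vertices must already describe the desired cubic domain.

```python
import re

# Matches "hex (<vertex labels>) (<nx> <ny> <nz>)"; group 1 keeps "hex (...) ".
HEX_CELLS = re.compile(r"(hex\s*\([^)]*\)\s*)\(\s*\d+\s+\d+\s+\d+\s*\)")


def set_hex_cells(block_mesh_dict: str, n: int) -> str:
    """Hypothetical helper: set every hex block to n x n x n cells."""
    return HEX_CELLS.sub(rf"\g<1>({n} {n} {n})", block_mesh_dict)


def cell_count(n: int) -> int:
    """Number of cells in a uniform n x n x n block."""
    return n ** 3


if __name__ == "__main__":
    # Illustrative 2D-style input line (one cell in the z-direction).
    line = "hex (0 1 2 3 4 5 6 7) (20 20 1) simpleGrading (1 1 1)"
    for n in (100, 200):
        print(set_hex_cells(line, n), "->", cell_count(n), "cells")
```

The two meshes therefore contain 1,000,000 and 8,000,000 cells respectively.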

For both the 100³ and 200³ cases, the controlDict file contained

    application     icoFoam;
    startFrom       startTime;
    startTime       0;
    stopAt          endTime;
    endTime         0.025;
    deltaT          0.005;
    writeControl    timeStep;
    writeInterval   20;
    purgeWrite      0;
    writeFormat     ascii;
    writePrecision  6;
    writeCompression uncompressed;
    timeFormat      general;
    timePrecision   6;
    runTimeModifiable yes;
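The controlDict settings above determine how many time steps are taken and whether any results are written. The following is a minimal sketch of that arithmetic. It assumes that, with `writeControl timeStep`, a solution is written whenever the time-step index is a multiple of `writeInterval`. The function names are hypothetical.

```python
def n_time_steps(start_time: float, end_time: float, delta_t: float) -> int:
    """Number of fixed-size time steps between startTime and endTime."""
    # round() guards against floating-point error in e.g. 0.025 / 0.005.
    return round((end_time - start_time) / delta_t)


def write_steps(n_steps: int, write_interval: int) -> list[int]:
    """Time-step indices at which a timeStep-controlled write would occur."""
    return [i for i in range(1, n_steps + 1) if i % write_interval == 0]


if __name__ == "__main__":
    steps = n_time_steps(0.0, 0.025, 0.005)  # startTime, endTime, deltaT above
    print(steps, "time steps; writes at", write_steps(steps, 20))
```

Under this assumption the run takes 5 time steps. Because none of them is a multiple of `writeInterval` (20), no time directories are written.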

For 8 cores, the file decomposeParDict contains the lines

    numberOfSubdomains 8;
    method          simple;
    simpleCoeffs
    {
        n               (2 2 2);
        delta           0.001;
    }

where

    n               (2 2 2);

denotes a cubic layout of the 8 processors in a virtual grid, with np_x = 2, np_y = 2 and np_z = 2. Many other processor configurations were also investigated; for 8 processors, for instance, (8 x 1 x 1), (1 x 8 x 1), (1 x 1 x 8), (4 x 2 x 1), (1 x 2 x 4), etc. Simulations were run for numberOfSubdomains = 1, 2, 4, 8, ....

It has previously been reported [10] that varying the processor virtual topology when running a 3D lid-driven cavity flow case on a Cray XT4/XT5 has little effect on performance. Our investigation showed that, to a first approximation, this is indeed the case. However, we found a slight performance gain when np_z is kept as low as possible. This is because C++ stores the last dimension of arrays contiguously in memory: if the last dimension is not distributed over processors, more data is kept in cache lines.
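The processor configurations listed above are the ordered factorisations of numberOfSubdomains into three factors (np_x, np_y, np_z). The minimal sketch below enumerates them and generates the matching decomposeParDict entries in the same layout. It orders the configurations with the smallest np_z first, a heuristic that reflects the slight gain reported above and is not a rule from OpenFOAM. The names `processor_grids` and `decompose_par_dict` are hypothetical.

```python
def processor_grids(n_procs: int) -> list[tuple[int, int, int]]:
    """All (np_x, np_y, np_z) with np_x * np_y * np_z == n_procs,
    ordered with the smallest np_z first."""
    grids = []
    for nx in range(1, n_procs + 1):
        if n_procs % nx:
            continue
        rest = n_procs // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                grids.append((nx, ny, rest // ny))
    # Sort on np_z first, then on the whole tuple for a stable, readable order.
    return sorted(grids, key=lambda g: (g[2], g))


def decompose_par_dict(grid: tuple[int, int, int]) -> str:
    """decomposeParDict entries for the simple method."""
    nx, ny, nz = grid
    return (
        f"numberOfSubdomains {nx * ny * nz};\n"
        "method          simple;\n"
        "simpleCoeffs\n"
        "{\n"
        f"    n               ({nx} {ny} {nz});\n"
        "    delta           0.001;\n"
        "}\n"
    )


if __name__ == "__main__":
    for grid in processor_grids(8):
        print(grid)
    print(decompose_par_dict((2, 2, 2)))
```

For 8 processors this yields ten configurations: four with np_z = 1, three with np_z = 2, two with np_z = 4 and one with np_z = 8.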

We ran the 100³ and 200³ cases for 5 time steps, and then ran the 200³ case again for 40 time steps to investigate the impact of start-up costs.

A summary of the results is displayed in figure 1.

**Figure:** Timing results for 3D lid-driven cavity flow, where "100" is the 100³ case and "200" is the 200³ case, both running for 5 time steps, "200.4" is the 200³ case running for 40 time steps, and "200.lin" is the perfect-scaling line, shown for comparison only.
![Timing results for 3D lid-driven cavity flow](cavity.png)

The timing and performance results are presented in table 1, where the performance figures are calculated as described in section 6.2.2.

Table 1: Timing and performance results for 3D lid-driven cavity flow

| Number of cores | 100³, 5 time steps: Time (Performance) | 200³, 5 time steps: Time (Performance) | 200³, 40 time steps: Time (Performance) |
|---:|---:|---:|---:|
| 4 | 795.3 (-) | 8803.9 (-) | - (-) |
| 8 | 410.7 (1.94) | 4514.0 (1.95) | - (-) |
| 16 | 203.1 (2.02) | 2069.0 (2.18) | - (-) |
| 32 | 102.9 (1.97) | 1068.9 (1.94) | - (-) |
| 64 | 41.4 (2.49) | 519.8 (2.06) | - (-) |
| 128 | 21.8 (1.90) | 277.2 (1.88) | 11691.6 (-) |
| 256 | 19.2 (1.14) | 139.5 (1.99) | 5322.1 (2.20) |
| 512 | 23.3 (0.82) | 70.7 (1.97) | 2586.1 (2.06) |
| 1024 | 42.9 (0.54) | 59.8 (1.18) | 1488.7 (1.73) |
| 2048 | 53.4 (0.80) | 67.4 (0.78) | 1272.6 (1.17) |
| 4096 | - (-) | 104.7 (0.73) | 1623.9 (0.78) |

From table 1, the optimum number of cores, by which we mean the largest core count at which doubling the cores still gives close to a two-fold speed-up, is 128 for the 100³ case and 512 for the 200³ case, both running for 5 time steps. However, the optimum number of cores for the 200³ case running for 40 time steps is 1024. The shortest run times occur one doubling later, at 256, 1024 and 2048 cores respectively, but they reduce the run time by only about 12 to 15 per cent. This implies that running the simulation for only 5 time steps was not long enough to amortise its start-up cost.
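For the 100³ case, the tabulated performance figures equal T(N/2)/T(N) to two decimal places, where T(N) is the run time on N cores, so perfect scaling gives 2.0. We assume that this ratio is the definition given in section 6.2.2, which is not reproduced here. The minimal sketch below recomputes the ratios from the tabulated times. It then selects the largest core count reached through an unbroken run of doublings that each give a speed-up of at least 1.5. The 1.5 threshold and the function names are our own illustrative assumptions; the text does not state a threshold.

```python
# Run times from table 1, keyed by number of cores.
TIMES = {
    "100^3, 5 steps": {4: 795.3, 8: 410.7, 16: 203.1, 32: 102.9, 64: 41.4,
                       128: 21.8, 256: 19.2, 512: 23.3, 1024: 42.9,
                       2048: 53.4},
    "200^3, 5 steps": {4: 8803.9, 8: 4514.0, 16: 2069.0, 32: 1068.9,
                       64: 519.8, 128: 277.2, 256: 139.5, 512: 70.7,
                       1024: 59.8, 2048: 67.4, 4096: 104.7},
    "200^3, 40 steps": {128: 11691.6, 256: 5322.1, 512: 2586.1,
                        1024: 1488.7, 2048: 1272.6, 4096: 1623.9},
}


def doubling_speedups(times: dict[int, float]) -> dict[int, float]:
    """Speed-up T(n/2) / T(n) for each core count n whose half is also timed."""
    return {n: times[n // 2] / times[n]
            for n in sorted(times) if n // 2 in times}


def efficient_core_count(times: dict[int, float], threshold: float = 1.5) -> int:
    """Largest core count reached while every doubling gives >= threshold."""
    best = min(times)
    for n, speedup in sorted(doubling_speedups(times).items()):
        if speedup < threshold:
            break
        best = n
    return best


if __name__ == "__main__":
    for case, times in TIMES.items():
        fastest = min(times, key=times.get)
        print(f"{case}: efficient up to {efficient_core_count(times)} cores, "
              f"fastest at {fastest} cores")
```

With this assumed threshold, the sketch reproduces the 128, 512 and 1024 cores quoted above, while the fastest runs occur at 256, 1024 and 2048 cores. For the 200³ mesh on 1024 cores, this corresponds to about 7,800 cells per core.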

Thus, from this exercise, we recommend that 3D icoFoam simulations on a regular mesh of 200³ (8 million) cells be run on 1024 cores.

Gavin J Pringle
2010-04-16