
3D Lid-driven Cavity Flow

This benchmark is a simple extension of the 2D lid-driven cavity flow, which is the first OpenFOAM tutorial [9].

As with the 2D case, the top wall moves in the x-direction at a speed of 1 m s$^{-1}$ whilst the other walls are stationary. The flow is assumed laminar and is solved on a uniform mesh using the icoFoam solver for laminar, isothermal, incompressible flow.

Unlike the 2D case, all output was switched off. In addition, two cases were created for cubic meshes with different discretisations, namely $100\times100\times100$ and $200\times200\times200$.

The $100\times100\times100$ case is of particular interest as it has been employed elsewhere [10][11] for benchmarking purposes, and thus allows direct comparison with those results.

To create the 3D cases, the hex block entry in the blockMeshDict file was changed to

    hex (0 1 2 3 4 5 6 7) (100 100 100) simpleGrading (1 1 1)
and
    hex (0 1 2 3 4 5 6 7) (200 200 200) simpleGrading (1 1 1)
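The hex entries fix the number of cells in each case. The Python sketch below is a quick arithmetic check. It computes the total cell count and the average number of cells per core for a few of the core counts used later. The function names are illustrative. The per-core figure assumes the cells are shared evenly between subdomains.

```python
def cell_count(nx: int, ny: int, nz: int) -> int:
    """Number of hexahedral cells in one uniform blockMesh block."""
    return nx * ny * nz


def cells_per_core(n_cells: int, n_cores: int) -> float:
    """Average cells per core, assuming an even decomposition."""
    return n_cells / n_cores


for n in (100, 200):
    total = cell_count(n, n, n)
    print(f"{n}^3 mesh: {total:,} cells")
    for cores in (128, 512, 1024):
        print(f"  {cores:4d} cores: {cells_per_core(total, cores):9.1f} cells/core")
```

For example, the $200^3$ mesh has 8,000,000 cells. On 1024 cores that is 7,812.5 cells per core on average.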

For both the $100^3$ and $200^3$ cases, the controlDict file contained

application     icoFoam;
startFrom       startTime;
startTime       0;
stopAt          endTime;
endTime         0.025;
deltaT          0.005;
writeControl    timeStep;
writeInterval   20;
purgeWrite      0;
writeFormat     ascii;
writePrecision  6;
writeCompression uncompressed;
timeFormat      general;
timePrecision   6;
runTimeModifiable yes;
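With a fixed deltaT, the run length is (endTime - startTime)/deltaT. With writeControl timeStep, a write falls due every writeInterval steps. The sketch below assumes exactly those two rules; the helper names are illustrative, not OpenFOAM functions. It confirms that the settings above give 5 time steps, with no scheduled write before the run ends.

```python
def n_time_steps(end_time: float, delta_t: float, start_time: float = 0.0) -> int:
    """Number of fixed-size time steps from startTime to endTime."""
    # round() absorbs binary floating-point error in the division.
    return round((end_time - start_time) / delta_t)


def scheduled_writes(n_steps: int, write_interval: int) -> list[int]:
    """Step indices at which 'writeControl timeStep' falls due for a write."""
    return [s for s in range(1, n_steps + 1) if s % write_interval == 0]


steps = n_time_steps(end_time=0.025, delta_t=0.005)
print(steps)                        # 5
print(scheduled_writes(steps, 20))  # []
```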

For 8 cores, the decomposeParDict file contains the lines

numberOfSubdomains 8;
method          simple;
simpleCoeffs
{
    n               (2 2 2);
    delta           0.001;
}

where

    n               (2 2 2);

denotes a cubic layout of 8 processors, with npx = 2, npy = 2 and npz = 2, in a virtual processor grid. Many other processor configurations were also investigated; for instance, for 8 processors, (8 x 1 x 1), (1 x 8 x 1), (1 x 1 x 8), (4 x 2 x 1), (1 x 2 x 4), etc. Simulations were run for numberOfSubdomains = 1, 2, 4, 8, ...
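The candidate virtual grids for a given numberOfSubdomains are the ordered factorisations of that number. The sketch below enumerates them. As a rough proxy for halo-exchange volume, it also counts the cell faces that lie on processor boundaries for an $n^3$ mesh. This face count is an illustrative measure, not one reported in the benchmarks, and the function names are hypothetical.

```python
from itertools import product


def decompositions(n_procs: int) -> list[tuple[int, int, int]]:
    """All ordered (npx, npy, npz) with npx * npy * npz == n_procs."""
    divisors = [d for d in range(1, n_procs + 1) if n_procs % d == 0]
    return [
        (px, py, n_procs // (px * py))
        for px, py in product(divisors, repeat=2)
        if n_procs % (px * py) == 0
    ]


def cut_faces(n: int, px: int, py: int, pz: int) -> int:
    """Faces on processor boundaries when an n^3 cube is split px x py x pz.

    Each cut plane normal to a coordinate axis crosses n * n cell faces.
    """
    return (px - 1 + py - 1 + pz - 1) * n * n


for px, py, pz in decompositions(8):
    print(f"({px} x {py} x {pz}): {cut_faces(100, px, py, pz):,} cut faces")
```

For 8 processors this gives 10 layouts. On the $100^3$ mesh, (2 x 2 x 2) cuts 30,000 faces, whereas (8 x 1 x 1) cuts 70,000.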

It has previously been reported [10] that varying the processor virtual topology has little effect on performance when running a 3D lid-driven cavity flow case on a Cray XT4/XT5. Our investigation showed that, to a first approximation, this is indeed the case. However, we found a slight performance gain when npz is kept as small as possible. We attribute this to the fact that C++ stores the last dimension of a multidimensional array contiguously in memory. If that last dimension is not distributed over processors, more of the data each processor accesses stays contiguous, so each cache line fetched is more fully used.
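The storage argument rests on C and C++ built-in multidimensional arrays being row-major. In `double a[nx][ny][nz]`, consecutive values of the last index are adjacent in memory. The Python sketch below evaluates that offset formula. It illustrates C/C++ array layout in general and is not a description of how OpenFOAM orders cells internally.

```python
def row_major_offset(i: int, j: int, k: int, ny: int, nz: int) -> int:
    """Flat element offset of a[i][j][k] for a C/C++ array a[nx][ny][nz]."""
    return (i * ny + j) * nz + k


ny = nz = 100
base = row_major_offset(10, 10, 10, ny, nz)
print(row_major_offset(10, 10, 11, ny, nz) - base)  # step in k: 1
print(row_major_offset(10, 11, 10, ny, nz) - base)  # step in j: 100
print(row_major_offset(11, 10, 10, ny, nz) - base)  # step in i: 10000
```

Assume 8-byte doubles and a typical 64-byte cache line. One line then holds 8 consecutive values along the last index, whereas a unit step in the first index jumps 10,000 elements.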

We ran the $100^3$ and $200^3$ cases for 5 time steps, and then ran the $200^3$ case again for 40 time steps to investigate the impact of start-up costs.

A summary of the results is displayed in figure 1.

Figure: Timing results for 3D lid-driven cavity flow. `100' is the $100^3$ case and `200' is the $200^3$ case, both running for 5 time steps; `200.4' is the $200^3$ case running for 40 time steps; and `200.lin' is the perfect scaling line, shown for comparison only.

The timing and performance results are presented in table 1, where the performance figures are calculated as described in Section 6.2.2.
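Section 6.2.2 is not reproduced here. However, the performance figures for the $100^3$ case in table 1 match the ratio $T(P/2)/T(P)$: the run time on half as many cores divided by the run time on $P$ cores. On this measure, 2 indicates perfect scaling when the core count doubles. The sketch below assumes that definition and reproduces the performance column of the $100^3$ case from its timings.

```python
def performance(times: dict[int, float]) -> dict[int, float]:
    """T(P/2) / T(P) for each core count P whose half is also present."""
    return {p: times[p // 2] / times[p] for p in sorted(times) if p // 2 in times}


# Run times for the 100^3 case, 5 time steps, copied from table 1.
t100 = {4: 795.3, 8: 410.7, 16: 203.1, 32: 102.9, 64: 41.4,
        128: 21.8, 256: 19.2, 512: 23.3, 1024: 42.9, 2048: 53.4}

for p, s in performance(t100).items():
    print(f"{p:5d} cores: {s:.2f}")
```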


Table 1: Timing and performance results for 3D lid-driven cavity flow
Number of   $100^3$, 5 time steps   $200^3$, 5 time steps   $200^3$, 40 time steps
cores       Time (Performance)      Time (Performance)      Time (Performance)
4             795.3 (-)              8803.9 (-)                   - (-)
8             410.7 (1.94)           4514.0 (1.95)                - (-)
16            203.1 (2.02)           2069.0 (2.18)                - (-)
32            102.9 (1.97)           1068.9 (1.94)                - (-)
64             41.4 (2.49)            519.8 (2.06)                - (-)
128            21.8 (1.90)            277.2 (1.88)          11691.6 (-)
256            19.2 (1.14)            139.5 (1.99)           5322.1 (2.20)
512            23.3 (0.82)             70.7 (1.97)           2586.1 (2.06)
1024           42.9 (0.54)             59.8 (1.18)           1488.7 (1.73)
2048           53.4 (0.80)             67.4 (0.78)           1272.6 (1.17)
4096              - (-)               104.7 (0.73)           1623.9 (0.78)

From table 1, take the optimum as the core count beyond which doubling the number of cores yields little further speed-up. On that basis, the optimum is 128 cores for the $100^3$ case and 512 cores for the $200^3$ case, both running for 5 time steps. However, the optimum for the $200^3$ case running for 40 time steps is 1024 cores. This implies that 5 time steps are not enough to amortise the start-up cost of the simulation.
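The choice of optimum can be made mechanical, as sketched below. The sketch walks up the core counts and keeps the last one reached before the performance figure first falls below a threshold. The threshold of 1.5 is an assumption rather than a value stated in the text. Any threshold above 1.18 and up to 1.73 returns the same three optima quoted above when applied to the performance columns of table 1.

```python
from typing import Optional


def optimum_cores(perf: dict[int, float], threshold: float = 1.5) -> Optional[int]:
    """Last core count reached before the doubling speed-up drops below threshold."""
    best = None
    for p in sorted(perf):
        if perf[p] < threshold:
            break
        best = p
    return best


# Performance columns of table 1 (entries shown as "-" are omitted).
perf_100_5 = {8: 1.94, 16: 2.02, 32: 1.97, 64: 2.49, 128: 1.90,
              256: 1.14, 512: 0.82, 1024: 0.54, 2048: 0.80}
perf_200_5 = {8: 1.95, 16: 2.18, 32: 1.94, 64: 2.06, 128: 1.88,
              256: 1.99, 512: 1.97, 1024: 1.18, 2048: 0.78, 4096: 0.73}
perf_200_40 = {256: 2.20, 512: 2.06, 1024: 1.73, 2048: 1.17, 4096: 0.78}

for name, perf in [("100^3, 5 steps", perf_100_5),
                   ("200^3, 5 steps", perf_200_5),
                   ("200^3, 40 steps", perf_200_40)]:
    print(name, optimum_cores(perf))
```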

Thus, from this exercise, we recommend that 3D icoFoam simulations on a regular mesh of $200^3$, or 8 million, grid points be run on 1024 cores.




Gavin J Pringle
2010-04-16