Parallel codes typically use one of the following I/O strategies:
- Master-slave - where a single (master) processor performs all the
reading/writing and broadcasts/gathers data to/from the slave processors.
The slave processors are usually idle whilst the reading/writing takes
place. If the time spent in computation is much greater than the time spent
in I/O, this approach may be acceptable; for codes involving
significant amounts of I/O, however, it can be highly detrimental to
performance. Due to its ease of implementation, this is nevertheless
still the most common form of I/O used in parallel codes. Often such codes
were designed to run on a relatively small number of processors, where this
approach was suitable. In recent years, however, as processor counts have
increased, the master-slave I/O approach has become less than ideal.
- Multiple masters and groups of slaves, or I/O subgroups - similar to
the master-slave approach, but here we have multiple master processors,
each gathering data from its own group of slave processors. This approach
can reduce the overheads involved in having a single master process
carry out the I/O. It also reduces the memory requirements, as the data
to be input/output are now distributed between several master processors
rather than held on a single processor. Some synchronisation of the I/O
may be required to ensure the data are read/written in the correct order.
However, it is anticipated that any synchronisation overhead will be more
than offset by the savings made from using multiple master processors.
- Parallel I/O - where each processor writes its own data to a separate
file. The files then need to be collected together in the correct order at
some later stage, either via standard Unix commands (e.g. cat) or
with a separate code. This approach should be more efficient than the
master-slave approach, as all the processors are kept busy with none idling.
However, there may be limitations on its scalability. Most
operating systems limit the number of files which can be open (for read/write)
at the same time, and this limit could be as few as 1000 files for some Unix
implementations. Some applications write to several different files,
which places a severe restriction on the number of processors which can
be used. For example, if the file limit is 1000 and each processor writes to
10 files, then we are limited to running on 100 processors or fewer. Clearly,
this is not ideal. Many applications require many hundreds or thousands of
processors, and thus a different approach is required.
- MPI-IO - extensions to MPI which form part of the MPI-2 standard [6].
Essentially, MPI-IO is a library providing functions which can be used to
perform parallel I/O via the MPI libraries. A single file is
written to by all processors, which avoids the file-count limitations of the
parallel I/O approach above. Each processor writes directly to its own region
of the file, which avoids the need for any post-processing. As with parallel
I/O, all the processors are involved in the read/write operation, so none
remains idle. Note, however, that MPI-IO is not fully implemented by all
vendors.
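The master-slave pattern described above can be sketched in a few lines. This is a minimal simulation in plain Python, not MPI code: the processor count, chunk size, and function names are illustrative assumptions, and the loop over ranks stands in for an MPI gather onto rank 0.

```python
import os
import tempfile

NPROCS = 4   # assumed number of processors, for illustration
CHUNK = 8    # values owned by each slave processor

def local_data(rank):
    """Each simulated processor owns one contiguous chunk of the global array."""
    return list(range(rank * CHUNK, (rank + 1) * CHUNK))

def master_slave_write(path):
    # The master (rank 0) gathers every slave's chunk and then performs
    # the single write; the slaves would sit idle during this step.
    gathered = []
    for rank in range(NPROCS):
        gathered.extend(local_data(rank))
    with open(path, "w") as f:
        f.write(",".join(str(v) for v in gathered))
    return gathered

path = os.path.join(tempfile.mkdtemp(), "output.dat")
data = master_slave_write(path)
```

The key point is that the whole global array passes through one processor's memory, which is exactly the bottleneck the later strategies try to remove.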
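The I/O-subgroup decomposition amounts to a simple mapping from each rank to its group's I/O master. The sketch below assumes, purely for illustration, a fixed group size with the lowest rank in each group acting as that group's master.

```python
NPROCS = 16
GROUP_SIZE = 4   # assumed number of processors per I/O subgroup

def io_master(rank):
    """Return the rank that performs I/O on behalf of `rank`'s group."""
    return (rank // GROUP_SIZE) * GROUP_SIZE

# Each master gathers only from its own group, so the per-master memory
# requirement drops from NPROCS chunks to GROUP_SIZE chunks.
masters = sorted({io_master(r) for r in range(NPROCS)})
```

With 16 processors in groups of 4, ranks 0, 4, 8 and 12 become the I/O masters, each responsible for a quarter of the data.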
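The scalability limit of the one-file-per-processor approach follows directly from the arithmetic in the text: with a cap on simultaneously open files and a fixed number of files per processor, the usable processor count is their quotient.

```python
def max_processors(file_limit, files_per_proc):
    """Largest processor count that stays within the open-file limit."""
    return file_limit // files_per_proc

# Reproducing the example from the text: a 1000-file limit with
# 10 files per processor caps the run at 100 processors.
cap = max_processors(1000, 10)
```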
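The essence of MPI-IO is that every rank writes to its own byte region of one shared file. In real code each rank would call a routine such as MPI_File_write_at with an explicit offset; the plain-Python sketch below only illustrates the offset arithmetic, using seek/write on an ordinary file and a loop over simulated ranks. The record length is an assumption for the example.

```python
import os
import tempfile

NPROCS = 4
RECLEN = 8   # assumed fixed-size record per rank, in bytes

def write_region(path, rank):
    # Each rank writes only within [rank * RECLEN, (rank + 1) * RECLEN),
    # so no two ranks touch the same bytes and no post-processing is needed.
    with open(path, "r+b") as f:
        f.seek(rank * RECLEN)
        f.write(bytes([48 + rank]) * RECLEN)   # fill region with '0','1',...

path = os.path.join(tempfile.mkdtemp(), "shared.dat")
# Pre-size the file so every region exists before the "ranks" write.
with open(path, "wb") as f:
    f.write(b"\0" * (NPROCS * RECLEN))
for rank in range(NPROCS):
    write_region(path, rank)

with open(path, "rb") as f:
    contents = f.read()
```

Because the regions are disjoint, the writes could proceed in any order, or concurrently, and still produce the same single, correctly ordered file.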
A vast number of I/O benchmarks exist and can be used to obtain performance
estimates for the different I/O methods.