Parallel codes typically use one of the following I/O strategies:
- Master-slave - where a single (master) processor performs all the
reading/writing and broadcasts/gathers data to/from the slave processors.
The slave processors are usually idle whilst the reading/writing takes
place. If the time spent in computation is much greater than the time spent
in I/O, this approach may be acceptable; for codes involving
significant amounts of I/O, however, it can be highly detrimental to
performance. Due to its ease of implementation, this is nevertheless
still the most common form of I/O used in parallel codes. Often such codes
were designed to run on a relatively small number of processors, where this
approach was suitable. In recent years, however, as processor counts have
increased, the master-slave I/O approach has become less than ideal.
- Multiple masters and groups of slaves, or I/O subgroups - similar to
the master-slave approach, but here we have multiple master processors,
each gathering data from its own group of slave processors. This approach
can reduce the overheads involved in having a single master process
carry out the I/O. It also reduces the memory requirements, as the data
to be input/output are now distributed between several master processors
rather than held on a single processor. Some synchronisation of the I/O
may be required to ensure the data are read/written in the correct order.
However, it is anticipated that any synchronisation overhead will be more
than offset by the savings made from using multiple master processors.
- Parallel I/O - where each processor writes its own data to a separate
file. The files then need to be collected together in the correct order at
some later stage, either via standard Unix commands (e.g. cat) or
with a separate code. This approach should be more efficient than the
master-slave approach, as all the processors are kept busy with none idling.
However, there may be limitations on its scalability. Most
operating systems limit the number of files which can be open (for read/write)
at the same time, and this limit could be as few as 1000 files for some Unix
implementations. Some applications write to several different files,
which places a severe restriction on the number of processors which can
be used. For example, if the file limit is 1000 and each processor writes to
10 files, then we are limited to running on 100 processors or fewer. Clearly,
this is not ideal. Many applications require many hundreds or thousands of
processors, and thus a different approach is required.
- MPI-IO - extensions to MPI which form part of the MPI-2 standard [6].
Essentially, MPI-IO is a library providing functions which can be used to
perform parallel I/O via the MPI libraries. A single file is
written to by all processors, which avoids the file-count limitations of the
parallel I/O approach above. Each processor writes directly to its own region
of the file, which avoids the need for any post-processing. As with parallel
I/O, all the processors are involved in the read/write operation, so none
remains idle. Note, however, that MPI-IO is not fully implemented by all
vendors.
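The master-slave pattern described above can be sketched in a few lines. This is a minimal simulation in plain Python, not MPI code: the processor count, chunk size, and function names are illustrative assumptions, and the loop over ranks stands in for an MPI gather onto rank 0.

```python
import os
import tempfile

NPROCS = 4   # assumed number of processors, for illustration
CHUNK = 8    # values owned by each slave processor

def local_data(rank):
    """Each simulated processor owns one contiguous chunk of the global array."""
    return list(range(rank * CHUNK, (rank + 1) * CHUNK))

def master_slave_write(path):
    # The master (rank 0) gathers every slave's chunk and then performs
    # the single write; the slaves would sit idle during this step.
    gathered = []
    for rank in range(NPROCS):
        gathered.extend(local_data(rank))
    with open(path, "w") as f:
        f.write(",".join(str(v) for v in gathered))
    return gathered

path = os.path.join(tempfile.mkdtemp(), "output.dat")
data = master_slave_write(path)
```

The key point is that the whole global array passes through one processor's memory, which is exactly the bottleneck the later strategies try to remove.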
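The I/O-subgroup decomposition amounts to a simple mapping from each rank to its group's I/O master. The sketch below assumes, purely for illustration, a fixed group size with the lowest rank in each group acting as that group's master.

```python
NPROCS = 16
GROUP_SIZE = 4   # assumed number of processors per I/O subgroup

def io_master(rank):
    """Return the rank that performs I/O on behalf of `rank`'s group."""
    return (rank // GROUP_SIZE) * GROUP_SIZE

# Each master gathers only from its own group, so the per-master memory
# requirement drops from NPROCS chunks to GROUP_SIZE chunks.
masters = sorted({io_master(r) for r in range(NPROCS)})
```

With 16 processors in groups of 4, ranks 0, 4, 8 and 12 become the I/O masters, each responsible for a quarter of the data.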
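The scalability limit of the one-file-per-processor approach follows directly from the arithmetic in the text: with a cap on simultaneously open files and a fixed number of files per processor, the usable processor count is their quotient.

```python
def max_processors(file_limit, files_per_proc):
    """Largest processor count that stays within the open-file limit."""
    return file_limit // files_per_proc

# Reproducing the example from the text: a 1000-file limit with
# 10 files per processor caps the run at 100 processors.
cap = max_processors(1000, 10)
```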
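The essence of MPI-IO is that every rank writes to its own byte region of one shared file. In real code each rank would call a routine such as MPI_File_write_at with an explicit offset; the plain-Python sketch below only illustrates the offset arithmetic, using seek/write on an ordinary file and a loop over simulated ranks. The record length is an assumption for the example.

```python
import os
import tempfile

NPROCS = 4
RECLEN = 8   # assumed fixed-size record per rank, in bytes

def write_region(path, rank):
    # Each rank writes only within [rank * RECLEN, (rank + 1) * RECLEN),
    # so no two ranks touch the same bytes and no post-processing is needed.
    with open(path, "r+b") as f:
        f.seek(rank * RECLEN)
        f.write(bytes([48 + rank]) * RECLEN)   # fill region with '0','1',...

path = os.path.join(tempfile.mkdtemp(), "shared.dat")
# Pre-size the file so every region exists before the "ranks" write.
with open(path, "wb") as f:
    f.write(b"\0" * (NPROCS * RECLEN))
for rank in range(NPROCS):
    write_region(path, rank)

with open(path, "rb") as f:
    contents = f.read()
```

Because the regions are disjoint, the writes could proceed in any order, or concurrently, and still produce the same single, correctly ordered file.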
A vast number of I/O benchmarks exist and can be used to obtain performance
estimates for the different I/O methods.