(a)
(b) |
ChemShell is implemented in parallel using MPI (message passing interface). The parallel framework used before task-farming was added is illustrated in Figure 1a. One node acts as a `master', on which a Tcl interpreter runs and executes the input script. The other nodes are `slaves' that remain idle until a parallel input command is executed. This would typically be a request for a gradient evaluation using an external program. To achieve maximum efficiency the external program is linked into the ChemShell executable so it can be called directly without spawning an additional process. The external program therefore shares the same MPI environment as ChemShell and can make use of all the nodes for its own parallel calculations. When the command has completed control returns to the master node.
Task-farming parallelism adds an extra layer of organisation to the processors,
which are grouped into independent workgroups using MPI communicators.
In the original parallel framework
the nodes were grouped into the default MPI_COMM_WORLD
communicator.
To create independent workgroups MPI_COMM_WORLD
is split into
smaller sets of processors, each with their own Workgroup communicator (named MPI_COMM_WORKGROUP
).
The user specifies the number of workgroups to be created using the command-line
argument -nworkgroups
when ChemShell is executed.
In each workgroup one node acts as a master and the rest are slaves (Figure 1b).
They operate in the same way as in the original parallel framework,
except that there are now multiple master nodes (if the number
of workgroups is set to one, the new framework reduces to the original).
Each master runs a copy of the Tcl interpreter and independently executes the same
ChemShell input script.
A number of Tcl commands have been implemented to report workgroup information, such
as how many workgroups exist (nworkgroups
) and which workgroup the script is running in (workgroupid
).
These commands may be used to make parts of the script conditional on the workgroup
ID and this provides a mechanism to distribute tasks between workgroups, as
in this simple input script:
set wid [ workgroupid ] if { $wid == 2 } { do_task }
Although the input is parsed by all workgroups, the procedure
do_task
would only be run on workgroup 2.
To prevent file conflicts, a scratch working directory is created for each workgroup (workgroup[n]/
).
This is important for ChemShell data objects, which may be saved to disk.
By default ChemShell objects will be loaded from the working directory, but if not
present the common parent directory will also be searched. This makes it possible to
create globally-accessible objects. Local objects may be `globalised' using a Tcl procedure
(taskfarm_globalise_objects
).
This command first synchronises the workgroups using a call to MPI_Barrier
to prevent data corruption.
The ChemShell standard output and standard error are also separated out into the working directories,
under the names workgroup[n].out
and workgroup[n].err
, with the exception of the output of
workgroup 0 which is treated as the main output and not redirected.
Tom Keal 2010-06-29