Implementation

CASTEP contains a "trace" module which provides, amongst other functionality, timing data via the "trace_entry" and "trace_exit" subroutines. There is a call to trace_entry at the start of most CASTEP subroutines, and a corresponding call to trace_exit at each exit point. Internally the trace module measures the time CASTEP spends between each entry and exit point, and associates that time with the subroutine. The trace module also allows a subroutine to be associated with a class of operation, so that the large amount of timing data can be reported per class of operation rather than per individual subroutine.
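The entry/exit timing and per-class aggregation described above can be illustrated with a minimal sketch. This is not CASTEP's Fortran implementation: the Trace class, its method names, the routine names "calculate" and "comms_reduce_gv", and the injected clock are all hypothetical, chosen only to show the idea.

```python
import time
from collections import defaultdict


class Trace:
    """Toy stand-in for a trace module (hypothetical, not CASTEP's code)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []                      # (routine, start time) of active calls
        self._classes = {}                    # routine -> tuple of class names
        self.time_by_routine = defaultdict(float)
        self.time_by_class = defaultdict(float)

    def register(self, routine, *classes):
        """Associate a routine with one or more classes of operation."""
        self._classes[routine] = classes

    def entry(self, routine):
        """Record the time at which a routine was entered."""
        self._stack.append((routine, self._clock()))

    def exit(self, routine):
        """Add the elapsed time to the routine and to each of its classes."""
        top, start = self._stack.pop()
        if top != routine:
            raise RuntimeError(f"exit({routine!r}) does not match entry({top!r})")
        elapsed = self._clock() - start
        # Inclusive time: nested routines of the same class would be counted
        # twice in this sketch; a real implementation has to avoid that.
        self.time_by_routine[routine] += elapsed
        for cls in self._classes.get(routine, ()):
            self.time_by_class[cls] += elapsed


# Usage with a fake clock so that the timings are deterministic.
ticks = iter([0.0, 1.0, 3.0, 10.0])
trace = Trace(clock=lambda: next(ticks))
trace.register("comms_reduce_gv", "comms", "comms_gv")

trace.entry("calculate")            # clock reads 0.0
trace.entry("comms_reduce_gv")      # clock reads 1.0
trace.exit("comms_reduce_gv")       # clock reads 3.0
trace.exit("calculate")             # clock reads 10.0
```

Keeping one accumulator per routine and one per class means the per-class totals are available at the end of the run without walking the per-routine data again.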

By defining the generic class "comms" and the specific sub-classes "comms_farm", "comms_gv", "comms_kp" and "comms_bnd", each communication subroutine could be associated with one or more communication classes. At the end of the CASTEP calculation the aggregate communication time is compared with the total time for the calculation, and a parallel efficiency is estimated from the resulting comms:compute ratio. The sub-classes (comms_gv, etc.) allow this estimate to be broken down by type of parallelism, so that a user can evaluate the efficiency of each aspect of the current parallel distribution. An example of the CASTEP parallel efficiency report is given below, for the standard "al3x3" benchmark:

Overall parallel efficiency rating: Satisfactory (61%)

Data was distributed by:-
Gvector (128way); efficiency rating: Satisfactory (61%)
kpoint (2way); efficiency rating: Excellent (99%)
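The text does not give the formula that turns the comms:compute ratio into a percentage, so the following is only a sketch under a stated assumption: that the efficiency is the fraction of the total time not spent in communication. The function names and the timings are hypothetical and are not taken from any CASTEP run.

```python
def parallel_efficiency(comms_time, total_time):
    """Assumed estimate: fraction of the total time not spent communicating."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return 1.0 - comms_time / total_time


def efficiency_report(total_time, time_by_class):
    """Overall efficiency from "comms", plus one entry per sub-class."""
    overall_comms = time_by_class.get("comms", 0.0)
    report = {"overall": parallel_efficiency(overall_comms, total_time)}
    for cls, comms_time in time_by_class.items():
        if cls != "comms":
            report[cls] = parallel_efficiency(comms_time, total_time)
    return report


# Made-up timings in seconds, for illustration only.
timings = {"comms": 25.0, "comms_gv": 20.0, "comms_kp": 5.0}
report = efficiency_report(100.0, timings)
for name, efficiency in report.items():
    print(f"{name}: {efficiency:.0%}")
```

Reporting each sub-class against the same total time makes it easy to see which form of parallelism is responsible for most of the communication overhead.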