
Mpitrace

The development, debugging and profiling of Castep were made substantially easier by the built-in Trace module. This module logs almost all subroutine and function entries and exits, and keeps track of their parents and children. In addition, there are optional facilities to time the calls and to log each entry and exit, all on a per-PE basis.
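
As an illustration of the bookkeeping described above, the following is a minimal sketch in Python. It is not the Castep Trace module: the class name Trace, its methods and its attributes are hypothetical, chosen only to show one way that entries, exits, parent-child relations and optional timing could be recorded for a single PE.

```python
import time
from collections import defaultdict


class Trace:
    """Hypothetical per-process call tracer (illustrative, not Castep's module)."""

    def __init__(self, timing=False, log_calls=False):
        self.timing = timing                # optionally accumulate time per routine
        self.log_calls = log_calls          # optionally record every entry and exit
        self.stack = []                     # routines currently entered, outermost first
        self.children = defaultdict(set)    # parent name -> names it has called
        self.calls = defaultdict(int)       # routine name -> number of entries
        self.elapsed = defaultdict(float)   # routine name -> inclusive seconds
        self.log = []                       # ("entry" | "exit", name) records
        self._starts = []                   # start times, parallel to self.stack

    def entry(self, name):
        """Record that routine `name` has been entered."""
        if self.stack:
            # The routine on top of the stack is the parent of the new one.
            self.children[self.stack[-1]].add(name)
        self.calls[name] += 1
        self.stack.append(name)
        self._starts.append(time.perf_counter() if self.timing else 0.0)
        if self.log_calls:
            self.log.append(("entry", name))

    def exit(self, name):
        """Record that routine `name` has returned."""
        # An exit must match the most recent entry, otherwise the trace is corrupt.
        if not self.stack or self.stack[-1] != name:
            raise RuntimeError(
                f"trace exit for {name!r} does not match stack {self.stack}"
            )
        self.stack.pop()
        start = self._starts.pop()
        if self.timing:
            self.elapsed[name] += time.perf_counter() - start
        if self.log_calls:
            self.log.append(("exit", name))
```

Each traced routine would call entry on the way in and exit on the way out. At any moment the stack attribute then shows where that process is in the call tree, which is the information needed to see which routine each PE is in.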

By switching on the subroutine logging, it is possible to see which routine each PE is in, on any machine, at any point of a Castep calculation. This can help the programmer to track down parallel node desynchronisation, especially if Castep hangs, but the root cause is often a long way back up the call tree and can prove very difficult to locate. Furthermore, the Trace module must be lightweight enough, in both time and memory, that it can always be used without undue penalty, and so it lacks any facility to store or log variables.

A possible extension, essentially an MPI-Trace module, has been proposed by Dr K. Refson. It would contain its own independent communication layer, communicators and so on, enabling it to verify the state of a parallel Castep calculation. Calls to this module could check whether the nodes are synchronised at a given point across the G-vector, band or k-point groups, compare key variables, check MPI collectives, or trap mismatched point-to-point communications.
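
The synchronisation check could look roughly like the sketch below, written in Python for brevity. It is an assumption-laden illustration, not part of the proposal: the function find_desynchronised and the routine names in the usage example are hypothetical. It also assumes that the call stack of every rank in a group has already been gathered, for instance by a collective over a communicator owned by the checking module; that step is not shown.

```python
from collections import Counter


def find_desynchronised(stacks_by_rank):
    """Return {rank: stack} for ranks whose call stack differs from the majority.

    stacks_by_rank maps a rank number to the list of routine names that rank
    has currently entered (outermost first). Gathering these stacks from the
    other processes is assumed to have happened already.
    """
    if not stacks_by_rank:
        return {}
    # Tuples are hashable, so identical stacks can be counted.
    counts = Counter(tuple(stack) for stack in stacks_by_rank.values())
    # The most common stack is the reference; in a tie, the stack that was
    # encountered first is used.
    majority, _ = counts.most_common(1)[0]
    return {
        rank: list(stack)
        for rank, stack in sorted(stacks_by_rank.items())
        if tuple(stack) != majority
    }


# Usage with made-up routine names: rank 3 has wandered into a different routine.
stacks = {
    0: ["main", "scf_cycle", "apply_h"],
    1: ["main", "scf_cycle", "apply_h"],
    2: ["main", "scf_cycle", "apply_h"],
    3: ["main", "scf_cycle", "write_output"],
}
out_of_step = find_desynchronised(stacks)
```

Here out_of_step contains only rank 3 and its stack. Taking the majority as the reference is a design choice made for this sketch; a real checker might instead compare every rank against a designated root.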


Sarfraz A Nadeem 2008-09-01