HECToR as a Development Machine
HECToR is clearly an excellent machine for running Castep, even
without the efficiency gains made in this project. However, some of
its features have made using it as a development machine rather
difficult. Chief amongst these are:
- Buffered I/O
It is obviously important for performance to
buffer I/O, but HECToR does not flush these buffers when the system
call flush is invoked, or even on job termination. This made
tracking down bugs extremely difficult, whether by using Castep's
built-in Trace logging or by adding write statements.
- Out-of-memory events not reported to the user
When one of HECToR's compute
nodes runs out of memory, the Linux OOM module kills a randomly
selected process. This may be the Castep job, in which case the job
terminates. However, the PBS output shows only 'exit code 137',
indicating that the job was killed, but not why. The OOM module may
also kill any Totalview process, rendering the debugger useless in
such cases.
- No dedicated benchmarking time
In the process of developing
and testing the modified Castep, calculations had to be performed on a
large number of nodes. Many of these calculations were short (the
al3x3 benchmark, for example, takes less than 15 minutes on 2000 PEs
or more), and it would have been very useful to have some time set
aside for benchmarking. Perhaps this time could be made available
after scheduled downtime.
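
As an aside on the 'exit code 137' mentioned above: by the usual shell convention, an exit code greater than 128 means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9, SIGKILL. The following is a minimal Python sketch of that decoding. It is not part of Castep or of HECToR's tooling, and the helper name describe_exit_code is purely illustrative.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit code (hypothetical helper, for illustration).

    Assumes the common convention that codes above 128 mean
    'terminated by signal (code - 128)'.
    """
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

On a Linux system this reports that the job was killed by SIGKILL. This is consistent with the observation above: the code shows that the job was killed, but not that the OOM module was responsible, since any SIGKILL produces the same exit code.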
Sarfraz A Nadeem
2008-09-01