HECToR Monthly Report, February 2012
Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.
Dates covered: 08:00 1 February 2012 to 08:00 1 March 2012
Number of hours: 696
Scheduled down time: 11 hours 18 minutes. This includes the closure for the Phase 3 launch event on 13 February 2012.
The following incidents were recorded:
Of the four severity levels, level 1 corresponds to a contractual failure.
Of the 17 SEV-3 incidents, all 17 were attributed to node failure events.
Details of severity level 1 incidents
None this month.
MTBF and Serviceability
- Note 1: Serviceability % = 100 * (WCT - SDT - UDT) / (WCT - SDT)
- Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures.
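The two notes above can be sketched in a few lines of Python. This is only an illustration, assuming that WCT, SDT and UDT stand for wall-clock time, scheduled down time and unscheduled down time, all in hours. The function names are hypothetical, and the UDT value of zero in the usage lines is an assumption, not a figure taken from this report.

```python
import math

HOURS_PER_MONTH = 732  # constant used in Note 2


def serviceability(wct, sdt, udt):
    """Serviceability % = 100 * (WCT - SDT - UDT) / (WCT - SDT)."""
    scheduled_uptime = wct - sdt
    if scheduled_uptime <= 0:
        raise ValueError("WCT must be greater than SDT")
    return 100.0 * (scheduled_uptime - udt) / scheduled_uptime


def mtbf(failures):
    """MTBF = 732 / number of failures; no failures is treated as infinite."""
    if failures == 0:
        return math.inf
    return HOURS_PER_MONTH / failures


wct = 696.0         # hours covered by this report
sdt = 11 + 18 / 60  # 11 hours 18 minutes of scheduled down time
udt = 0.0           # ASSUMPTION: no unscheduled down time

print(f"Serviceability: {serviceability(wct, sdt, udt):.1f}%")
print(f"MTBF: {mtbf(0)} hours")
```

Treating zero failures as an infinite MTBF avoids a division by zero and matches the ∞ entry that appears in the performance metrics.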
Details of single node failures
|Kernel panic (Out-of-memory error/Opteron error)||4|
|MCA bank 4 error||3|
|Admindown by Node Health Check script(*)||3|
|HT lockup error||2|
|MCE threshold exceeded||2|
|Thermtrip on Opteron processor||1|
|xtbounce failure during PM. [Node not returned at end of PM](*)||1|
(*) No user job lost
2: Courses
This information is supplied by NAG Ltd.
|Title of Course||Dates||Available Places||Ordinary Attendees||Paying Attendees||CSE Staff||Total Attending|
|Advanced Computational Methods II (MSc), University of Southampton||Every Friday in February||30||24||0||0||24|
|Introduction to HECToR, NAG Oxford||6 February 2012||12||3||0||0||3|
|Debugging, Profiling and Optimising, NAG Oxford||7 - 9 February 2012||6||0||0||0||6|
|Parallel Programming with MPI, University of Sheffield||8 - 10 February 2012||30||22||0||0||22|
|Introduction to HECToR, NAG Oxford||20 February 2012||12||1||0||0||1|
|Debugging, Profiling and Optimising, NAG Oxford||21 - 23 February 2012||12||5||0||0||5|
5: Performance Metrics
|Technology throughput||7000 hours||8367 hours||8648 hours|
|Capability job completion rate||70%||90%||100%|
|Technology MTBF||100 hours||126.4 hours||∞|
Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the average number of hours in a month over the year.
Note: MTBF is calculated as 732/number of failures.
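As a worked example of the throughput note, the following Python sketch converts the scheduled down time reported for this month into hours and applies the formula. The helper names are hypothetical, and the unscheduled down time of zero is an assumption made for illustration.

```python
HOURS_PER_MONTH = 732  # constant from the note above


def to_hours(hours, minutes=0):
    """Convert an hours-and-minutes duration to decimal hours."""
    return hours + minutes / 60.0


def technology_throughput(udt, sdt):
    """Technology throughput = 12 * (732 - UDT - SDT), in hours."""
    return 12 * (HOURS_PER_MONTH - udt - sdt)


sdt = to_hours(11, 18)  # scheduled down time: 11 hours 18 minutes
udt = 0.0               # ASSUMPTION: no unscheduled down time

print(round(technology_throughput(udt, sdt), 1))
```

Under that assumption the result is about 8648.4 hours, which is in line with the 8648 hours shown in the technology throughput row.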
|Percentage of non-in-depth queries resolved within one day|
|Number of SP FTEs||7.3||8.0||8.7||8.6|