HECToR Monthly Report, June 2010
Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.
Dates covered: 08:00 1 June 2010 to 08:00 1 July 2010
Number of hours: 744
Scheduled down time: 7 hours 22 minutes.
The following incidents were recorded:
Of the four severity levels, level 1 corresponds to a contractual failure.
All 26 SEV-3 incidents were attributed to single-node failure events. Three of the 5 SEV-2 incidents were attributed to node failures in which multiple nodes were affected.
Details of severity level 1 incidents
|Incident-3722||15/06/2010||Cabinet emergency power down due to power supply failure||04:58||Cray|
MTBF and Serviceability
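- Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall-clock time for the period, SDT the scheduled down time and UDT the unscheduled down time.
- Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures, where 732 is the annual average number of hours in a month.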
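The two notes above can be illustrated with a short calculation for this month. This is a minimal sketch, not the official accounting procedure. It assumes that the month's unscheduled down time (UDT) equals the 4 hours 58 minutes recorded for Incident-3722 and that exactly one failure counts towards MTBF. Both are assumptions, although they are consistent with the Technology MTBF of 732 hours reported in the performance metrics. The helper function names are hypothetical.

```python
def hours(h: int, m: int = 0) -> float:
    """Convert an hours/minutes duration to decimal hours."""
    return h + m / 60.0


def serviceability(wct: float, sdt: float, udt: float) -> float:
    """Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), all arguments in hours."""
    return 100.0 * (wct - sdt - udt) / (wct - sdt)


def mtbf(failures: int, month_hours: float = 732.0) -> float:
    """MTBF = 732 / number of failures (732 = annual average hours per month)."""
    if failures == 0:
        # Assumption: the report does not say how a failure-free month is handled.
        raise ValueError("MTBF is undefined for a month with no failures")
    return month_hours / failures


wct = 744.0           # hours in the reporting period (1 June to 1 July 2010)
sdt = hours(7, 22)    # scheduled down time
udt = hours(4, 58)    # assumption: UDT taken as the Incident-3722 duration

print(f"Serviceability: {serviceability(wct, sdt, udt):.2f}%")
print(f"MTBF: {mtbf(1):.1f} hours")  # assumption: one counted failure
```

Under these assumptions the sketch gives a serviceability of about 99.33% and an MTBF of 732 hours.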
Details of single node failures
|RX message header CRC Error||12|
|RX message CRC Error||6|
|RX packet sequence number error||2|
|MCA error (internal Opteron cache error)||2|
|UME bk4 (bk0 DIMMS)||1|
|Software/application related error||6|
2: Courses
This information is supplied by NAG Ltd.
There were no courses held in June.
3: Quality Tokens
|14-June-2010 18:49:10||* * * * *||Positive feedback. No Comment Supplied||x01|
|14-June-2010 12:31:59||* * *||I have been waiting for a response from my query Q90696 for six days and haven't heard anything. This is holding my work up||n02|
4: Hours Worked
5: Performance Metrics
|Technology throughput||7000 hours||8367 hours||8636 hours|
|Capability job completion rate||70%||90%||100%|
|Technology MTBF||100 hours||126.4 hours||732 hours|
Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.
Note: MTBF is calculated as 732/number of failures
|Percentage of non-in-depth queries resolved within one day|
|Number of SP FTEs||7.3||8.0||8.7||10.7|
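The technology throughput note above can be cross-checked against the reported figure. This is a minimal sketch that assumes this month's UDT is the 4 hours 58 minutes recorded for Incident-3722 and that SDT is the stated 7 hours 22 minutes. The function name is hypothetical. Under these assumptions the formula reproduces the 8636 hours shown in the table.

```python
def technology_throughput(udt: float, sdt: float, month_hours: float = 732.0) -> float:
    """Technology throughput = 12*(732 - UDT - SDT), with all times in hours."""
    return 12.0 * (month_hours - udt - sdt)


sdt = 7 + 22 / 60  # scheduled down time: 7 h 22 min
udt = 4 + 58 / 60  # assumption: UDT = 4 h 58 min (Incident-3722)

print(f"Technology throughput: {technology_throughput(udt, sdt):.0f} hours")
```

Working in decimal hours can leave a tiny floating-point residue, so the printed value is rounded to whole hours.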