The HECToR Service is now closed and has been superseded by ARCHER.

HECToR Monthly Report, Phase 2B system, September 2010

Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.

Dates covered: 08:00 1 September 2010 to 08:00 1 October 2010
Number of hours: 744

1: Availability

Scheduled down time: 17 hours 06 minutes.

Incidents

The following incidents were recorded:

Severity Number
1 6
2 4
3 17
4 0

Details of severity level 1 incidents

ID Date Description Length (hh:mm) Attribution
Incident-4511 14/09/2010 OSSs remounted read-only, making Lustre unusable 09:12 Cray
Incident-4536 19/09/2010 PBS server failed 08:04 Cray
Incident-4561 20/09/2010 Voltage fault caused system lockup needing reboot 00:51 Cray
Incident-4566 20/09/2010 PBS server failed 00:57 Cray
Incident-4746 29/09/2010 Lustre problems after 3.1 upgrade 12:11 Cray
Incident-4751 30/09/2010 Lustre collapsed after start of service 09:23 Cray
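As a cross-check, the Length column can be summed to give the total unscheduled down time for these incidents. The sketch below assumes each length is hours:minutes (hh:mm); the helper names are illustrative only.

```python
# Sum incident lengths given as "hh:mm" strings (format assumed from the table).
def parse_hhmm(text: str) -> int:
    """Convert an 'hh:mm' string to a number of minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def format_hhmm(total_minutes: int) -> str:
    """Convert a number of minutes back to an 'hh:mm' string."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"


# Lengths of the severity 1 incidents listed above.
incident_lengths = {
    "Incident-4511": "09:12",
    "Incident-4536": "08:04",
    "Incident-4561": "00:51",
    "Incident-4566": "00:57",
    "Incident-4746": "12:11",
    "Incident-4751": "09:23",
}

total = sum(parse_hhmm(length) for length in incident_lengths.values())
print(format_hhmm(total))  # prints "40:38"
```

The total agrees with the 40:38:00 of unscheduled down time attributed to Cray in the MTBF and Serviceability table.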

MTBF and Serviceability

Attribution Failures MTBF (hours) UDT (hh:mm:ss) Serviceability
Cray 6 122 40:38:00 94.2%
Site 0 n/a 00:00:00 100%
External 0 n/a 00:00:00 100%
Other 0 n/a 00:00:00 100%
Overall 6 122 40:38:00 94.2%
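  • Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall-clock time for the period, SDT is the scheduled down time and UDT is the unscheduled down time.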
  • Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures.
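Notes 1 and 2 can be written as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, all times are taken as decimal hours (so the 17 hours 06 minutes of scheduled down time becomes 17.1), and MTBF is returned as None when there are no failures, since 732/0 is undefined.

```python
from typing import Optional

# Hours in an "average" month, as used by the report's MTBF definition.
HOURS_PER_MONTH = 732


def serviceability(wct: float, sdt: float, udt: float) -> float:
    """Serviceability % = 100 * (WCT - SDT - UDT) / (WCT - SDT); times in hours."""
    return 100.0 * (wct - sdt - udt) / (wct - sdt)


def mtbf(failures: int) -> Optional[float]:
    """MTBF in hours = 732 / number of failures; None if there were no failures."""
    if failures == 0:
        return None  # 732/0 is undefined
    return HOURS_PER_MONTH / failures


print(mtbf(6))                         # 122.0 (the Cray row)
print(serviceability(744, 17.1, 0.0))  # 100.0 (a row with no unscheduled down time)
```

Note that MTBF uses the fixed 732-hour month rather than the 744 hours actually covered by this report, so it does not depend on the length of the reporting period.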

2: Performance Statistics

Technology Provision

Description Value
Technology reliability 94.2%
Technology throughput 8084 hours
Capability job completion rate 100%
Technology MTBF 122 hours

Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.

Note: MTBF is calculated as 732/number of failures
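The throughput formula scales the hours available in an average month up to a year, so a month with no down time at all would score 12*732 = 8784 hours. A minimal sketch, assuming UDT and SDT are supplied in decimal hours (the function name is hypothetical):

```python
# Hours in an "average" month, as stated in the note above.
HOURS_PER_MONTH = 732


def technology_throughput(udt: float, sdt: float) -> float:
    """Technology throughput in hours = 12 * (732 - UDT - SDT)."""
    return 12 * (HOURS_PER_MONTH - udt - sdt)


print(technology_throughput(0.0, 0.0))   # 8784.0, the ceiling with no down time
print(technology_throughput(10.0, 2.0))  # 8640.0
```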