The HECToR Service is now closed and has been superseded by ARCHER.

HECToR Monthly Report, April 2008

Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.

Dates covered: 08:00 1 April 2008 to 08:00 1 May 2008
Number of hours: 720

1: Availability

Scheduled down time: 8 hours 51 minutes

Incidents

The following incidents were recorded:

Severity  Number
1         6
2         2
3         29
4         1

Of the four severity levels, level 1 corresponds to a contractual failure.

Details of severity level 1 incidents

ID            Date        Description                                Length (hh:mm)  Attribution
Incident-180  04/04/2008  Service Node hector03 c0-0c0s2n0 failed    05:58           Cray
Incident-183  07/04/2008  OST7 failure causing Lustre collapse       07:58           Cray
Incident-190  10/04/2008  Service failure due to "portals" problem   11:24           Cray
Incident-191  12/04/2008  IO module c2-1c0s6 failure disrupts HSN    01:46           Cray
Incident-207  26/04/2008  Main PDU failure in cab c0-4               10:59           Cray
Incident-211  28/04/2008  Service close after "lustre" errors        01:16           Cray

MTBF and Serviceability

Attribution  Failures  MTBF  UDT       Serviceability
Cray         6         122   39:21:00  94.6%
Site         0         ~     00:00:00  100%
External     0         ~     00:00:00  100%
Other        0         ~     00:00:00  100%
Overall      6         122   39:21:00  94.6%
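  • Note 1: Serviceability % = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall-clock time, SDT the scheduled down time and UDT the unscheduled down time.
  • Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures.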
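
The two notes above can be checked against this month's figures with a short Python sketch. The incident lengths and the scheduled down time are taken from this report; the helper names (parse_hhmm, mtbf, serviceability) are illustrative only. Note 1 does not state which value of WCT is used, so the sketch assumes the 732-hour average month from Note 2, which reproduces the reported 94.6%; using the 720 hours actually covered gives roughly 94.5% instead.

```python
from datetime import timedelta

# Lengths (hh:mm) of the six severity level 1 incidents listed in this report.
INCIDENT_LENGTHS = ["05:58", "07:58", "11:24", "01:46", "10:59", "01:16"]


def parse_hhmm(text):
    """Convert an 'hh:mm' string into a timedelta."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def to_hours(delta):
    """Express a timedelta as a floating-point number of hours."""
    return delta.total_seconds() / 3600.0


def mtbf(failures, period_hours=732.0):
    """Note 2: 732 / number of failures; None when there were no failures."""
    return period_hours / failures if failures else None


def serviceability(wct, sdt, udt):
    """Note 1: 100 * (WCT - SDT - UDT) / (WCT - SDT), all values in hours."""
    return 100.0 * (wct - sdt - udt) / (wct - sdt)


# Unscheduled down time is the sum of the incident lengths (39:21).
udt = to_hours(sum((parse_hhmm(t) for t in INCIDENT_LENGTHS), timedelta()))
sdt = to_hours(timedelta(hours=8, minutes=51))

print(mtbf(len(INCIDENT_LENGTHS)))                 # 122.0
print(round(serviceability(732.0, sdt, udt), 1))   # 94.6 (assumed WCT = 732)
print(round(serviceability(720.0, sdt, udt), 1))   # 94.5 (WCT = hours covered)
```

Returning None for zero failures mirrors the "~" entries in the table, where MTBF is undefined.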

2: Courses

This information is supplied by NAG Ltd

3: Quality tokens

Apr 18, 2008 1:07:59 PM MR Laszlo Oroszlany x x x could not get pathscale working with help of manual/wiki
Apr 5, 2008 10:46:14 AM Dr George N Barakos * * * *  
Apr 1, 2008 12:57:50 PM MR Laszlo Oroszlany x  

4: Hours worked

Group  Days worked  FTEs
USL    83.4         4.4
OSG    64.5         3.6

5: Performance metrics

Technology Provision

Description                      TSL         FSL          Value
Technology reliability           85%         98.5%        94.6%
Technology throughput            7000 hours  8367 hours   8206 hours
Capability job completion rate   70%         90%          95%
Technology MTBF                  100 hours   126.4 hours  122 hours

Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.

Note: MTBF is calculated as 732/number of failures
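
As a worked check of the throughput note, the following Python sketch plugs in this month's down times (UDT 39:21 and SDT 8:51, both taken from this report). The function and constant names are hypothetical and chosen only for illustration.

```python
# Annual average number of hours in a month, as stated in the note.
HOURS_PER_MONTH = 732.0


def technology_throughput(udt_hours, sdt_hours):
    """Throughput per the note: 12 * (732 - UDT - SDT), in hours."""
    return 12 * (HOURS_PER_MONTH - udt_hours - sdt_hours)


udt_hours = 39 + 21 / 60   # unscheduled down time, 39:21
sdt_hours = 8 + 51 / 60    # scheduled down time, 8:51

throughput = technology_throughput(udt_hours, sdt_hours)
print(round(throughput))   # 8206
```

The result, about 8205.6 hours, rounds to the 8206 hours reported in the table.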

Service Provision

Description                                                     TSL   FSL   USL    Value
Percentage of non-in-depth queries resolved within one day      85%   97%   99%    100%
Number of SP FTEs                                               7.3   8.0   8.7    8.3
SP serviceability                                               80%   99%   99.5%  100%