The HECToR Service is now closed and has been superseded by ARCHER.

HECToR Monthly Report, February 2008

Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.

Dates covered: 08:00 1 February 2008 to 08:00 1 March 2008
Number of hours: 696

1: Availability

Scheduled down time: 21 hours 48 minutes

Incidents

The following incidents were recorded:

Severity  Number
1         7
2         0
3         26
4         2

Of the four severity levels, level 1 corresponds to a contractual failure.

Details of severity level 1 incidents

ID            Date        Description                                        Length   Attribution
Incident-115  06/02/2008  Inactive link between c18-0c0s4s3 and c16-0c0s4s3  1:30:00  Cray
Incident-120  12/02/2008  HSN collapse after link inactive failure           1:18:00  Cray
Incident-125  16/02/2008  System reloaded due to nodes out of memory         2:53:00  Cray
Incident-126  17/02/2008  Reload required due to Lustre problems             0:58:00  Cray
Incident-130  23/02/2008  OST nodes fail after Lustre errors                 2:46:00  Cray
Incident-131  24/02/2008  OST/MDS nodes fail after Lustre errors             1:59:00  Cray
Incident-142  29/02/2008  OST nodes fail after Lustre errors                 3:11:00  Cray
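
As a cross-check, the lengths of the severity level 1 incidents listed above can be totalled. The following is a minimal Python sketch, not part of any HECToR tooling; the names `INCIDENT_LENGTHS`, `parse_hms` and `total_downtime` are illustrative assumptions.

```python
from datetime import timedelta

# Incident lengths (h:mm:ss) copied from the severity level 1 table above.
INCIDENT_LENGTHS = {
    "Incident-115": "1:30:00",
    "Incident-120": "1:18:00",
    "Incident-125": "2:53:00",
    "Incident-126": "0:58:00",
    "Incident-130": "2:46:00",
    "Incident-131": "1:59:00",
    "Incident-142": "3:11:00",
}


def parse_hms(text: str) -> timedelta:
    """Convert an 'h:mm:ss' string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_downtime(lengths: dict) -> timedelta:
    """Sum all incident lengths, starting from a zero-length timedelta."""
    return sum((parse_hms(value) for value in lengths.values()), timedelta())


total = total_downtime(INCIDENT_LENGTHS)
total_hours = total.total_seconds() / 3600
print(f"{len(INCIDENT_LENGTHS)} incidents, {total_hours:.2f} hours in total")
```

The seven lengths add up to 14 hours 35 minutes, which agrees with the UDT attributed to Cray in the MTBF and Serviceability table.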

MTBF and Serviceability

Attribution  Failures  MTBF  UDT       Serviceability
Cray         7         105   14:35:00  97.8%
Site         0         -     01:03:00  100%
External     0         -     00:00:00  100%
Other        0         -     00:00:00  100%
Overall      7         105   11:03:00  97.8%
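  • Note 1: Serviceability % = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall-clock time, SDT the scheduled down time and UDT the unscheduled down time.
  • Note 2: MTBF (Mean Time Between Failures) is defined as 732/number of failures, where 732 is the annual average number of hours in a month.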
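
The two notes above can be checked against this month's figures. The sketch below is illustrative only: the function names are hypothetical, WCT is taken to be the 696 hours covered by the report, SDT the 21 hours 48 minutes of scheduled down time, and UDT the 14:35:00 attributed to Cray.

```python
from typing import Optional


def serviceability_percent(wct_hours: float, sdt_hours: float, udt_hours: float) -> float:
    """Note 1: 100 * (WCT - SDT - UDT) / (WCT - SDT)."""
    scheduled_uptime = wct_hours - sdt_hours
    return 100.0 * (scheduled_uptime - udt_hours) / scheduled_uptime


def mtbf_hours(failures: int, hours_per_month: float = 732.0) -> Optional[float]:
    """Note 2: 732 / number of failures; undefined (None) when there are no failures."""
    if failures == 0:
        return None
    return hours_per_month / failures


# Figures taken from this report (assumed mapping onto WCT, SDT and UDT).
WCT = 696.0            # number of hours in the reporting period
SDT = 21 + 48 / 60     # scheduled down time: 21 hours 48 minutes
UDT_CRAY = 14 + 35 / 60  # UDT attributed to Cray: 14:35:00
FAILURES_CRAY = 7

print(f"Serviceability: {serviceability_percent(WCT, SDT, UDT_CRAY):.1f}%")
print(f"MTBF: {mtbf_hours(FAILURES_CRAY):.0f} hours")
```

With these inputs the sketch reproduces the 97.8% serviceability and 105-hour MTBF shown in the Cray row of the table.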

2: Courses

This information is supplied by NAG Ltd

Dates      Title of Course           Available places  Total attending  HECToR Users  HECToR Staff
18 Feb     Introduction to HECToR    20                14               4             1
19-20 Feb  Tools and Techniques      12                10               5             1
21-22 Feb  Testing and Benchmarking  12                8                4             1
27-28 Feb  Scientific Visualisation  12                4                1             3

3: Quality tokens

Feb 4, 2008 3:05:04 PM Mr Anthony J Devey * * * * *  

4: Hours worked

Group Days worked FTEs
USL 71.0 4.0
OSG 69.3 3.9

5: Performance metrics

Technology Provision

Description TSL FSL Value
Technology reliability 85% 98.5% 97.8%
Technology throughput 7000 hours 8367 hours 8347 hours
Capability job completion rate 70% 90% 97%
Technology MTBF 100 hours 126.4 hours 105 hours

Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month

Note: MTBF is calculated as 732/number of failures
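
The throughput formula in the notes above can be evaluated for this month as a worked example. This is a minimal sketch under stated assumptions: the function name is hypothetical, UDT is taken as the 14:35:00 attributed to Cray, and SDT as the 21 hours 48 minutes of scheduled down time reported under Availability.

```python
HOURS_PER_MONTH = 732.0  # annual average number of hours in a month, as stated in the note


def technology_throughput(udt_hours: float, sdt_hours: float) -> float:
    """Annualised throughput in hours: 12 * (732 - UDT - SDT)."""
    return 12 * (HOURS_PER_MONTH - udt_hours - sdt_hours)


# Assumed inputs for February 2008, taken from earlier sections of this report.
UDT = 14 + 35 / 60  # unscheduled down time: 14:35:00
SDT = 21 + 48 / 60  # scheduled down time: 21 hours 48 minutes

throughput = technology_throughput(UDT, SDT)
print(f"Technology throughput: {throughput:.0f} hours")
```

Under these assumptions the result rounds to 8347 hours, matching the Value column of the Technology Provision table.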

Service Provision

Description TSL FSL USL Value
Percentage of non-in-depth queries resolved within one day 85% 97% 99% 100%
Number of SP FTEs 7.3 8.0 8.7 7.9
SP serviceability 80% 99% 99.5% 100%