HECToR Monthly Report, March 2008

Information on utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.

Dates covered: 08:00 1 March 2008 to 08:00 1 April 2008
Number of hours: 744

1: Availability

Scheduled down time: 11 hours 20 minutes

Incidents

The following incidents were recorded:

Severity  Number
1         5
2         0
3         20
4         0

Of the four severity levels, level 1 corresponds to a contractual failure.

Details of severity level 1 incidents

ID            Date        Description                             Length (hh:mm)  Attribution
Incident-143  01/03/2008  Power problem on module takes out HSN   02:15           Cray
Incident-150  10/03/2008  OST nodes fail after "portals" problem  02:38           Cray
Incident-155  13/03/2008  Boot Node Drop                          00:12           Cray
Incident-168  26/03/2008  System down after OST 4node failure     03:54           Cray
Incident-171  30/03/2008  Failure of external network             25:45           External

MTBF and Serviceability

Attribution  Failures  MTBF (hours)  UDT (hh:mm:ss)  Serviceability
Cray         4         183           08:59:00        98.8%
Site         0         ~             00:00:00        100%
External     1         732           25:45:00        96.5%
Other        0         ~             00:00:00        100%
Overall      5         146           34:44:00        95.3%
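  • Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall clock time in the period (744 hours this month), SDT is the scheduled down time and UDT is the unscheduled down time.
  • Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures, where 732 is the annual average number of hours in a month.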
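
The two notes above can be sketched in a few lines of Python as a cross-check on the table. This is only an illustrative sketch: the function names (hhmm_to_hours, serviceability, mtbf) are hypothetical and not part of any HECToR or SAFE tooling, and WCT is assumed to be the 744 hours given under "Number of hours".

```python
def hhmm_to_hours(text):
    """Convert an 'HH:MM' or 'HH:MM:SS' duration string to fractional hours."""
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:          # pad missing seconds
        parts.append(0)
    hours, minutes, seconds = parts
    return hours + minutes / 60 + seconds / 3600


def serviceability(wct, sdt, udt):
    """Note 1: 100*(WCT-SDT-UDT)/(WCT-SDT), all arguments in hours."""
    return 100 * (wct - sdt - udt) / (wct - sdt)


def mtbf(failures, hours_per_month=732):
    """Note 2: 732/number of failures; undefined (None) when there are no failures."""
    return hours_per_month / failures if failures else None


WCT = 744                          # assumed: hours in the reporting period
SDT = hhmm_to_hours("11:20")       # scheduled down time from section 1

# (attribution, failures, UDT) taken from the MTBF and Serviceability table
rows = [
    ("Cray", 4, "08:59:00"),
    ("External", 1, "25:45:00"),
    ("Overall", 5, "34:44:00"),
]

for name, failures, udt in rows:
    s = serviceability(WCT, SDT, hhmm_to_hours(udt))
    print(f"{name}: MTBF {mtbf(failures):.1f} hours, serviceability {s:.1f}%")
```

The printed values can be compared directly with the MTBF and Serviceability columns; rows with zero failures are omitted because their MTBF is undefined (shown as ~ in the table).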

2: Courses

This information is supplied by NAG Ltd.

Dates             Title of Course                       Available places  Total attending  HECToR Users  HECToR Staff
3 March 2008      Introduction to HECToR                20                3                2             0
4 - 5 March 2008  Pitfalls of Numerical Engineering     12                0                0             0
6 - 7 March 2008  Techniques for Achieving Scalability  12                5                1             3

3: Quality tokens

Mar 25, 2008 4:36:14 PM Dr Lee Margetts * * * * * I had the pleasure of attending a workshop at the University of Durham on multiscale modelling. Guy Robinson was in attendance for the workshop. I am very grateful for Guy's professionalism and patience in answering many questions about the HECToR service
Mar 27, 2008 10:20:43 AM MR Laszlo Oroszlany x x nor the help nor the wiki page is full enough

4: Hours worked

Group  Days worked  FTEs
USL    74           4.2
OSG    67.2         3.8

5: Performance metrics

Technology Provision

Description                     TSL         FSL          Value
Technology reliability          85%         98.5%        98.8%
Technology throughput           7000 hours  8367 hours   8231 hours
Capability job completion rate  70%         90%          98%
Technology MTBF                 100 hours   126.4 hours  183 hours

Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.

Note: MTBF is calculated as 732/number of failures.
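
As a rough check of the throughput note, the formula can be evaluated with this month's figures. The sketch below assumes that UDT is the overall unscheduled down time (34:44) and SDT is the scheduled down time from section 1 (11:20); the report does not state which UDT figure is used, and the function name technology_throughput is hypothetical.

```python
def technology_throughput(udt_hours, sdt_hours, hours_per_month=732):
    """Annualised throughput in hours: 12*(732-UDT-SDT)."""
    return 12 * (hours_per_month - udt_hours - sdt_hours)


# Assumed inputs: overall UDT of 34:44 and SDT of 11:20, converted to hours.
udt = 34 + 44 / 60
sdt = 11 + 20 / 60

print(f"Technology throughput: {technology_throughput(udt, sdt):.0f} hours")
```

Under these assumptions the result agrees with the value reported in the Technology Provision table once rounded to whole hours.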

Service Provision

Description                                                 TSL  FSL  USL    Value
Percentage of non-in-depth queries resolved within one day  85%  97%  99%    99%
Number of SP FTEs                                           7.3  8.0  8.7    8.0
SP serviceability                                           80%  99%  99.5%  100%