The HECToR Service is now closed and has been superseded by ARCHER.

HECToR Monthly Report, March 2008

Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.

Dates covered: 08:00 1 March 2008 to 08:00 1 April 2008
Number of hours: 744

1: Availability

Scheduled down time: 11 hours 20 minutes

Incidents

The following incidents were recorded:

Severity Number
1 5
2 0
3 20
4 0

Of the four severity levels, level 1 corresponds to a contractual failure.

Details of severity level 1 incidents

ID Date Description Length Attribution
Incident-143 01/03/2008 Power problem on module takes out HSN 02:15 Cray
Incident-150 10/03/2008 OST nodes fail after "portals" problem 02:38 Cray
Incident-155 13/03/2008 Boot Node Drop 00:12 Cray
Incident-168 26/03/2008 System down after OST 4node failure 03:54 Cray
Incident-171 30/03/2008 Failure of external network 25:45 External

MTBF and Serviceability

Attribution Failures MTBF UDT Serviceability
Cray 4 183 08:59:00 98.8%
Site 0 ~ 00:00:00 100%
External 1 732 25:45:00 96.5%
Other 0 ~ 00:00:00 100%
Overall 5 146 34:44:00 95.3%
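  • Note 1: Serviceability % = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall-clock time in the period, SDT is the scheduled down time and UDT is the unscheduled down time.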
  • Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures.
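
As a rough check of the two notes above, the following Python sketch recomputes the Cray and External rows from the severity level 1 incident lengths. It assumes that WCT is the 744 hours of the reporting period, SDT is the scheduled down time of 11 hours 20 minutes, and UDT is the sum of the incident lengths for each attribution. The function names are illustrative and are not part of any HECToR tooling.

```python
def to_hours(hhmm: str) -> float:
    """Convert an 'HH:MM' duration string to a number of hours."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60


def serviceability(wct: float, sdt: float, udt: float) -> float:
    """Note 1: 100 * (WCT - SDT - UDT) / (WCT - SDT)."""
    return 100 * (wct - sdt - udt) / (wct - sdt)


def mtbf(failures: int, hours_per_month: float = 732):
    """Note 2: 732 / number of failures; None when there were no failures."""
    return hours_per_month / failures if failures else None


WCT = 744.0              # hours in the reporting period
SDT = to_hours("11:20")  # scheduled down time

# Severity level 1 incident lengths, grouped by attribution.
incidents = {
    "Cray": ["02:15", "02:38", "00:12", "03:54"],
    "External": ["25:45"],
}

for name, lengths in incidents.items():
    udt = sum(to_hours(length) for length in lengths)
    print(f"{name}: MTBF {mtbf(len(lengths)):.0f} h, "
          f"serviceability {serviceability(WCT, SDT, udt):.1f}%")
```

Under these assumptions the sketch gives 183 hours and 98.8% for Cray, and 732 hours and 96.5% for External, which agrees with the table.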

2: Courses

This information is supplied by NAG Ltd

Dates Title of Course Available places Total attending HECToR Users HECToR Staff
3 March 2008 Introduction to HECToR 20 3 2 0
4 - 5 March 2008 Pitfalls of Numerical Engineering 12 0 0 0
6 - 7 March 2008 Techniques for Achieving Scalability 12 5 1 3

3: Quality tokens

Mar 25, 2008 4:36:14 PM Dr Lee Margetts * * * * * I had the pleasure of attending a workshop at the University of Durham on multiscale modelling. Guy Robinson was in attendance for the workshop. I am very grateful for Guy's professionalism and patience in answering many questions about the HECToR service
Mar 27, 2008 10:20:43 AM MR Laszlo Oroszlany x x nor the help nor the wiki page is full enough

4: Hours worked

Group Days worked FTEs
USL 74 4.2
OSG 67.2 3.8

5: Performance metrics

Technology Provision

Description TSL FSL Value
Technology reliability 85% 98.5% 98.8%
Technology throughput 7000 hours 8367 hours 8231 hours
Capability job completion rate 70% 90% 98%
Technology MTBF 100 hours 126.4 hours 183 hours

Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.

Note: MTBF is calculated as 732/number of failures
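
The two calculations in these notes can be sketched in Python as follows. The sketch assumes that UDT here is the overall unscheduled down time for the month (34:44) and SDT is the scheduled down time (11:20), since those inputs reproduce the reported throughput, and that the MTBF uses the four failures attributed to Cray. The function names are illustrative only.

```python
HOURS_PER_MONTH = 732  # annual average number of hours in a month


def to_hours(hhmm: str) -> float:
    """Convert an 'HH:MM' duration string to a number of hours."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60


def technology_throughput(udt: float, sdt: float) -> float:
    """Annualised throughput in hours: 12 * (732 - UDT - SDT)."""
    return 12 * (HOURS_PER_MONTH - udt - sdt)


def technology_mtbf(failures: int) -> float:
    """MTBF in hours: 732 / number of failures (failures must be > 0)."""
    return HOURS_PER_MONTH / failures


# Assumed inputs: overall UDT and SDT for the month, four Cray failures.
throughput = technology_throughput(to_hours("34:44"), to_hours("11:20"))
print(f"Technology throughput: {throughput:.0f} hours")
print(f"Technology MTBF: {technology_mtbf(4):.0f} hours")
```

With these inputs the sketch yields 8231 hours and 183 hours, matching the Value column of the Technology Provision table.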

Service Provision

Description TSL FSL USL Value
Percentage of non-in-depth queries resolved within one day 85% 97% 99% 99%
Number of SP FTEs 7.3 8.0 8.7 8.0
SP serviceability 80% 99% 99.5% 100%