The HECToR Service is now closed and has been superseded by ARCHER.

HECToR Monthly Report, November 2011

Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.

Dates covered: 08:00 1 November 2011 to 08:00 1 December 2011
Number of hours: 720

1: Availability

Scheduled down time: 91 hours 11 minutes. This includes extended maintenance required for the Phase 3 upgrade.

Incidents

The following incidents were recorded:

Severity  Number
1         3
2         0
3         2
4         0

Of the four severity levels, level 1 corresponds to a contractual failure.

Both SEV-3 incidents were attributed to node failure events.

Details of severity level 1 incidents

ID             Date        Description            Length (hh:mm)  Attribution
Incident-6929  14/11/2011  esFS problem           34:49           Cray
Incident-6931  18/11/2011  Loss of cooling water  06:57           Site
Incident-6931  28/11/2011  Plant cooling failure  07:14           Site

MTBF and Serviceability

Attribution    Failures  MTBF  UDT    Serviceability
Cray           1         732   34:49  94.5%
Site           2         366   14:11  97.7%
External       0         -     00:00  100.0%
Other/Unknown  0         -     00:00  100.0%
Overall        3         244   49:00  92.2%
  • Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is wall-clock time, SDT is scheduled down time and UDT is unscheduled down time.
  • Note 2: MTBF (Mean Time Between Failures) is defined as 732/number of failures.
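
The two notes above can be checked against the table with a short Python sketch. It assumes that WCT is the 720 hours covered by this report and that SDT is the 91 hours 11 minutes of scheduled down time. The helper names (hhmm_to_hours, serviceability, mtbf) are illustrative and not part of any HECToR tooling.

```python
def hhmm_to_hours(text):
    """Convert an 'HH:MM' duration string to fractional hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60


def serviceability(wct, sdt, udt):
    """Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), inputs in hours."""
    return 100 * (wct - sdt - udt) / (wct - sdt)


def mtbf(failures, hours_per_month=732):
    """Note 2: MTBF = 732/number of failures; None when there were no failures."""
    return hours_per_month / failures if failures else None


WCT = 720.0                    # assumed: "Number of hours" for this report
SDT = hhmm_to_hours("91:11")   # assumed: scheduled down time quoted in section 1

# (failures, UDT) per attribution, copied from the table above
rows = {"Cray": (1, "34:49"), "Site": (2, "14:11"), "Overall": (3, "49:00")}

results = {}
for name, (failures, udt) in rows.items():
    pct = round(serviceability(WCT, SDT, hhmm_to_hours(udt)), 1)
    results[name] = (mtbf(failures), pct)
    print(name, results[name])
```

Under these assumptions the sketch gives 94.5%, 97.7% and 92.2% for Cray, Site and Overall, in line with the Serviceability column.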

Details of single node failures

Error Type        Number
MCA bank 4 error  2

2: Courses

This information is supplied by NAG Ltd
Title of Course                                        Dates                  Available Places  Ordinary Attendees  Paying Attendees  CSE Staff  Total Attending
Parallel Programming with MPI, King's College London   7 - 9 November 2011    45                21                  0                 0          21
OpenMP, King's College London                          10 - 11 November 2011  45                14                  0                 5          14
An Introduction to CUDA Programming, NAG Manchester    21 - 22 November 2011  25                13                  2                 0          15
An Introduction to OpenCL Programming, NAG Manchester  24 - 25 November 2011  25                12                  0                 0          12

3: Quality Tokens

None set this month.

4: Hours Worked

Group  Days worked  FTEs
USL    78.9         4.4
OSG    73.0         4.1

5: Performance Metrics

Technology Provision

Description                     TSL         FSL          Value
Technology reliability          85%         98.5%        94.5%
Technology throughput           7000 hours  8367 hours   7101 hours
Capability job completion rate  70%         90%          100%
Technology MTBF                 100 hours   126.4 hours  732 hours

Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.

Note: MTBF is calculated as 732/number of failures
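
As a rough check of the throughput note, the following Python sketch applies the formula to this month's figures. It assumes that UDT is the overall unscheduled down time of 49:00 and SDT is the scheduled down time of 91:11, both taken from section 1. The report does not state which UDT is used, so that choice is an assumption, and the function names are illustrative.

```python
HOURS_PER_MONTH = 732  # annual average number of hours in a month, per the note


def hhmm_to_hours(text):
    """Convert an 'HH:MM' duration string to fractional hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60


def technology_throughput(udt_hours, sdt_hours):
    """Technology throughput = 12*(732-UDT-SDT), in hours."""
    return 12 * (HOURS_PER_MONTH - udt_hours - sdt_hours)


# Assumed inputs: overall UDT of 49:00 and SDT of 91:11 from section 1.
throughput_hours = technology_throughput(
    hhmm_to_hours("49:00"), hhmm_to_hours("91:11")
)
print(round(throughput_hours, 1))
```

With these inputs the formula gives roughly 7101.8 hours, which agrees with the reported value of 7101 hours if the figure is truncated rather than rounded.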

Service Provision

Description                                                 TSL  FSL  USL    Value
Percentage of non-in-depth queries resolved within one day  85%  97%  99%    98.2%
Number of SP FTEs                                           7.3  8.0  8.7    8.5
SP Serviceability                                           80%  99%  99.5%  97.7%