HECToR Monthly Report, October 2011
Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.
Dates covered: 08:00 1 October 2011 to 08:00 1 November 2011
Number of hours: 720
1: Availability
Scheduled down time: 14 hours 4 minutes.
Incidents
The following incidents were recorded:
Severity | Number |
1 | 0 |
2 | 0 |
3 | 10 |
4 | 0 |
Of the four severity levels, level 1 corresponds to a contractual failure.
All 10 SEV-3 incidents were attributed to node failure events. In two of these, nodes were marked admin down by the systems team after failing a health check; in these two cases no user jobs were impacted.
Details of severity level 1 incidents
None this month.
MTBF and Serviceability
Attribution | Failures | MTBF | UDT | Serviceability |
Cray | 0 | ∞ | 00:00 | 100.0% |
Site | 0 | ∞ | 00:00 | 100.0% |
External | 0 | ∞ | 00:00 | 100.0% |
Other/Unknown | 0 | ∞ | 00:00 | 100.0% |
Overall | 0 | ∞ | 00:00 | 100.0% |
- Note 1: Serviceability%= 100*(WCT-SDT-UDT)/(WCT-SDT)
- Note 2: MTBF (Mean Time Between Failures) is defined as 732/number of failures, where 732 is the annual average number of hours in a month.
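As a quick check of Notes 1 and 2, the sketch below evaluates both formulas. The function names are hypothetical, and it assumes WCT is the 720 hours given for this reporting period, with SDT = 14 hours 4 minutes and UDT = 0 as reported above. With no unscheduled down time the serviceability is 100% whatever the value of WCT, and with no failures the MTBF is unbounded, which is why the table shows ∞.

```python
import math

# Annual average number of hours in a month, as used in Note 2.
HOURS_PER_MONTH = 732


def serviceability_pct(wct_hours: float, sdt_hours: float, udt_hours: float) -> float:
    """Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT)."""
    scheduled_available = wct_hours - sdt_hours
    return 100.0 * (scheduled_available - udt_hours) / scheduled_available


def mtbf_hours(failures: int) -> float:
    """Note 2: MTBF = 732/number of failures; infinite when there are no failures."""
    if failures == 0:
        return math.inf
    return HOURS_PER_MONTH / failures


# October 2011 inputs (assumption: WCT is the 720 hours stated for the period).
sdt = 14 + 4 / 60  # 14 hours 4 minutes of scheduled down time
udt = 0.0          # no unscheduled down time recorded

print(f"Serviceability: {serviceability_pct(720, sdt, udt):.1f}%")
print(f"MTBF: {mtbf_hours(0)}")
```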
Details of single node failures
Error Type | Number |
Software problem (related to a user application, bug #775153) | 2 |
MCA bank 1/4 error | 1 |
Voltage fault | 2 |
HT lockup error | 2 |
Admin down following failed health check | 2 |
Failed 'xtbounce' during PM | 1 |
2: Courses
This information is supplied by NAG Ltd.
Title of Course | Dates | Available Places | Ordinary Attendees | Paying Attendees | CSE Staff | Total Attending |
Debugging, Profiling and Optimising, University of Reading | 7 - 8 October 2011 | 20 | 15 | 0 | 0 | 15 |
Object Oriented Programming in Fortran 2003, NAG Manchester | 11 - 13 October 2011 | 25 | 11 | 0 | 5 | 16 |
3: Quality Tokens
Date | Tokens Awarded | Comment | Consortium |
19 Oct 2011 | * * * * * | Positive tokens - no user comments received | x01 |
24 Oct 2011 | * * * * * | Positive tokens - no user comments received | e05 |
26 Oct 2011 | * * * * * | Positive tokens - no user comments received | x01 |
4: Hours Worked
Group | Days worked | FTEs |
USL | 83.9 | 4.7 |
OSG | 73.2 | 4.1 |
5: Performance Metrics
Technology Provision
Description | TSL | FSL | Value |
Technology reliability | 85% | 98.5% | 100% |
Technology throughput | 7000 hours | 8367 hours | 8615 hours |
Capability job completion rate | 70% | 90% | 100% |
Technology MTBF | 100 hours | 126.4 hours | ∞ |
Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.
Note: MTBF is calculated as 732/number of failures.
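The throughput formula can be checked against the value in the table with a short sketch. The function name is hypothetical; the inputs are this month's 14 hours 4 minutes of scheduled down time and zero unscheduled down time. Multiplying by 12 presumably scales the monthly figure to an annual equivalent, which is how it compares with the TSL and FSL targets.

```python
# Annual average number of hours in a month, as used in the note above.
HOURS_PER_MONTH = 732


def technology_throughput(udt_hours: float, sdt_hours: float) -> float:
    """Technology throughput = 12*(732-UDT-SDT), in hours."""
    return 12 * (HOURS_PER_MONTH - udt_hours - sdt_hours)


sdt = 14 + 4 / 60  # 14 hours 4 minutes of scheduled down time
udt = 0.0          # no unscheduled down time this month

throughput = technology_throughput(udt, sdt)
print(round(throughput))        # 8615, matching the table
print(throughput >= 8367)       # True: above the FSL of 8367 hours
```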
Service Provision
Description | TSL | FSL | USL | Value |
Percentage of non-in-depth queries resolved within one day | 85% | 97% | 99% | 98.7% |
Number of SP FTEs | 7.3 | 8.0 | 8.7 | 8.8 |
SP Serviceability | 80% | 99% | 99.5% | 100% |