HECToR Monthly Report, April 2011
Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.
This report relates to the Phase 2b service only. Performance statistics for the non-contractual Phase 2a system can be found here.
Dates covered: 08:00 1 April 2011 to 08:00 1 May 2011
Number of hours: 720
1: Availability
Scheduled down time: 17 hours 40 minutes.
Incidents
The following incidents were recorded:
Severity | Number |
1 | 2 |
2 | 0 |
3 | 27 |
4 | 0 |
Of the four severity levels, level 1 corresponds to a contractual failure.
Out of the 27 SEV-3 incidents, 27 were attributed to single node failure events.
Details of severity level 1 incidents
ID | Date | Description | Length (hh:mm) | Attribution |
Incident-6007 | 06/04/2011 | Maintenance session overrun | 00:41 | Cray |
Incident-6071 | 14/04/2011 | Faulty OSS node | 19:46 | Cray |
MTBF and Serviceability
Attribution | Failures | MTBF (hours) | UDT (hh:mm) | Serviceability |
Cray | 2 | 366 | 20:27 | 97.1% |
Site | 0 | ∞ | 00:00 | 100% |
External | 0 | ∞ | 00:00 | 100% |
Other | 0 | ∞ | 00:00 | 100% |
Overall | 2 | 366 | 20:27 | 97.1% |
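- Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is the wall-clock time of the reporting period, SDT is scheduled down time and UDT is unscheduled down time.
- Note 2: MTBF (Mean Time Between Failures) is defined as 732/number of failures.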
Details of single node failures
Error Type | Number |
Software error | 14 |
HT lockup | 3 |
MCA bank 4 error (DIMMs) | 3 |
Heartbeat fault | 2 |
Voltage fault | 2 |
Failure to boot | 2 |
Failed node health check on application exit | 1 |
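As a consistency check, the single node failure categories can be tallied against the severity-3 count in the incidents table. This is a minimal sketch; the dictionary simply restates the table above, and the names `failures_by_type` and `SEV3_INCIDENTS` are hypothetical.

```python
# Sketch: confirm that the single node failure categories account for
# every severity-3 incident, and show each category's share.

failures_by_type = {
    "Software error": 14,
    "HT lockup": 3,
    "MCA bank 4 error (DIMMs)": 3,
    "Heartbeat fault": 2,
    "Voltage fault": 2,
    "Failure to boot": 2,
    "Failed node health check on application exit": 1,
}
SEV3_INCIDENTS = 27  # severity-3 count from the incidents table

total = sum(failures_by_type.values())
print(f"Single node failures: {total} of {SEV3_INCIDENTS} SEV-3 incidents")

# List categories from most to least frequent, with their percentage share.
for error_type, count in sorted(failures_by_type.items(), key=lambda kv: -kv[1]):
    print(f"{error_type:<48} {count:>3} ({100 * count / total:.0f}%)")
```

The categories sum to 27, so every severity-3 incident this month was a single node failure, with software errors making up just over half of them.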
2: Courses
This information is supplied by NAG Ltd.
Title of Course | Dates | Available Places | Ordinary Attendees | Paying Attendees | CSE Staff | Total Attending |
Multicore, Imperial College, London | 5 April 2011 | 40 | 15 | 0 | 0 | 15 |
Transitioning to the Cray XE6, NAG Manchester | 6 April 2011 | 12 | 11 | 0 | 0 | 11 |
Coarray Fortran, NAG Manchester | 7 April 2011 | 12 | 6 | 0 | 0 | 6 |
Transitioning to the Cray XE6, NAG Oxford | 13 April 2011 | 12 | 0 | 0 | 0 | 0 |
Coarray Fortran, NAG Manchester | 14 April 2011 | 12 | 4 | 0 | 3 | 7 |
3: Quality Tokens
None set this month.
4: Hours Worked
Group | Days worked | FTEs |
USL | 70.9 | 4.0 |
OSG | 73.2 | 4.1 |
5: Performance Metrics
Technology Provision
Description | TSL | FSL | Value |
Technology reliability | 85% | 98.5% | 97.1% |
Technology throughput | 7000 hours | 8367 hours | 8326 hours |
Capability job completion rate | 70% | 90% | 100% |
Technology MTBF | 100 hours | 126.4 hours | 366 hours |
Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the average number of hours in a month, taken over a year.
Note: MTBF is calculated as 732/number of failures.
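The throughput note can be reproduced from the down-time figures in Section 1. This is a minimal sketch: the function name `technology_throughput` is hypothetical, and it assumes that the reported value is the formula's result truncated to whole hours, since the unrounded result is about 8326.6.

```python
import math

MONTH_HOURS = 732.0  # average hours in a month, as used in the report's notes


def technology_throughput(udt_hours: float, sdt_hours: float) -> float:
    """Annualised throughput: 12*(732-UDT-SDT), in hours."""
    return 12 * (MONTH_HOURS - udt_hours - sdt_hours)


udt = 20 + 27 / 60  # unscheduled down time, 20:27
sdt = 17 + 40 / 60  # scheduled down time, 17:40

throughput = technology_throughput(udt, sdt)
# Assumption: the report truncates to whole hours rather than rounding.
print(f"Throughput = {throughput:.1f} h (reported as {math.floor(throughput)} h)")
print("Meets TSL (7000 h):", throughput >= 7000)
```

The result clears the 7000-hour TSL but falls just short of the 8367-hour FSL, which is consistent with the table.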
Service Provision
Description | TSL | FSL | USL | Value |
Percentage of non-in-depth queries resolved within one day | 85% | 97% | 99% | 99.1% |
Number of SP FTEs | 7.3 | 8.0 | 8.7 | 8.1 |
SP Serviceability | 80% | 99% | 99.5% | 100% |