HECToR Monthly Report, September 2009
Information on the utilisation, disk allocations, slowdowns and helpdesk statistics can be found in the associated SAFE monthly report.
Dates covered: 08:00 1 September 2009 to 08:00 1 October 2009
Number of hours: 720
Scheduled down time: 11 hours 30 minutes.
The following incidents were recorded:
Of the four severity levels, level 1 corresponds to a contractual failure.
Out of the 21 SEV-3 Incidents, 21 were attributed to single node failures.
Details of severity level 1 incidents
|Incident-2232||07/09/2009||Closed for security alert||28:05||External*|
|Incident-2262||11/09/2009||High speed network failure||02:19||Cray|
|Incident-2297||23/08/2009||Link Inactive error||02:45||Cray|
|Incident-2367||28/09/2009||Services unavailable - filesystem full||02:24||OSG|
|Incident-2392||30/09/2009||Shutdown due to risk of loss of cooling||01:45||OSG/External**|
* An external Unix security alert resulted in the service being closed. This was a preventative measure and there was no breach of HECToR security.
** A fault on the external power distribution network triggered a plant fault.
MTBF and Serviceability
- Note 1: Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), where WCT is wall clock time, SDT is scheduled down time and UDT is unscheduled down time.
- Note 2: MTBF (Mean Time Between Failures) is defined as 732/Number of failures.
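The two notes above can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the tooling used to produce the report: the function names are hypothetical, and the unscheduled down time and failure count in the usage lines are assumed values rather than figures taken from this report. The wall clock time (720 hours) and scheduled down time (11.5 hours) are the values given at the top of the report.

```python
def serviceability_percent(wct_hours, sdt_hours, udt_hours):
    """Serviceability% = 100*(WCT-SDT-UDT)/(WCT-SDT), as in Note 1."""
    scheduled_available = wct_hours - sdt_hours
    if scheduled_available <= 0:
        raise ValueError("WCT must exceed SDT")
    return 100.0 * (scheduled_available - udt_hours) / scheduled_available


def mtbf_hours(number_of_failures, hours_per_month=732.0):
    """MTBF = 732 / number of failures, as in Note 2."""
    if number_of_failures <= 0:
        raise ValueError("MTBF is undefined for zero failures")
    return hours_per_month / number_of_failures


# WCT and SDT come from the report header; UDT is an assumed value.
example_udt_hours = 5.0  # hypothetical, for illustration only
print(round(serviceability_percent(720.0, 11.5, example_udt_hours), 2))

# Assumed failure count, for illustration only.
print(mtbf_hours(2))
```

Note that the MTBF formula divides the annual average month length (732 hours) by the failure count, so it does not depend on the actual number of hours in the reporting month.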
Details of single node failures
|UME bk0/bk4 (Dimms)||6|
|UME on MDC2 (X2 node failure)||1|
|RX message header CRC Error||11|
|MCA bk0 error (internal Opteron)||1|
2: Courses
This information is supplied by NAG Ltd
|Title of Course||Dates||Available Places||Ordinary Attendees||Paying Attendees||CSE Staff||Total Attending|
|Fortran 95, University of Exeter||7 - 9 September 2009||30||11||1||0||12|
|Parallel Programming with MPI, University of Exeter||14 - 16 September 2009||30||24||1||0||25|
|Introduction to HECToR, University of Exeter||21 September 2009||30||5||0||0||5|
|OpenMP and Mixed Mode Programming, University of Exeter||22 - 23 September 2009||30||17||0||0||17|
|Core Algorithms for High Performance Scientific Computing, University of Warwick||28 September - 2 October 2009||30||29||0||0||29|
3: Quality Tokens
|30-Sep-2009 21:56:26||* * * * *||Positive feedback from e63 consortium|
4: Hours Worked
5: Performance Metrics
|Technology throughput||7000 hours||8367 hours||8198 hours|
|Capability job completion rate||70%||90%||100%|
|Technology MTBF||100 hours||126.4 hours||366 hours|
Note: Technology throughput is calculated as 12*(732-UDT-SDT), where 732 is the annual average number of hours in a month.
Note: MTBF is calculated as 732/number of failures
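As a rough sketch of the throughput note, the following Python converts an HH:MM duration (the format used for incident durations in this report) to hours and applies 12*(732-UDT-SDT). The function names are hypothetical and the unscheduled down time is an assumed value chosen for illustration; only the scheduled down time of 11 hours 30 minutes comes from the report.

```python
def hhmm_to_hours(duration):
    """Convert a duration written as 'HH:MM' into fractional hours."""
    hours, minutes = duration.split(":")
    return int(hours) + int(minutes) / 60.0


def technology_throughput(udt_hours, sdt_hours, hours_per_month=732.0):
    """Technology throughput = 12*(732-UDT-SDT), per the note above."""
    return 12 * (hours_per_month - udt_hours - sdt_hours)


sdt = hhmm_to_hours("11:30")  # scheduled down time stated in the report
udt = 23.5                    # hypothetical unscheduled down time
print(technology_throughput(udt, sdt))
```

With no down time at all the formula gives 12*732 = 8784 hours, which is the upper bound for this metric.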
|Percentage of non-in-depth queries resolved within one day|
|Number of SP FTEs||7.3||8.0||8.7||9.3|