Planned Maintenance Sessions

Upcoming Planned Maintenance/At Risk sessions are scheduled for:

  • Emergency Maintenance: 5 February 2014, 09:00 - 18:00 (*End time estimated*)
  • At Risk Maintenance: 19 February 2014, 12:00 - 18:00 (6 hours)
  • At Risk Maintenance: 5 March 2014, 12:00 - 18:00 (6 hours)

The RDF is undergoing a major upgrade in 1Q14, which will involve two extended periods of downtime:

  • RDF Unavailable (Server Room Move): 27 January - 7 February
  • RDF Unavailable (Capacity Upgrade): 24 February - 3 March

We try to keep downtime to a minimum, so maintenance sessions are sometimes replaced by At Risk periods. If the list above shows Planned Maintenance/At Risk, it has not yet been decided which will be used. During an At Risk period the service remains open to users while work is carried out. There is a small but non-zero chance that this work will affect normal operation, and users should bear this in mind when using the service during these times. We will endeavour to notify users of any change from full planned maintenance to an At Risk period as early as possible, so that they can plan their work accordingly.

Notifications of Maintenance and At Risk Sessions

If you wish to be notified of upcoming maintenance sessions and of when the machine is returned to service, you can subscribe to additional e-mail notifications using the procedure described in the HECToR FAQ.

Effect on User Jobs

Before maintenance sessions, the queues on HECToR are drained to ensure that no jobs are running when the system is shut down. The exception is the low-priority queues, where it is the user's responsibility to ensure that any jobs have completed before a shutdown. If you have large, short jobs to run, you may get good turnaround just before a maintenance session, while longer jobs are being held.

Work Undertaken During Maintenance Sessions

Regular maintenance sessions are used to ensure that:

  • software versions are kept up to date;
  • firmware levels on Cray and third-party peripheral equipment are kept up to date;
  • essential security patches are applied;
  • failed/suspect hardware can be replaced;
  • new software can be installed;
  • periodic essential maintenance on Cray's electrical and mechanical support equipment (refrigeration systems, air blowers and power distribution units) can be undertaken safely.

Additional maintenance sessions may be scheduled for major hardware or software updates; major upgrades to facility plant and infrastructure; acceptance testing following major service upgrades; and statutory electrical testing.

For more detailed information on maintenance policy and upcoming sessions please see the HECToR User Wiki (you will need your SAFE login details to access the wiki).