The HECToR Service is now closed and has been superseded by ARCHER.

UK Research Data Facility Guide

The UK Research Data Facility (RDF) is now available to HECToR users. The facility, funded by EPSRC and NERC, is co-located with HECToR and is housed at the Advanced Computing Facility (ACF). UoE HPCx Ltd manages and contracts the hardware provision.

The Research Councils' vision behind the RDF:

  • Provide a high-capacity, robust file store;
  • Persistent infrastructure - will last beyond any one national service;
  • Easily extensible in size and number of hosts - a degree of future-proofing and potential for increasing local post-processing activities;
  • Operates independently of any one vendor's offering for compute;
  • Remotely accessible via an Edinburgh host - access is not restricted to the login nodes;
  • Will remove end-of-service data issues - transfers at the end of services have become increasingly lengthy;
  • Will also ensure that data from the current HECToR service is secured - this will ensure a degree of soft landing if there is ever a gap in National Services.

Technical Details

The RDF consists of 7.8 PB of disk, with an additional 19.5 PB of backup tape capacity.

The disk storage is based on four DDN 10K storage arrays populated with near-line SAS 3 TB 7,200 rpm HDDs. Metadata storage is based on two IBM DS3524s populated with SAS 300 GB 10,000 rpm HDDs. The backup capability is managed via Tivoli Storage Manager, based on an IBM TS3500 tape library with 12 drives.

Access to RDF for HECToR users

The Research Councils have already identified a subset of consortia to be configured on the RDF. These are primarily groups with large data needs. PIs and users from these groups have been contacted regarding the quota that has been allocated.

The RDF is designed for long term data storage. If you have a requirement in this area and you are not part of one of the initial groups, please discuss this with your PI in the first instance. The PI should then contact the HECToR Helpdesk for advice.

Connecting to RDF

On HECToR, the RDF will be available to users as a directly mounted filesystem. The name of the filesystem will depend on your funding body. At present three filesystems have been created:

/epsrc
/nerc
/general

These filesystems are only visible on the HECToR login nodes. Access from the compute nodes is not available. Note: for NERC users the Large Memory Server is also connected directly to the RDF.

Moving Data To/From HECToR

As the RDF is directly mounted on HECToR, standard commands such as cp can be used to copy files across from the /home and /work filesystems.

We have found that the native cp command gives the best performance when transferring data from HECToR filesystems to the RDF.
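
As an illustration, a recursive copy from /work to the RDF might look like the sketch below. The function name copy_to_rdf and the paths in the example call are hypothetical; substitute your own project directories.

```shell
# copy_to_rdf SRC DEST: copy a file or directory tree into DEST on the RDF.
# (Hypothetical helper name; it only wraps mkdir and cp.)
copy_to_rdf() {
    local src="$1" dest="$2"
    # Create the destination directory if it does not exist, then copy.
    mkdir -p "$dest" && cp -r "$src" "$dest"/
}

# Example call (hypothetical paths):
# copy_to_rdf /work/x01/x01/username/results /epsrc/x01/username
```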

The following error message is sometimes seen when moving data to/from the RDF:

mv: setting attributes for '/nerc/.../.../../':
Operation not supported

This is a harmless warning, seen when moving files from Lustre filesystems (/work) to non-Lustre filesystems (/nerc), and it can be safely ignored. The message means that the "Extended Attributes" (xattr) that the file has on the /work Lustre filesystem (details specific to that filesystem, such as the number and size of Lustre stripes) cannot be set on the NFS filesystem that is /nerc. Your file should still have moved.

One way to avoid this warning is to copy the files from /work to /nerc and delete the originals once they have successfully transferred.
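
A minimal sketch of this copy-then-delete approach is given below. The function name safe_move and the example paths are hypothetical, and the comparison with diff is one possible way of checking the transfer, not a prescribed procedure.

```shell
# safe_move SRC DEST_DIR: copy SRC into DEST_DIR, check the copy, then
# delete SRC only if the copy matches. (Hypothetical helper name.)
safe_move() {
    local src="$1" dest_dir="$2"
    local name
    name=$(basename "$src")
    cp -r "$src" "$dest_dir"/ || return 1
    # diff -r exits non-zero if any file differs or is missing in the copy.
    if diff -r "$src" "$dest_dir/$name" > /dev/null; then
        rm -r "$src"
    else
        echo "Copy of $src does not match; source left in place" >&2
        return 1
    fi
}

# Example call (hypothetical paths):
# safe_move /work/n01/n01/username/run42 /nerc/n01/username
```

Comparing every file doubles the amount of data read, so for very large transfers a lighter check (file counts and sizes, for example) may be preferable.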

Using the Serial Queues

Users moving large volumes of data (for example with rsync) should use the serial batch queues, as large transfer jobs running on the login nodes may be terminated.
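
The outline below sketches what such a transfer job could look like. The queue name, resource request, budget code and paths are all assumptions made for illustration; check the HECToR batch documentation for the exact directives that the serial queues require.

```shell
#!/bin/bash --login
# The PBS directives below are assumptions, not confirmed HECToR settings:
# the queue name, the resource request and the budget code are hypothetical.
#PBS -q serial
#PBS -N rdf_transfer
#PBS -l walltime=06:00:00
#PBS -A x01

# transfer SRC DEST: mirror SRC into DEST with rsync.
# -r recurses into directories; -t preserves modification times so that
# a re-run skips files already transferred; --partial keeps partially
# transferred files if the job is interrupted.
transfer() {
    rsync -rt --partial "$1" "$2"
}

# Hypothetical paths; uncomment and edit before submitting with qsub.
# transfer /work/x01/x01/username/results/ /epsrc/x01/username/results/
```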

Moving Data On/Off the RDF to External Sites

As mentioned above in the vision statement, access to the RDF should not be restricted to the HECToR login nodes. Four 'Data Mover Nodes' have been configured on the RDF. These enable access at times when HECToR is unavailable, such as during maintenance sessions, and will provide direct 10 Gbit Ethernet connections to the outside world. In addition to the normal Unix data transfer commands such as scp, the data mover nodes have been configured to use GridFTP. This is part of the Globus tool suite and provides a mechanism to move large volumes of data efficiently. GridFTP servers are currently available on the following data transfer nodes:

  • dtn01.hector.ac.uk
  • dtn02.hector.ac.uk
  • dtn03.hector.ac.uk

Normally you require a grid certificate to use GridFTP. However, the data transfer nodes are also configured to use the sshftp mode of GridFTP, which uses ssh to set up the GridFTP session.

We have found that GridFTP gives the best performance in transferring data from external sites to the RDF.
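
As a rough sketch, a transfer in sshftp mode can be started from an external machine with the globus-url-copy client from the Globus tool suite. The helper rdf_url, the user name and the paths below are hypothetical, and the location of your project directory on the data transfer nodes is an assumption.

```shell
# rdf_url USER HOST PATH: build an sshftp URL for a data transfer node.
# (Hypothetical helper name; PATH must be absolute.)
rdf_url() {
    printf 'sshftp://%s@%s%s\n' "$1" "$2" "$3"
}

# Example transfer (hypothetical user name and paths). -vb reports
# transfer performance and -p sets the number of parallel data streams.
# globus-url-copy -vb -p 4 \
#     "file://$PWD/results.tar" \
#     "$(rdf_url username dtn01.hector.ac.uk /epsrc/x01/username/results.tar)"
```

For small transfers, a plain scp to the same host is usually simpler.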

RDF Quota Management

If you are a PI of a project, quota management for the RDF is available via the SAFE in the same way as you currently manage /work and /home quotas. Please note that user quotas are not configured on the RDF. These may be added at a later date.

Support

Whilst the RDF is not part of the HECToR service itself, access is currently only configured for HECToR users. Users with support requests relating to the RDF should contact the HECToR Helpdesk as normal. Please note that support of the RDF will be performed on a reasonable endeavours basis.