UK Research Data Facility Guide
The UK Research Data Facility (RDF) is now available to HECToR users. The facility, funded by EPSRC and NERC, is co-located with HECToR and housed at the ACF. UoE HPCx Ltd manages and contracts the hardware provision.
The Research Councils' vision behind the RDF:
- Provide a high-capacity, robust file store;
- Persistent infrastructure: will last beyond any one national service;
- Easily extensible in size and number of hosts, giving a degree of future-proofing and the potential to increase local post-processing activities;
- Operates independently of any one vendor's compute offering;
- Remotely accessible via an Edinburgh host, not restricted to access through the login nodes;
- Will remove end-of-service data issues, as transfers at the end of services have become increasingly lengthy;
- Will also ensure that data from the current HECToR service is secured, providing a degree of soft landing if there is ever a gap in National Services.
Technical Details
The RDF consists of 7.8 PB of disk storage, with an additional 19.5 PB of backup tape capacity.
The disk storage is based on four DDN 10K storage arrays populated with 3 TB near-line SAS HDDs (7,200 rpm). Metadata storage is based on two IBM DS3524s populated with 300 GB SAS HDDs (10,000 rpm). Backup is managed by Tivoli Storage Manager, using an IBM TS3500 tape library with 12 drives.
Access to RDF for HECToR users
The Research Councils have already identified a subset of consortia to be configured on the RDF. These are primarily groups with large data needs. PIs and users from these groups have been contacted regarding the quota that has been allocated.
The RDF is designed for long term data storage. If you have a requirement in this area and you are not part of one of the initial groups, please discuss this with your PI in the first instance. The PI should then contact the HECToR Helpdesk for advice.
Connecting to RDF
On HECToR, the RDF will be available to users as a directly mounted filesystem. The name of the filesystem will depend on your funding body. At present three filesystems have been created:
- /epsrc
- /nerc
- /general
These filesystems are only visible on the HECToR login nodes; they cannot be accessed from the compute nodes. Note: for NERC users, the Large Memory Server is also connected directly to the RDF.
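Because the RDF path depends on your funding body and the filesystems are only mounted on some hosts, a transfer script may want to pick the right mount point and check that it is actually present before starting. The following is a minimal sketch of such a helper; the function names are hypothetical, and the reading of /general as the area for projects funded by neither EPSRC nor NERC is our assumption, not something this guide states.

```python
import os

# RDF mount points as listed in this guide.
# Assumption: "general" covers projects not funded by EPSRC or NERC.
RDF_MOUNTS = {
    "epsrc": "/epsrc",
    "nerc": "/nerc",
    "general": "/general",
}


def rdf_root(funding_body: str) -> str:
    """Return the RDF mount point for a funding body (case-insensitive)."""
    key = funding_body.strip().lower()
    if key not in RDF_MOUNTS:
        raise ValueError(f"unknown funding body: {funding_body!r}")
    return RDF_MOUNTS[key]


def rdf_is_mounted(funding_body: str) -> bool:
    """True if the RDF filesystem for this funding body is mounted here.

    On HECToR this is expected to be True only on the login nodes (and,
    for NERC users, the Large Memory Server), never on compute nodes.
    """
    return os.path.ismount(rdf_root(funding_body))
```

Calling `rdf_is_mounted("nerc")` at the top of a script gives an early, clear failure if the script is accidentally run somewhere the RDF is not visible.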
Moving Data To/From HECToR
As the RDF is directly mounted on HECToR, standard commands such as cp can be used to copy files across from the /home and /work filesystems.
We have found that the native cp command gives the best performance when transferring data from HECToR filesystems to the RDF.
The following error message is sometimes seen when moving data to/from the RDF:
mv: setting attributes for '/nerc/.../.../../': Operation not supported
This is a harmless warning, seen when moving files from a Lustre filesystem (/work) to a non-Lustre filesystem (/nerc), and can be safely ignored. It means that the extended attributes (xattrs) the file carries on the /work Lustre filesystem (details specific to that filesystem, such as the number and size of Lustre stripes) cannot be set on /nerc, which is an NFS filesystem. Your file should still have moved.
One way to avoid this warning is to copy the files from /work to /nerc and delete the originals once they have transferred successfully.
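The copy-then-delete approach can be sketched in Python as below, mainly to make the order of operations explicit: copy, verify, and only then remove the source. This is a minimal illustration, not a replacement for the native cp command, which this guide recommends for speed. The function names are hypothetical; `shutil.copyfile` copies file contents only, so it never tries to set the Lustre extended attributes that trigger the warning above.

```python
import hashlib
import os
import shutil


def _sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_then_delete(src: str, dest_dir: str) -> str:
    """Copy src into dest_dir, verify the copy, then remove src.

    shutil.copyfile copies data only (no metadata or xattrs), so no
    attempt is made to carry Lustre stripe settings onto the RDF.
    The source is deleted only if size and checksum both match.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copyfile(src, dest)
    same_size = os.path.getsize(src) == os.path.getsize(dest)
    if not same_size or _sha256(src) != _sha256(dest):
        raise OSError(f"verification failed for {src!r}; source kept")
    os.remove(src)
    return dest
```

Checksumming reads each file twice, which is slow for very large files; comparing sizes alone is a cheaper (weaker) check if that becomes a problem.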
Using the Serial Queues
Users moving large volumes of data with rsync or similar tools should use the serial batch queues. Large transfer jobs running on the login nodes may be terminated.
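A transfer run inside a serial batch job might wrap rsync as in the minimal sketch below. It only shows how the rsync command line is assembled and run; how the job is submitted to the serial queue (queue names, job script directives) is not covered in this guide and is deliberately left out. The function names are hypothetical. Note that `-a` does not include `-X`, so rsync does not try to copy extended attributes onto the RDF.

```python
import shutil
import subprocess
from typing import List


def rsync_command(src: str, dest: str) -> List[str]:
    """Build an rsync command line for a bulk transfer.

    -a         archive mode: recurse, preserve symlinks, permissions,
               times, group and owner (does not copy xattrs)
    --partial  keep partially transferred files so a re-run can resume
    A trailing slash on src copies the directory's contents rather
    than the directory itself.
    """
    return ["rsync", "-a", "--partial", src, dest]


def run_transfer(src: str, dest: str) -> int:
    """Run the transfer and return rsync's exit status (0 on success).

    Intended to be called from a job running in the serial queue,
    not interactively on a login node.
    """
    if shutil.which("rsync") is None:
        raise FileNotFoundError("rsync not found on PATH")
    return subprocess.run(rsync_command(src, dest), check=False).returncode
```

Because of `--partial`, a job that is interrupted can simply be resubmitted and rsync will skip files that already match.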
Moving Data On/Off the RDF to External Sites
As mentioned in the vision statement above, access to the RDF should not be restricted to the HECToR login nodes. Four 'Data Mover Nodes' have been configured on the RDF. These enable access at times when HECToR is unavailable, such as during maintenance sessions, and provide direct 10 Gbit/s Ethernet connections to the outside world.
In addition to the normal Unix data transfer commands such as scp, the data mover nodes have been configured to use GridFTP. This is part of the Globus tool suite and provides a mechanism to efficiently move large volumes of data. GridFTP servers are currently available on the following data transfer nodes:
dtn01.hector.ac.uk
dtn02.hector.ac.uk
dtn03.hector.ac.uk
We have found that GridFTP gives the best performance in transferring data from external sites to the RDF.
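A GridFTP transfer from an external site is typically driven with the Globus `globus-url-copy` client, pushing a local `file://` URL to a `gsiftp://` URL on one of the nodes above. The minimal sketch below only builds such command lines and cycles through the listed nodes to spread several transfers; it assumes the client is installed and that you already hold valid Globus credentials. The default of 4 parallel streams is an arbitrary starting value, and the remote path layout on the nodes is a hypothetical placeholder.

```python
import itertools
from typing import Iterator, List

# GridFTP data transfer nodes listed in this guide.
DTN_HOSTS = ["dtn01.hector.ac.uk", "dtn02.hector.ac.uk", "dtn03.hector.ac.uk"]


def gsiftp_url(host: str, path: str) -> str:
    """Build a gsiftp:// URL for an absolute path on a data transfer node."""
    if not path.startswith("/"):
        raise ValueError("remote path must be absolute")
    return f"gsiftp://{host}{path}"


def globus_url_copy_cmd(local_path: str, host: str, remote_path: str,
                        streams: int = 4) -> List[str]:
    """Command line to push a local file to the RDF via GridFTP.

    -vb  report transfer performance while copying
    -p   number of parallel TCP data streams (4 is an assumed default)
    """
    if not local_path.startswith("/"):
        raise ValueError("local path must be absolute")
    return ["globus-url-copy", "-vb", "-p", str(streams),
            f"file://{local_path}", gsiftp_url(host, remote_path)]


def round_robin_hosts() -> Iterator[str]:
    """Cycle through the data transfer nodes, one per new transfer."""
    return itertools.cycle(DTN_HOSTS)
```

For example, `globus_url_copy_cmd("/data/run1.nc", "dtn01.hector.ac.uk", "/nerc/.../run1.nc")` yields a list that can be passed to `subprocess.run`, with the elided project path filled in from your own allocation.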
RDF Quota Management
If you are the PI of a project, quota management for the RDF is available via the SAFE in the same way as you currently manage /work and /home quotas. Please note that user quotas are not configured on the RDF; these may be added at a later date.
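Since quotas on the RDF apply at project level rather than per user, it can help to check whether a directory will fit in the remaining project allocation before moving it. The minimal sketch below sums file sizes under a directory and compares them with usage and quota figures that you would read from the SAFE yourself; it does not query the SAFE, and the function names are hypothetical.

```python
import os


def tree_size_bytes(root: str) -> int:
    """Total size in bytes of regular files under root.

    Symbolic links are skipped and os.walk does not follow linked
    directories, so nothing is counted twice.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_quota(root: str, used_bytes: int, quota_bytes: int) -> bool:
    """True if moving root onto the RDF keeps the project within quota.

    used_bytes and quota_bytes are assumed to be taken from the SAFE
    by hand; this sketch has no access to the SAFE.
    """
    return used_bytes + tree_size_bytes(root) <= quota_bytes
```

Walking a very large /work tree this way can take a while; running it as part of the serial-queue transfer job keeps that load off the login nodes.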
Support
Whilst the RDF is not part of the HECToR service itself, access is currently only configured for HECToR users. Users with support requests relating to the RDF should contact the HECToR Helpdesk as normal. Please note that support of the RDF will be performed on a reasonable endeavours basis.
