The HECToR Service is now closed and has been superseded by ARCHER.

Big data storage to serve UK researchers

A new storage environment providing 7.8 PB of storage and an additional 19.5 PB of backup capability is to improve long-term data storage for hundreds of UK users of the HECToR (High-End Computing Terascale Resource) supercomputer. HECToR is hosted by EPCC at the University of Edinburgh and funded by the Engineering and Physical Sciences Research Council (EPSRC) and the Natural Environment Research Council (NERC).

The additional storage complements HECToR’s existing 1 Petabyte of disk space. Although tightly integrated with HECToR, the new storage environment is built independently and - because it is designed to outlive HECToR - will be available for use with successive supercomputers.

The storage environment was designed and built by data processing, data management and storage provider OCF plc. It uses storage hardware from DataDirect Networks (DDN), and archive hardware and file management software from IBM.

"We needed a more data-centric view of high performance computing," says Professor Arthur Trew, University of Edinburgh. "Data persists beyond any computer, including HECToR, so we’re prioritising data storage, management and analysis. Doing this enables us to upgrade HECToR and integrate its successor without fear of impacting access to research data. Our expectation is that any future computer must be able to integrate seamlessly with our storage."

Scientists currently store highly complex simulations on site at Edinburgh – file sizes vary from user to user, but each can potentially be gigabytes in size. The passage of data for further interrogation is unique to each researcher and may involve transferring the data to other data repositories off site, moving data to different parts of the country or simply "taking it home" using portable media.

Julian Fielden, OCF managing director, says: "There is lots of talk and consensus at the moment that the problem with big data isn’t really the capacity to store it, but how to access, use and find the data and, in doing so, make it into useful information. The collective investment of the research councils is cleverly helping to avoid this problem by making storage independent of the machine that generated it. Combined with good network access and IBM’s parallel file system GPFS, the data becomes easy to locate and use by any researcher irrespective of location."

"As we enter the big data era, organisations in every field of endeavour are addressing the world’s most pressing scientific and medical questions – questions that would have been too complex to address just a few years ago," says Bill Cox, DDN Vice President of Worldwide Channel Sales. "EPCC and its partner organisations have built a technologically advanced, state-of-the-art facility at the University of Edinburgh that opens a world of possibility to researchers across the UK. DDN is very pleased to join with OCF in assisting on this important project."

The storage environment built by OCF now uses:

  • DDN Storage Fusion Architecture (SFA) 10K-X, a leading integrated storage appliance that maximises application performance while minimising total cost of ownership for big data, cloud, and content-intensive environments. The SFA 10K-X provides 7.8 Petabytes of useable storage capacity.
  • IBM System Storage TS3500 Tape Library, a highly scalable, automated tape library for mainframe and open systems backup and archive in midrange to enterprise environments. The TS3500 tape library provides 19.5 Petabytes of capacity.
  • IBM GPFS software to enable:
    • Seamless storage capacity expansion to handle the explosive growth of big data and digital information;
    • Improved efficiency through enterprise wide, interdepartmental file sharing;
    • Proven commercial-grade reliability to eliminate production outages, and easier information life cycle management through policy-driven automation;
    • Cost-effective disaster recovery and business continuity;
    • Active File Management to enable asynchronous access and control of local and remote files.