Next: HDF5 Filters Up: NEMO on HECToR A Previous: NEMO output comparison scripts Contents

Some notes on HDF5 datasets

HDF5 is a set of tools and libraries that allows extremely large and complicated data collections to be managed. The file format used by HDF5 is designed to be portable. Further information on HDF5 can be found at [8].

An HDF5 dataset is an object composed of a collection of data elements and metadata. In addition, the dataset may have optional attribute objects.

Data elements - one-dimensional or multi-dimensional arrays. Elements can have a specified type (integer, real, double, char) or a compound type (cf. C-like structs).
Metadata - describes the data elements and the data layout, and holds all the information necessary to read, write (e.g. the chunking/compression used) and interpret the data.
Attributes - optional metadata objects which can be used to describe the nature and/or intended use of a dataset.

When an HDF5 dataset is created a number of properties of the dataset are set:

name - the name of the dataset, using alphanumeric ASCII characters.
dataspace - defines the number of dimensions, the current extent in each dimension and the maximum allowed extent in each dimension.
datatype - a dataset has a datatype associated with it which describes the layout of the raw data in the file. The file datatype is set when the dataset is created and cannot be changed.
storage properties - control how the data is stored and whether any chunking or compression is used. The storage properties are set when the dataset is created and cannot change.

Most of these dataset properties are permanent: they cannot be changed during the lifetime of the dataset. The key exception is the dataspace, which can be expanded up to its maximum dimensions.
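As an illustration of the properties listed above, the following is a minimal sketch using the h5py Python bindings (an assumption: the original work does not use h5py). The dataset name "temperature", the "units" attribute and the file name are hypothetical. The sketch sets the name, dataspace, datatype and storage properties at creation time, attaches an optional attribute, and then extends the dataspace within its maximum dimensions.

```python
import h5py
import numpy as np


def build_example(path="example.h5"):
    """Create a small extendible dataset and report its properties."""
    # The "core" driver with backing_store=False keeps the file in memory,
    # so nothing is written to disk by this sketch.
    with h5py.File(path, "w", driver="core", backing_store=False) as f:
        dset = f.create_dataset(
            "temperature",        # name (hypothetical)
            shape=(4, 10),        # dataspace: current extent in each dimension
            maxshape=(None, 10),  # dataspace: maximum extent (None = unlimited)
            dtype="f8",           # file datatype, fixed at creation
            chunks=(2, 10),       # storage property: chunked layout
            compression="gzip",   # storage property: compression filter
        )
        dset.attrs["units"] = "degC"  # optional attribute (hypothetical)
        dset[...] = np.arange(40, dtype="f8").reshape(4, 10)

        # The dataspace is the one property that may change later:
        # grow the first dimension from 4 to 6, within maxshape.
        dset.resize((6, 10))

        return {
            "shape": dset.shape,
            "maxshape": dset.maxshape,
            "dtype": str(dset.dtype),
            "chunks": dset.chunks,
            "compression": dset.compression,
            "first_row_sum": float(dset[0].sum()),
        }


info = build_example()
```

Note that an extendible dataset must use a chunked layout, which is why chunks is given explicitly here.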

Data transfer - i.e. how does the data get from the application to a physical file? Essentially, the HDF5 library implements data transfers through a pipeline which includes:

Data transformations
Chunking
I/O operations
Optional filters, e.g. compression, which can also be added to the pipeline
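The stages above can be pictured with a toy model written with the Python standard library only. This is a conceptual sketch, not the HDF5 library's implementation: the function names, the float32 file datatype, the use of zlib as the filter and the in-memory sink are all assumptions made for illustration.

```python
import io
import struct
import zlib


def write_chunked(values, chunk_len, sink):
    """Write values to sink chunk by chunk; return (offset, size) per chunk."""
    index = []
    for start in range(0, len(values), chunk_len):
        # Chunking: take the next fixed-size piece of the array.
        chunk = values[start:start + chunk_len]
        # Data transformation: convert in-memory floats to the file
        # datatype (here little-endian 32-bit floats).
        raw = struct.pack(f"<{len(chunk)}f", *chunk)
        # Optional filter: compress the chunk.
        stored = zlib.compress(raw)
        # I/O operation: write the filtered chunk and record where it went.
        offset = sink.tell()
        sink.write(stored)
        index.append((offset, len(stored)))
    return index


def read_chunked(index, sink):
    """Reverse the pipeline: read, decompress and convert each chunk."""
    out = []
    for offset, size in index:
        sink.seek(offset)
        raw = zlib.decompress(sink.read(size))
        out.extend(struct.unpack(f"<{len(raw) // 4}f", raw))
    return out


sink = io.BytesIO()  # stands in for the physical file
data = [float(i) for i in range(10)]
index = write_chunked(data, chunk_len=4, sink=sink)
restored = read_chunked(index, sink)
```

Because each chunk is filtered and stored independently, a reader only needs to fetch and decompress the chunks that overlap the requested region.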

Storage allocation - space for the data can be allocated in the file early, incrementally or late; this choice may need consideration for parallel I/O.

Subsections

HDF5 Filters
