Some notes on HDF5 datasets
HDF5 is a set of tools and libraries that allows extremely large and
complicated data collections to be managed. The file format used by HDF5 is
designed to be portable. Further information on HDF5 can be found in [8].
An HDF5 dataset is an object composed of a collection of data elements and
metadata. In addition, the dataset may have optional attribute objects.
- Data elements - one-dimensional or multi-dimensional arrays. The elements
can be of a simple type (integer, real, double, char) or of a compound type
(cf. C-like structs).
- Metadata - describes the data elements, data layout and all information
necessary to read/write (e.g. chunking/compression used) and interpret the
data.
- Attributes - optional metadata objects which can be used to describe
the nature and/or intended use of a dataset.
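The three parts of a dataset can be pictured with a small toy model. The sketch below uses only the Python standard library and does not call the HDF5 API; the record layout, the dictionary keys and the attribute values are all hypothetical and chosen purely for illustration. It shows why the metadata is needed: the raw bytes can only be interpreted once the element layout is known.

```python
import struct

# Hypothetical compound element: (int32 id, float64 temperature), stored
# little-endian with no padding, analogous to a small packed C struct.
RECORD = struct.Struct("<id")

def pack_elements(records):
    """Serialise a list of (id, temperature) tuples into raw bytes."""
    return b"".join(RECORD.pack(i, t) for i, t in records)

def unpack_elements(raw):
    """Recover the tuples; this only works because the layout is known."""
    return [RECORD.unpack_from(raw, off)
            for off in range(0, len(raw), RECORD.size)]

# Toy "dataset": raw data elements, metadata needed to read them back,
# and optional attributes describing what the data means.
dataset = {
    "data": pack_elements([(1, 271.5), (2, 280.25)]),
    "metadata": {"shape": (2,), "element_size": RECORD.size},
    "attributes": {"units": "K", "description": "example temperatures"},
}

print(unpack_elements(dataset["data"]))
```

In a real HDF5 file the library stores and interprets this metadata itself; the application never has to unpack the bytes by hand.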
When an HDF5 dataset is created a number of properties of the dataset are set:
- name - name of the dataset, using alphanumeric ASCII characters
- dataspace - defines the number of dimensions, the current
extent in each dimension and the maximum allowed extent in each dimension.
- datatype - a dataset has a datatype associated with it which
describes the layout of the raw data in the file. The file datatype is set
when the dataset is created and cannot be changed.
- storage properties - control how the data is stored and
whether any chunking or compression is used. The storage properties are
set when the dataset is created and cannot be changed.
Most of these dataset properties are permanent: they cannot be changed
during the lifetime of the dataset. The key exception is the dataspace,
which can be expanded up to its maximum dimensions.
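The rule that the dataspace may grow but only up to its maximum extent can be sketched as follows. This is a toy model written with the Python standard library, not the HDF5 API: the class name `ToyDataspace` and its fields are hypothetical, and `None` is used here to model an unlimited dimension. In the HDF5 C library the corresponding resize call is `H5Dset_extent`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ToyDataspace:
    """Illustrative stand-in for a dataspace: current and maximum extents."""
    shape: Tuple[int, ...]
    maxshape: Tuple[Optional[int], ...]  # None models an unlimited dimension

    def __post_init__(self):
        if len(self.shape) != len(self.maxshape):
            raise ValueError("rank of shape and maxshape must match")
        self._check(self.shape)

    def _check(self, shape):
        # Every extent must stay within the maximum fixed at creation time.
        for size, limit in zip(shape, self.maxshape):
            if limit is not None and size > limit:
                raise ValueError(f"extent {size} exceeds maximum {limit}")

    def extend(self, new_shape):
        """Change the current extent; rank and maximum extent never change."""
        if len(new_shape) != len(self.shape):
            raise ValueError("the number of dimensions is fixed at creation")
        self._check(new_shape)
        self.shape = tuple(new_shape)

space = ToyDataspace(shape=(10, 4), maxshape=(None, 4))
space.extend((25, 4))        # allowed: the first dimension is unlimited
try:
    space.extend((25, 5))    # rejected: the second dimension is capped at 4
except ValueError as err:
    print("refused:", err)
```

The datatype and storage properties have no equivalent of `extend` in this model, which mirrors the fact that they are fixed once the dataset has been created.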
Data transfer - i.e. how does the data get from the application to a
physical file? Essentially, the HDF5 library implements data transfers
through a pipeline which includes:
- Data transformations
- Chunking
- I/O operations
- Optional filters, e.g. compression, which can also be added to the pipeline
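The stages listed above can be imitated in a few lines. The following is a minimal sketch using only the Python standard library, not the HDF5 implementation: the function names, the chunk length and the use of an in-memory list in place of the file are assumptions made for illustration. `zlib` is used for the filter stage because it provides the same deflate compression that HDF5 offers as a standard filter.

```python
import struct
import zlib

def write_pipeline(values, chunk_len, level=6):
    """Convert -> chunk -> filter. Returns the list of stored chunks."""
    # 1. Data transformation: convert Python floats to the "file" datatype
    #    (here little-endian 8-byte floats).
    raw = struct.pack(f"<{len(values)}d", *values)
    # 2. Chunking: split the raw bytes into pieces of chunk_len elements.
    step = chunk_len * 8
    chunks = [raw[i:i + step] for i in range(0, len(raw), step)]
    # 3. Optional filter: compress every chunk independently.
    # 4. I/O: the returned list stands in for writing each chunk to the file.
    return [zlib.compress(c, level) for c in chunks]

def read_pipeline(stored):
    """Reverse the pipeline: un-filter each chunk, join, convert back."""
    raw = b"".join(zlib.decompress(c) for c in stored)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

values = [float(i % 5) for i in range(1000)]
stored = write_pipeline(values, chunk_len=256)
print(len(stored), "chunks,", sum(len(c) for c in stored), "bytes stored")
```

Because each chunk is filtered on its own, a reader that needs only part of the dataset has to decompress only the chunks that overlap the requested region, which is one reason chunking and compression are configured together.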
Storage allocation - space for the dataset can be allocated in the file
early, incrementally or late. The choice may need consideration for
parallel I/O.