HDF5

Interfaces for HDF5 Datasets

Note

HDF5 arrays are accessed through a proxy class, H5Proxy. Getting and setting values should work as normal, except that setting values on nested views is not possible.

Specifically this doesn’t work:

my_model.array[0][0] = 1

But this does work:

my_model.array[0,0] = 1
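
The same difference can be reproduced with plain h5py, which H5Proxy passes item access through to. The following is a minimal sketch, not the proxy itself: the file name and dataset name are made up, and the in-memory `core` driver is used only so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory HDF5 file (hypothetical name), discarded when closed
with h5py.File("nested_demo.hdf5", "w", driver="core", backing_store=False) as h5f:
    dset = h5f.create_dataset("array", data=np.zeros((2, 2), dtype=int))

    # dset[0] reads a row into a new numpy array, so this assignment
    # only changes that temporary copy and never reaches the file
    dset[0][0] = 1
    after_nested = int(dset[0, 0])

    # A single tuple index goes to the dataset's own setter and is written
    dset[0, 0] = 1
    after_tuple = int(dset[0, 0])
```

The nested form fails silently rather than raising, which is why the tuple form is the one to use.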

To access the HDF5 dataset directly, use the H5Proxy.open() method.

Datetimes

Datetimes are supported as a dtype annotation, but currently they must be stored as ISO 8601 formatted byte strings with dtype S32 (timezone optional), for example:

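import h5py
from datetime import datetime
import numpy as np

# Encode an ISO 8601 timestamp as a fixed-width 32-byte string
data = np.array([datetime.now().isoformat().encode('utf-8')], dtype="S32")
# The context manager closes the file once the dataset is written
with h5py.File('test.hdf5', 'w') as h5f:
    h5f.create_dataset('data', data=data)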
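
As a rough check on the S32 width, the sketch below uses only the standard library to show that a timezone-aware timestamp with microseconds fills exactly 32 bytes, and that the byte string decodes back to the same datetime. The timestamp value is arbitrary, and this says nothing about how the library itself decodes stored values.

```python
from datetime import datetime, timezone

# An arbitrary timezone-aware timestamp with microsecond precision
stamp = datetime(2024, 1, 2, 3, 4, 5, 678901, tzinfo=timezone.utc)

# Same encoding as used for the S32 array above
encoded = stamp.isoformat().encode("utf-8")

# An S32 element holds at most 32 bytes
fits = len(encoded) <= 32

# Decode the bytes and parse the ISO 8601 string back into a datetime
decoded = datetime.fromisoformat(encoded.decode("utf-8"))
```
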
class H5ArrayPath(file: Path | str, path: str, field: str | list[str] | None = None)

Location specifier for arrays within an HDF5 file

Create new instance of H5ArrayPath(file, path, field)

file: Path | str

Location of HDF5 file

path: str

Path within the HDF5 file

field: str | list[str] | None

Refer to a specific field within a compound dtype
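
To illustrate what a field within a compound dtype is, here is a small sketch using plain h5py rather than H5ArrayPath. The file name, dataset name, and field names `x` and `y` are invented for the example.

```python
import h5py
import numpy as np

# A compound dtype with two named fields (hypothetical names)
compound = np.dtype([("x", "i8"), ("y", "f8")])
records = np.array([(1, 0.5), (2, 1.5)], dtype=compound)

# In-memory file so nothing is written to disk
with h5py.File("compound_demo.hdf5", "w", driver="core", backing_store=False) as h5f:
    dset = h5f.create_dataset("table", data=records)

    # Indexing by field name reads just that column of the compound dataset
    x_values = dset["x"]
    y_values = dset["y"]
```

The `field` argument presumably plays the role of the name passed to the index here, selecting one or more columns of such a dataset.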

pydantic model H5JsonDict

Round-trip JSON-able version of an HDF5 dataset

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Fields:
field field: str | None = None
field file: str [Required]
field path: str [Required]
to_array_input() → H5ArrayPath

Construct an H5ArrayPath

class H5Proxy(file: Path | str, path: str, field: str | list[str] | None = None, annotation_dtype: str | type | Any | generic | None = None)

Proxy class to mimic numpy-like array behavior with an HDF5 array

The attribute and item access methods only open the file for the duration of the method, making it less perilous to share this object between threads and processes.

This class attempts to be a passthrough class to a h5py.Dataset object, including its attributes and item getters/setters.

Read-only methods attempt no locking (beyond the HDF5 defaults). Write methods (setting an array value) try to use the locking methods of h5py.File.

Parameters:
  • file (pathlib.Path | str) – Location of hdf5 file on filesystem

  • path (str) – Path to array within hdf5 file

  • field (str, list[str]) – Optional - refer to a specific field within a compound dtype

  • annotation_dtype (dtype) – Optional - the dtype of our type annotation

array_exists() → bool

Check that there is in fact an array at path within file

classmethod from_h5array(h5array: H5ArrayPath) → H5Proxy

Instantiate using H5ArrayPath

property dtype: dtype

Get dtype of array, using field if present

__array__() → ndarray

To a numpy array

__len__() → int

Length of the first dimension, self.shape[0]

__eq__(other: H5Proxy) → bool

Check that we are referring to the same hdf5 array

open(mode: str = 'r') → Dataset

Return the opened h5py.Dataset object

You must remember to close the associated file with close().

close() → None

Close the h5py.File object left open when returning the dataset with open()
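
The open-per-access pattern described for H5Proxy, together with the open()/close() pair, can be sketched roughly as follows. `MiniProxy` is a hypothetical stand-in written for this example, not the library's implementation: it omits fields, dtypes, attribute passthrough, and locking.

```python
import os
import tempfile

import h5py
import numpy as np


class MiniProxy:
    """Hypothetical stand-in illustrating the pattern, not the real H5Proxy."""

    def __init__(self, file, path):
        self.file = str(file)
        self.path = path
        self._h5f = None  # only set between open() and close()

    def __getitem__(self, item):
        # Open read-only just long enough to copy the values out
        with h5py.File(self.file, "r") as h5f:
            return h5f[self.path][item]

    def __setitem__(self, key, value):
        # Reopen in read/write mode for the duration of the write
        with h5py.File(self.file, "r+") as h5f:
            h5f[self.path][key] = value

    def open(self, mode="r"):
        # Hand back the live dataset; the caller is responsible for close()
        if self._h5f is None:
            self._h5f = h5py.File(self.file, mode)
        return self._h5f[self.path]

    def close(self):
        if self._h5f is not None:
            self._h5f.close()
            self._h5f = None


# Build a small file in a temporary directory to exercise the sketch
filename = os.path.join(tempfile.mkdtemp(), "proxy_demo.hdf5")
with h5py.File(filename, "w") as h5f:
    h5f.create_dataset("data", data=np.arange(6).reshape(2, 3))

proxy = MiniProxy(filename, "data")
proxy[0, 0] = 10               # file is opened and closed inside the call
first_row = proxy[0].tolist()  # likewise for reads

dset = proxy.open()            # file stays open until close()
shape = dset.shape
proxy.close()
```

Because no file handle is held between calls, the object carries only a path and a dataset name, which is what makes it comparatively safe to pass between threads or processes.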

class H5Interface(shape: tuple[int, ...] | Any = typing.Any, dtype: str | type | Any | generic = typing.Any)

Interface for Arrays stored as datasets within an HDF5 file.

Takes a H5ArrayPath specifier to select a h5py.Dataset from a h5py.File and returns a H5Proxy class that acts like a passthrough numpy-like interface to the dataset.

return_type

alias of H5Proxy

json_model

alias of H5JsonDict

classmethod enabled() → bool

Check whether h5py can be imported

classmethod check(array: H5ArrayPath | tuple[Path | str, str]) → bool

Check that the given array is a H5ArrayPath or something that resembles one.

before_validation(array: Any) → NDArrayType

Create an H5Proxy to use throughout validation

get_dtype(array: NDArrayType) → str | type | Any | generic

Get the dtype from the input array

Subclasses to correctly handle

classmethod to_json(array: H5Proxy, info: SerializationInfo | None = None) → dict

Render HDF5 array as JSON

If round_trip == True, we dump just the proxy info, a dictionary like:

  • file: file

  • path: path

  • attrs: Any HDF5 attributes on the dataset

  • array: The array as a list of lists

Otherwise, we dump the array as a list of lists
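
The two output shapes described above might look like the following. `sketch_to_json` is a hypothetical helper built on plain h5py purely to show the structure; the file name, dataset path, and `units` attribute are invented, and the real method may differ in details.

```python
import h5py
import numpy as np


def sketch_to_json(file, path, dset, round_trip=False):
    """Hypothetical helper mirroring the two output shapes described above."""
    array = dset[()].tolist()  # whole dataset as nested Python lists
    if not round_trip:
        return array
    return {
        "file": str(file),
        "path": path,
        "attrs": {key: dset.attrs[key] for key in dset.attrs},
        "array": array,
    }


# In-memory file so nothing is written to disk
with h5py.File("json_demo.hdf5", "w", driver="core", backing_store=False) as h5f:
    dset = h5f.create_dataset("data", data=np.arange(4).reshape(2, 2))
    dset.attrs["units"] = "mV"

    plain = sketch_to_json("json_demo.hdf5", "/data", dset)
    full = sketch_to_json("json_demo.hdf5", "/data", dset, round_trip=True)
```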