Dtype

Todo

This section is under construction as of 1.6.1

Many of the details of dtypes are covered in syntax and in numpydantic.dtype, but this section will specifically address how dtypes are handled both generically and by interfaces as we expand custom dtype handling <3.

For details of support and implementation until the docs have time for some love, please see the tests, which are the source of truth for the functionality of the library for now and forever.

Recall the general syntax:

NDArray[Shape, dtype]

These are the docs for what you can do in dtype.

Coercion

Pydantic is designed so that type annotations reflect the type that one can expect a field to be once the object has been instantiated and validated, not the type that can be passed as input.

Accordingly, pydantic attempts to coerce values to their annotated types during validation when it can, rather than raising a validation error.

Numpydantic follows this general pattern when it is feasible, but attempting to coerce the shape or dtype of an array is significantly more costly, and much more likely to result in incorrect data than the scalar data pydantic was designed for.

It tries to balance usability with minimization of surprise - a primary design goal is to be a transparent passthrough validator system that gets out of the way of the underlying array libraries so they can be used as expected.

Numpydantic will

  • Attempt to coerce non-array values like scalars or python sequences to numpy arrays when possible. Validation against shape and dtype happens after coercion.

  • Attempt to coerce any recognized input_types() to the return_type() of the matching array interface (see Interface Matching). For example, the VideoInterface recognizes Path and string objects with supported file extensions and coerces them to a VideoProxy arraylike object.

  • Attempt to coerce lists or arrays of dicts to the annotated BaseModel, when the annotation is a pydantic model (see model dtypes)

Numpydantic will not

  • Attempt to reshape an input array

  • Attempt to change the dtype of an array
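The policy in the two lists above can be sketched with plain numpy. This is only an illustration of the idea, not numpydantic's implementation: `coerce_then_check` is a hypothetical helper, and `np.issubdtype` stands in for the real dtype validation.

```python
import numpy as np


def coerce_then_check(value, dtype):
    """Hypothetical helper: wrap non-arrays, but never recast an existing array."""
    if not isinstance(value, np.ndarray):
        # Scalars and python sequences are coerced to arrays first...
        value = np.asarray(value)
    # ...and only then checked. A mismatched array is rejected, not converted.
    if not np.issubdtype(value.dtype, dtype):
        raise TypeError(f"expected {dtype}, got {value.dtype}")
    return value


# A python list is coerced to an array before the dtype check.
from_list = coerce_then_check([1, 2, 3], np.integer)

# An array with a matching dtype is passed through untouched.
same_array = np.array([1, 2, 3], dtype=np.uint8)
passthrough = coerce_then_check(same_array, np.uint8)

# An array with the wrong dtype raises instead of being cast.
try:
    coerce_then_check(np.array([1, 2, 3], dtype=np.uint32), np.uint8)
    rejected = False
except TypeError:
    rejected = True
```

The point of the sketch is that coercion stops at the array boundary: once the input is an array, a mismatch is reported rather than repaired.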

Scalar Dtypes

Python builtin types and numpy types should be handled transparently, with some exceptions for complex numbers and objects (described below).

The numpydantic.testing subpackage, and more specifically the numpydantic.testing.cases module, is the source of truth for currently supported types and their behavior. Each case is tested combinatorially against each of the shape cases and array interfaces.

Numbers

Specific dtypes can be specified as single dtype classes.

from typing import Any
import numpy as np
from numpydantic import NDArray
from pprint import pprint

UInt8Array = NDArray[Any, np.uint8]
UInt8Array(np.array([1,2,3], dtype=np.uint8))
array([1, 2, 3], dtype=uint8)
UInt8Array(np.array([1,2,3], dtype=np.uint32))

---------------------------------------------------------------------------
DtypeError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 UInt8Array(np.array([1,2,3], dtype=np.uint32))

File ~/checkouts/readthedocs.org/user_builds/numpydantic/envs/stable/lib/python3.11/site-packages/numpydantic/ndarray.py:62, in NDArrayMeta.__call__(cls, val)
     60 def __call__(cls, val: NDArrayType) -> NDArrayType:
     61     """Call ndarray as a validator function"""
---> 62     return get_validate_interface(cls.__args__[0], cls.__args__[1])(val)

File ~/checkouts/readthedocs.org/user_builds/numpydantic/envs/stable/lib/python3.11/site-packages/numpydantic/schema.py:276, in get_validate_interface.<locals>.validate_interface(value, info)
    274 interface_cls = Interface.match(value)
    275 interface = interface_cls(shape, dtype)
--> 276 value = interface.validate(value)
    277 return value

File ~/checkouts/readthedocs.org/user_builds/numpydantic/envs/stable/lib/python3.11/site-packages/numpydantic/interface/interface.py:247, in Interface.validate(self, array)
    245 dtype = self.get_dtype(array)
    246 dtype_valid = self.validate_dtype(dtype)
--> 247 self.raise_for_dtype(dtype_valid, dtype)
    248 array = self.after_validate_dtype(array)
    250 shape = self.get_shape(array)

File ~/checkouts/readthedocs.org/user_builds/numpydantic/envs/stable/lib/python3.11/site-packages/numpydantic/interface/interface.py:327, in Interface.raise_for_dtype(self, valid, dtype)
    321 """
    322 After validating, raise an exception if invalid
    323 Raises:
    324     :class:`~numpydantic.exceptions.DtypeError`
    325 """
    326 if not valid:
--> 327     raise DtypeError(f"Invalid dtype! expected {self.dtype}, got {dtype}")

DtypeError: Invalid dtype! expected <class 'numpy.uint8'>, got uint32

numpydantic interprets the builtin float and int a bit differently than numpy - numpy treats float as equivalent to numpy.float64 and int as numpy.int64 when used in array creation.

print(
    np.array([1,2,3], dtype=float).dtype,
    np.array([1,2,3], dtype=int).dtype
)
float64 int64

Numpydantic treats float and int as any float or any integer[1], since that is the parsimonious way to express “any float/integer” when thinking across array libraries rather than within a single one, and it is a common need in data standards. numpy.int64 and numpy.float64 already have specific dtypes! They are them!

print(
    NDArray[Any, int]([1,2,3]),
    NDArray[Any, int](np.array([1,2,3], dtype=np.uint8)),
    NDArray[Any, int](np.array([1,2,3], dtype=np.int16)),
)
[1 2 3] [1 2 3] [1 2 3]
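One way to picture the "any integer" and "any float" reading is numpy's own abstract type hierarchy. The sketch below is an assumption-laden illustration using `np.issubdtype`, not a description of how numpydantic performs the check. `is_any_integer` and `is_any_float` are hypothetical names.

```python
import numpy as np


def is_any_integer(array):
    # np.integer is the abstract parent of all signed and unsigned integer types.
    return bool(np.issubdtype(array.dtype, np.integer))


def is_any_float(array):
    # np.floating is the abstract parent of float16, float32, float64, ...
    return bool(np.issubdtype(array.dtype, np.floating))


uint8_array = np.array([1, 2, 3], dtype=np.uint8)
int16_array = np.array([1, 2, 3], dtype=np.int16)
float32_array = np.array([1, 2, 3], dtype=np.float32)

print(
    is_any_integer(uint8_array),
    is_any_integer(int16_array),
    is_any_integer(float32_array),
    is_any_float(float32_array),
)
```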

Numpydantic provides a handful of aliases to numpy dtypes so they appear more “Annotation-like”, and a handful of “generic” types like Float and UnsignedInteger (which are just tuples of numpy dtypes), but in general it is recommended to just use numpy dtypes and type unions directly.

Many array libraries that are not numpy understand numpy dtypes. If you are using an array library that doesn’t, you can use type unions to express whatever combination of types you’d like.

Complex numbers

Todo

Document limitations for complex numbers and strategies for serialization/validation

Datetimes

Todo

Datetimes are supported by every interface except VideoInterface, with the caveat that HDF5 loses timezone information, and thus all timestamps should be re-encoded to UTC before saving/loading.

More generic datetime support is TODO.
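Re-encoding to UTC, as suggested above, might look like the following. This is a minimal sketch using only the standard library and numpy. `to_utc_naive` is a hypothetical helper and is not part of numpydantic.

```python
from datetime import datetime, timedelta, timezone

import numpy as np


def to_utc_naive(stamp):
    """Hypothetical helper: express an aware datetime in UTC, then drop tzinfo."""
    return stamp.astimezone(timezone.utc).replace(tzinfo=None)


# A fixed UTC-5 offset, used only to make the example self-contained.
utc_minus_five = timezone(timedelta(hours=-5))

stamps = [
    datetime(2024, 1, 1, 12, 0, tzinfo=utc_minus_five),
    datetime(2024, 1, 1, 18, 30, tzinfo=timezone.utc),
]

# numpy datetimes carry no timezone, so every value is normalized to UTC first.
utc_array = np.array([to_utc_naive(s) for s in stamps], dtype="datetime64[us]")
```

Because every value is normalized before it reaches the array, the stored timestamps stay comparable even if the timezone is dropped on disk.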

Objects

Generic objects are supported by all interfaces except VideoInterface, HDF5Interface, and ZarrInterface.

This might be expected, but there is also hope.

When the numpy interface validates arrays of objects, it only checks the first item in the array for object identity (type(array.flat[0]) is dtype), as iterating through every object in an array would be downright silly levels of expensive for default behavior.

PRs welcome for implementing opt-in strict object validation behavior.
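The trade-off between the first-item check and a strict check can be sketched in plain numpy. Both functions and the `Spike` class are hypothetical, written for this illustration only. They are not numpydantic's code.

```python
import numpy as np


class Spike:
    """Hypothetical element type used only for this illustration."""


def first_item_check(array, dtype):
    # Cheap: inspects a single element, regardless of array size.
    return type(array.flat[0]) is dtype


def strict_check(array, dtype):
    # Expensive: visits every element in a python-level loop.
    return all(type(item) is dtype for item in array.flat)


# An object array whose last element has the wrong type.
mixed = np.empty(3, dtype=object)
mixed[0] = Spike()
mixed[1] = Spike()
mixed[2] = "not a spike"

print(first_item_check(mixed, Spike), strict_check(mixed, Spike))
```

The first-item check accepts `mixed` because only element 0 is inspected, while the strict version catches the stray string at the cost of a full pass over the array.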

Strings

Todo

Strings are supported by all interfaces except VideoInterface.

TODO: fill in the subtleties of how this works.

Model Dtypes

Pydantic models can be used as dtypes, and numpydantic will attempt to cast array items to the model class when they are passed as dicts, e.g. when validating from JSON.

from pydantic import BaseModel

class KimPetras(BaseModel):
    listen: str = "to"
    turn: str = "off"
    the: str = "light"
    n: int = 5000
    times: str = "!"

NDArray[Any, KimPetras]([
    {"listen": "up", "turn": "s out", "the": "album is good"},
    {"n": "10000", "times": "is more like it"},
])
array([KimPetras(listen='up', turn='s out', the='album is good', n=5000, times='!'),
       KimPetras(listen='to', turn='off', the='light', n=10000, times='is more like it')],
      dtype=object)

Models are supported in the same set of array interfaces that support generic objects (see Objects).

Union Types

Union types can be used as expected.

from typing import Union

print(
  NDArray[Any, np.float16 | np.int32](np.array([1,2,3], dtype=np.float16)),
  NDArray[Any, Union[np.float16, np.int32]](np.array([1,2,3], dtype=np.int32))
)
[1. 2. 3.] [1 2 3]

Since union types are cumbersome to work with, unions can also be expressed as tuples of types:

NDArray[Any, (np.uint8, np.uint16)](np.array([1,2,3], dtype=np.uint8))
array([1, 2, 3], dtype=uint8)

Python unnests nested union types automatically, and numpydantic tests against tuple unions recursively – any item at any level can match.

from numpydantic.dtype import SignedInteger
CoolTypes = (np.uint8, np.datetime64)
AlsoCool = (np.str_, SignedInteger)

Faves = (CoolTypes, AlsoCool)
pprint(Faves)
((<class 'numpy.uint8'>, <class 'numpy.datetime64'>),
 (<class 'numpy.str_'>,
  (<class 'numpy.int8'>,
   <class 'numpy.int16'>,
   <class 'numpy.int32'>,
   <class 'numpy.int64'>,
   <class 'numpy.int16'>)))
NDArray[Any, Faves](np.array([1,2,3], dtype=np.int16))
array([1, 2, 3], dtype=int16)
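The "any item at any level can match" rule can be sketched as a recursive flattening of the nested tuples. This is an illustration under assumptions: `flatten_types` and `matches_any` are hypothetical helpers, and `signed` is a hand-written stand-in for a generic integer tuple rather than the one imported from numpydantic.dtype.

```python
import numpy as np


def flatten_types(types):
    """Yield every leaf type from an arbitrarily nested tuple of types."""
    if isinstance(types, tuple):
        for item in types:
            yield from flatten_types(item)
    else:
        yield types


def matches_any(array, types):
    # The array matches if its scalar type is any leaf, at any nesting level.
    return any(array.dtype.type is candidate for candidate in flatten_types(types))


# Stand-in for a generic "signed integer" tuple (an assumption for this sketch).
signed = (np.int8, np.int16, np.int32, np.int64)
nested = ((np.uint8, np.datetime64), (np.str_, signed))

print(
    matches_any(np.array([1, 2, 3], dtype=np.int16), nested),
    matches_any(np.array([1.0, 2.0]), nested),
)
```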

Arbitrary Dtypes

Numpydantic does not support every possible type that python is able to express as a dtype, mostly because array libraries can’t support every possible type. Arrays are usually numbers and strings, and more complex dtypes can be expressed as pydantic models.

Validating elaborated objects like an array of tuple[int, str], or pydantic annotated types with validation functions, would involve iterating through every element of an array, except for a small subset of cases like annotated_types.Gt() and other annotations that have common vectorized operations.

PRs are welcome for implementing annotated types support, or for any other elaborated type that isn’t currently supported but is a common or useful dtype :).
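To make the cost difference concrete, here is a sketch of a "greater than" constraint checked both ways. Neither function exists in numpydantic. They are hypothetical and only contrast a vectorized comparison with per-element validation.

```python
import numpy as np


def all_greater_than(array, bound):
    # One vectorized comparison evaluated inside numpy.
    return bool(np.all(array > bound))


def all_greater_than_elementwise(array, bound):
    # The general case: a python-level call for every element.
    return all(item > bound for item in array.flat)


positive = np.array([1, 2, 3])
has_negative = np.array([1, -2, 3])

print(all_greater_than(positive, 0), all_greater_than(has_negative, 0))
```

Both functions agree on the result. The vectorized form is the kind of special case that could be supported cheaply, while arbitrary validation functions would fall back to the elementwise loop.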

Compound Dtypes

Todo

Compound dtypes are currently unsupported, though the HDF5 interface supports indexing into compound dtypes as separable dimensions/arrays using the third “field” parameter in hdf5.H5ArrayPath.
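For readers unfamiliar with the term, a compound dtype packs several named fields into each array element. The sketch below uses a numpy structured array to show what selecting one field as its own array means. It does not use or describe the H5ArrayPath API, and the field names are invented for the example.

```python
import numpy as np

# A compound (structured) dtype with two named fields per element.
compound = np.dtype([("timestamp", np.float64), ("channel", np.uint8)])

records = np.array([(0.0, 1), (0.5, 2), (1.0, 1)], dtype=compound)

# Selecting a field yields a plain array with a simple scalar dtype.
timestamps = records["timestamp"]
channels = records["channel"]

print(timestamps.dtype, channels.dtype)
```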