Syntax

General form:

field: NDArray[Shape["{shape_expression}"], dtype]

Dtype

Dtype checking is, for the most part, as simple as an isinstance check: the dtype attribute of the array is checked against the dtype provided in the NDArray annotation. Both numpy and builtin Python types can be used.

A tuple of types can also be passed:

field: NDArray[Shape["2, 3"], (np.int8, np.uint8)]
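The dtype check described above can be sketched in plain numpy. The helper `dtype_matches` below is hypothetical and is not numpydantic's implementation. It assumes builtin Python types are normalized through `np.dtype()` (so `float` becomes `numpy.float64`) and that a tuple of types passes if any member matches.

```python
import numpy as np


def dtype_matches(array: np.ndarray, dtype) -> bool:
    """Hypothetical sketch: True if array.dtype matches dtype (a type or a tuple of types)."""
    candidates = dtype if isinstance(dtype, tuple) else (dtype,)
    for candidate in candidates:
        # np.dtype() maps builtins to numpy scalar types, e.g. float -> np.float64
        if issubclass(array.dtype.type, np.dtype(candidate).type):
            return True
    return False


print(dtype_matches(np.zeros(3, dtype=np.float64), float))  # True
print(dtype_matches(np.zeros(3, dtype=np.float32), float))  # False: float means float64
print(dtype_matches(np.zeros((2, 3), dtype=np.uint8), (np.int8, np.uint8)))  # True
```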

Like nptyping, the dtype module provides convenient access and aliases to the common dtypes, but it also provides “generic” dtypes such as Float, which is a tuple of all subclasses of numpy.floating. Numpy interprets float as equivalent to numpy.float64, and numpy.floating is an abstract parent class, so “generic” tuple dtypes fill that narrow gap.
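To see the gap that “generic” tuple dtypes fill, compare the builtin `float` with a tuple of concrete floating-point types. The `Float` tuple below is a hand-written stand-in assumed for illustration; numpydantic's own dtype module may list different members.

```python
import numpy as np

# Stand-in for a "generic" Float dtype: concrete subclasses of the abstract np.floating
Float = (np.float16, np.float32, np.float64)

# Every member really is a kind of np.floating
assert all(issubclass(t, np.floating) for t in Float)

arr = np.ones(4, dtype=np.float32)

# The builtin float is interpreted as float64, so a float32 array is rejected...
print(issubclass(arr.dtype.type, np.dtype(float).type))  # False
# ...while the generic tuple accepts any of its floating-point members
print(issubclass(arr.dtype.type, Float))  # True
```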

Todo

Future versions will support interfaces providing type maps for declaring equality between dtypes that may be specific to one library but should be considered equivalent to numpy's or other libraries’ dtypes.

Todo

Future versions will also support declaring minimum or maximum precisions, so that one can require “at least a 16-bit float” and have a 32-bit float accepted as well.

Shape

Shape Forms

The individual shape constraints described below can be expressed in several forms, for reasons of typechecking compatibility and history.

Our goal is to converge on a type syntax that is close to numpy’s:

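import numpy as np
from typing import Any

# TypeVar's `default` argument requires Python 3.13+ (PEP 696);
# typing_extensions provides it on older versions
from typing_extensions import TypeVar

_T_Shape = TypeVar("_T_Shape", bound=tuple[Any, ...], default=tuple)
_T_Dtype = TypeVar("_T_Dtype", bound=np.generic, default=Any)

NDArray = np.ndarray[_T_Shape, np.dtype[_T_Dtype]]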

which just treats shape as a tuple (usually tuple[int, ...]), and the dtype argument as a subscript of np.dtype.
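As a rough illustration of how such a numpy-style alias reads once specialized, the sketch below parameterizes `np.ndarray` directly with a concrete shape tuple and dtype. The alias name `Float64Matrix` and the `trace` function are hypothetical. At runtime the subscript only produces a generic alias, so no shape or dtype validation happens here.

```python
from typing import get_args, get_origin

import numpy as np

# Hypothetical alias: a 2-D array (shape is a tuple of two ints) of float64 values
Float64Matrix = np.ndarray[tuple[int, int], np.dtype[np.float64]]


def trace(m: Float64Matrix) -> float:
    # The annotation is for static type checkers only; nothing is enforced at runtime
    return float(np.trace(m))


print(get_origin(Float64Matrix))  # <class 'numpy.ndarray'>
print(get_args(Float64Matrix))
print(trace(np.eye(3)))  # 3.0
```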

All the type forms below are valid at runtime, but only the tuple form will pass static typechecking without the mypy plugin.

Tuple Form (Preferred in >=v2)

From v2.0 onward, Shape will become an alias for tuple.

This form is still somewhat in flux: certain typing constructs, such as ellipses and ranges, are difficult to specify or have non-ideal default behavior.

The technically correct, but extremely annoying, tuple form uses Literal values for every argument:

from typing import Literal as L
from numpydantic import NDArray, Shape

# these are equivalent
NDArray[Shape[L[1], L["2-3"], L["*"], L["..."]]]
NDArray[tuple[L[1], L["2-3"], L["*"], ...]]
NDArray[tuple[L[1], L["2-3"], int, ...]]
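One advantage of the tuple form is that its constraints can be read back with the standard `typing` introspection functions. The sketch below uses only `typing.get_args` and `typing.get_origin` on a bare `tuple[...]`, so numpydantic is not needed; the helper name `shape_constraints` is hypothetical.

```python
from typing import Literal as L
from typing import get_args, get_origin


def shape_constraints(spec) -> list:
    """Hypothetical helper: unwrap the Literal[...] entries of a tuple shape spec."""
    out = []
    for arg in get_args(spec):
        if get_origin(arg) is L:
            out.append(get_args(arg)[0])  # Literal["2-3"] -> "2-3"
        else:
            out.append(arg)  # e.g. int or Ellipsis, kept as-is
    return out


print(shape_constraints(tuple[L[1], L["2-3"], L["*"], ...]))  # [1, '2-3', '*', Ellipsis]
print(shape_constraints(tuple[L[1], L["2-3"], int, ...]))  # [1, '2-3', <class 'int'>, Ellipsis]
```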

Mypy, via the plugin, will support typechecking a more reasonable form:

NDArray[Shape[1, "2-3", "*", "..."]]
NDArray[tuple[1, "2-3", "*", ...]]
NDArray[tuple[1, "2-3", int, ...]]

and we will explore additional refinements as needed.

String Form (nptyping)

The pure string form is inherited from nptyping. Its use is discouraged in new code: it will be deprecated in v2.0 and removed in v3.0.

The string form is syntactically invalid in the Python type system, and it is less inspectable than the tuple form.

from typing import Literal
from numpydantic import NDArray, Shape

# these are equivalent
NDArray[Shape["1, 2-3, *, ..."]]
NDArray[Shape[Literal["1, 2-3, *, ..."]]]

Functional Form

The functional form should only be used within NDArraySchema() or when it is otherwise the only form that satisfies static type checkers.

Annotated[np.ndarray, NDArraySchema(Shape(1, "2-3", "*", "..."))]

Shape Args

Full documentation of nptyping’s shape syntax is available in the nptyping docs, but to keep these docs self-contained, the main points are:

Numerical Shape

A comma-separated list of integers.

For a 2-dimensional, 3 x 4-shaped array:

Shape["3, 4"]
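A minimal sketch of what checking a purely numerical shape expression involves, using a hypothetical `matches_numeric` helper (not part of numpydantic): split the expression on commas and compare each size with the corresponding entry of the array's shape tuple.

```python
def matches_numeric(shape: tuple, expression: str) -> bool:
    """Hypothetical sketch: check a shape tuple (e.g. array.shape) against "3, 4"."""
    # int() tolerates the surrounding whitespace left over from splitting on commas
    sizes = [int(part) for part in expression.split(",")]
    # The number of dimensions must match, then each dimension's length
    return len(shape) == len(sizes) and all(a == b for a, b in zip(shape, sizes))


print(matches_numeric((3, 4), "3, 4"))  # True
print(matches_numeric((4, 3), "3, 4"))  # False
print(matches_numeric((3, 4, 1), "3, 4"))  # False: extra dimension
```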

Wildcards

Wildcards indicate that a dimension can be of any size.

For a 2-dimensional, 3 x any-shaped array:

Shape["3, *"]
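Extending the same idea, a wildcard simply skips the size comparison for its dimension while still counting as a dimension. The helpers `dim_matches` and `matches` are hypothetical illustrations, not numpydantic functions.

```python
def dim_matches(size: int, spec: str) -> bool:
    """Hypothetical sketch: a single dimension spec is an integer or the '*' wildcard."""
    spec = spec.strip()
    return spec == "*" or size == int(spec)


def matches(shape: tuple, expression: str) -> bool:
    specs = expression.split(",")
    # The wildcard still occupies a dimension, so the lengths must agree
    return len(shape) == len(specs) and all(dim_matches(n, s) for n, s in zip(shape, specs))


print(matches((3, 7), "3, *"))  # True
print(matches((3, 1000), "3, *"))  # True
print(matches((2, 7), "3, *"))  # False: first dimension must be 3
print(matches((3,), "3, *"))  # False: the wildcard still counts as one dimension
```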

Ranges

Dimension sizes can also be specified as ranges[1]. Ranges must have no whitespace, and may use integers or wildcards. Range specifications are inclusive on both ends.

For an array whose…

  • First dimension can be of length 2, 3, or 4

  • Second dimension is 2 or greater

  • Third dimension is 4 or less

Shape["2-4, 2-*, *-4"]
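Because ranges contain no whitespace, each range is a single token that can be split on its hyphen. The hypothetical `in_range` helper below sketches the inclusive check for one dimension, treating `*` on either side as an open bound.

```python
def in_range(size: int, spec: str) -> bool:
    """Hypothetical sketch: inclusive range "lo-hi" where either bound may be '*'."""
    low, high = spec.split("-")
    # '*' leaves that side of the range unbounded
    above_low = low == "*" or size >= int(low)
    below_high = high == "*" or size <= int(high)
    return above_low and below_high


specs = ["2-4", "2-*", "*-4"]
print([in_range(n, s) for n, s in zip((4, 10, 1), specs)])  # [True, True, True]
print([in_range(n, s) for n, s in zip((5, 1, 5), specs)])  # [False, False, False]
```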

Labels

Dimensions can be given labels, and in future versions these labels will be propagated to the generated JSON Schema:

Shape["3 x, 4 y, 5 z"]
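In the example above each label follows its size after a space. A hypothetical parser that splits each comma-separated dimension into its size spec and an optional label might look like the sketch below; the function name and return format are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def parse_dims(expression: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical sketch: split "3 x, 4 y" into [("3", "x"), ("4", "y")]."""
    dims = []
    for part in expression.split(","):
        # A dimension is a size spec, optionally followed by whitespace and a label
        tokens = part.split()
        size = tokens[0]
        label = tokens[1] if len(tokens) > 1 else None
        dims.append((size, label))
    return dims


print(parse_dims("3 x, 4 y, 5 z"))  # [('3', 'x'), ('4', 'y'), ('5', 'z')]
print(parse_dims("3, *"))  # [('3', None), ('*', None)]
```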

Arbitrary dimensions

After some specified dimensions, an ellipsis (...) expresses that any number of additional dimensions may follow, like:

Shape["3, 4, ..."]
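A sketch of how a trailing ellipsis could be handled: only the leading dimensions are compared, and any extra dimensions are ignored. The helper `matches_with_ellipsis` is hypothetical, and it assumes (as an illustration choice, not documented behavior) that the ellipsis may also match zero extra dimensions.

```python
def matches_with_ellipsis(shape: tuple, expression: str) -> bool:
    """Hypothetical sketch: integer or '*' dims, optionally ending in '...'."""
    specs = [s.strip() for s in expression.split(",")]
    if specs[-1] == "...":
        # Trailing '...': only the leading dimensions are constrained
        specs = specs[:-1]
        if len(shape) < len(specs):
            return False
        shape = shape[: len(specs)]
    elif len(shape) != len(specs):
        return False
    return all(s == "*" or n == int(s) for n, s in zip(shape, specs))


print(matches_with_ellipsis((3, 4, 5, 6), "3, 4, ..."))  # True
print(matches_with_ellipsis((3, 5, 6), "3, 4, ..."))  # False: second dimension is 5
print(matches_with_ellipsis((3,), "3, 4, ..."))  # False: too few dimensions
```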

Any-Shaped

If the dtype is also Any, one can simply use:

field: NDArray

If a dtype is being passed, use the '*' wildcard along with '...':

field: NDArray[Shape['*, ...'], int]

Annotated type with NDArraySchema

Tip

See also: Typechecker Integration

If you don’t need compatibility with multiple array backends, or want an array to statically type check as a single array backend type, use the annotated schema form with NDArraySchema().

from typing import Annotated
import numpy as np
from pydantic import BaseModel
from numpydantic import NDArraySchema, Shape
class MyModel(BaseModel):
    field: Annotated[np.ndarray, NDArraySchema(Shape("{shape_expression}"), dtype)]

NDArraySchema() also validates that the given array is an instance of the specified type, rather than accepting any array backend that matches the dtype and shape.
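To make the distinction concrete, here are two hypothetical validators, neither of which is numpydantic's API. The first accepts any object exposing a matching `shape` and `dtype`, in the spirit of a backend-agnostic check; the second additionally requires an instance of one specific array class, which is the extra condition described for NDArraySchema(). `FakeArray` is an invented stand-in for a non-numpy backend.

```python
import numpy as np


class FakeArray:
    """Hypothetical non-numpy 'backend' exposing only shape and dtype."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)


def matches_any_backend(value, shape: tuple, dtype) -> bool:
    # Duck-typed: any object with a matching shape and dtype passes
    return tuple(value.shape) == shape and value.dtype == np.dtype(dtype)


def matches_strict(value, array_type: type, shape: tuple, dtype) -> bool:
    # Strict: the value must also be an instance of the declared array type
    return isinstance(value, array_type) and matches_any_backend(value, shape, dtype)


fake = FakeArray((2, 3), np.float64)
real = np.zeros((2, 3))  # float64 by default
print(matches_any_backend(fake, (2, 3), np.float64))  # True
print(matches_strict(fake, np.ndarray, (2, 3), np.float64))  # False
print(matches_strict(real, np.ndarray, (2, 3), np.float64))  # True
```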

Caveats

Todo

numpydantic does not currently support structured dtypes or numpy.recarray specifications, both of which nptyping supports. It will in future versions.

Todo

numpydantic also does not support the variable shape definition form, such as

Shape['Dim, Dim']

which specifies two dimensions of any size, as long as they are equal. At the moment, it appears impossible to express dynamic constraints (i.e., minItems/maxItems that depend on the shape of another array) in JSON Schema. A future minor version will allow them by generating a JSON Schema with a warning that the equal-shape constraint will not be represented.

See: https://github.com/orgs/json-schema-org/discussions/730
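The JSON Schema limitation is easier to see with a simplified sketch of how a fixed shape maps onto nested arrays bounded by `minItems`/`maxItems`. The generator below is hypothetical and much simpler than any real schema output; the point is that each bound must be a literal number, so there is no place to say “the same length as the other dimension”.

```python
import json


def shape_schema(sizes: list, item_type: str = "number") -> dict:
    """Hypothetical sketch: nested JSON Schema arrays for a fixed shape like [3, 4]."""
    schema: dict = {"type": item_type}
    # Build from the innermost dimension outward
    for size in reversed(sizes):
        schema = {"type": "array", "minItems": size, "maxItems": size, "items": schema}
    return schema


print(json.dumps(shape_schema([3, 4]), indent=2))
# For Shape['Dim, Dim'] there is no fixed number to put in minItems/maxItems:
# the inner length would have to refer to the outer length, which these keywords cannot do.
```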