Syntax
General form:
field: NDArray[Shape["{shape_expression}"], dtype]
Dtype
Dtype checking is, for the most part, as simple as an isinstance check:
the dtype attribute of the array is checked against the dtype provided in the
NDArray annotation. Both numpy and builtin Python types can be used.
A tuple of types can also be passed, in which case the array matches if its dtype matches any of them:
field: NDArray[Shape["2, 3"], (np.int8, np.uint8)]
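As a rough illustration of the isinstance-style comparison, the hypothetical helper `dtype_matches` below checks an array's scalar type against a single type or a tuple of types. It is a simplified sketch, not numpydantic's implementation, and it only handles numpy scalar types (builtin Python types would need extra mapping).

```python
import numpy as np

def dtype_matches(array: np.ndarray, dtype) -> bool:
    """Hypothetical helper: True if the array's scalar type is `dtype` or,
    when `dtype` is a tuple, any member of it (isinstance-style)."""
    # array.dtype.type is the numpy scalar class, e.g. np.uint8
    return issubclass(array.dtype.type, dtype)

arr = np.zeros((2, 3), dtype=np.uint8)
print(dtype_matches(arr, np.int8))              # False
print(dtype_matches(arr, (np.int8, np.uint8)))  # True
```

Because `issubclass` is used, abstract numpy parents such as `np.integer` would also match in this sketch.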
Like nptyping, the dtype module provides convenient access to
and aliases for the common dtypes, but it also provides “generic” dtypes such as
Float, a tuple of all subclasses of
numpy.floating. Numpy interprets the builtin float as equivalent to
numpy.float64 only, and numpy.floating is an abstract parent class,
so “generic” tuple dtypes fill that gap.
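The gap is easy to see in plain NumPy. The tuple below is a hypothetical stand-in for a generic Float dtype: it lists only the three float types available on every platform, while numpydantic's actual definition may include more.

```python
import numpy as np

# Hypothetical stand-in for a "generic" float dtype.
Float = (np.float16, np.float32, np.float64)

# The builtin float maps to exactly one concrete dtype...
print(np.dtype(float) == np.dtype(np.float64))  # True
print(np.dtype(float) == np.dtype(np.float32))  # False

# ...while every concrete float type derives from the abstract np.floating.
print(all(issubclass(t, np.floating) for t in Float))  # True

# A tuple accepts any of its members, including float32.
arr = np.ones(3, dtype=np.float32)
print(arr.dtype.type in Float)  # True
```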
Todo
Future versions will support interfaces that provide type maps, declaring that a library-specific dtype should be considered equivalent to a numpy dtype or to another library’s dtype.
Todo
Future versions will also support declaring minimum or maximum precisions, so that, for example, “at least a 16-bit float” would also accept a 32-bit float.
Shape
Shape Forms
The individual constraints for a shape (below) can be expressed in several forms, for reasons of typechecking compatibility and history.
Our goal is to converge on a type syntax that is close to numpy’s:
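import numpy as np
from typing import Any
# TypeVar's `default=` parameter requires Python >= 3.13 (PEP 696);
# typing_extensions provides the same TypeVar for older versions.
from typing_extensions import TypeVar
_T_Shape = TypeVar("_T_Shape", bound=tuple[Any, ...], default=tuple)
_T_Dtype = TypeVar("_T_Dtype", bound=np.generic, default=Any)
NDArray = np.ndarray[_T_Shape, np.dtype[_T_Dtype]]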
which just treats shape as a tuple (usually tuple[int, ...]),
and the dtype argument as a subscript of np.dtype.
All the type forms below are valid at runtime, but only the tuple form will pass static typechecking without the mypy plugin.
Tuple Form (Preferred in >=v2)
In >v2.0, Shape will become an alias for tuple.
This form is still somewhat in flux: some typing constructs, such as ellipses and ranges, are challenging to specify or have non-ideal default behavior.
The technically correct, but extremely annoying, tuple form uses Literal
values for every argument:
from typing import Literal as L
from numpydantic import NDArray, Shape
# these are equivalent
NDArray[Shape[L[1], L["2-3"], L["*"], L["..."]]]
NDArray[tuple[L[1], L["2-3"], L["*"], ...]]
NDArray[tuple[L[1], L["2-3"], int, ...]]
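Because the tuple form is an ordinary typing construct, its parts can be recovered with the standard library. The sketch below uses a hypothetical helper, `tuple_to_tokens`, to normalize each element into the corresponding string-form token; it is illustrative only and not how numpydantic parses shapes internally.

```python
from typing import Literal as L, get_args, get_origin

def tuple_to_tokens(shape) -> list[str]:
    """Hypothetical helper: turn tuple[...] shape arguments into string tokens."""
    tokens = []
    for arg in get_args(shape):
        if arg is Ellipsis:
            tokens.append("...")  # bare ... means arbitrary trailing dims
        elif arg is int:
            tokens.append("*")  # a bare int stands for "any size"
        elif get_origin(arg) is L:
            tokens.append(str(get_args(arg)[0]))  # unwrap Literal[x]
        else:
            raise TypeError(f"unsupported shape argument: {arg!r}")
    return tokens

print(tuple_to_tokens(tuple[L[1], L["2-3"], L["*"], ...]))  # ['1', '2-3', '*', '...']
print(tuple_to_tokens(tuple[L[1], L["2-3"], int, ...]))  # ['1', '2-3', '*', '...']
```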
Mypy, via the plugin, will support typechecking a more reasonable form:
NDArray[Shape[1, "2-3", "*", "..."]]
NDArray[tuple[1, "2-3", "*", ...]]
NDArray[tuple[1, "2-3", int, ...]]
and we will explore additional refinements as needed.
String Form (nptyping)
The pure string form is inherited from nptyping. Its use is discouraged in new code: it will be deprecated in v2.0 and removed in v3.0.
The string form is not valid in the Python type system, and it is less inspectable than the tuple form.
# these are equivalent
NDArray[Shape["1, 2-3, *, ..."]]
NDArray[Shape[Literal["1, 2-3, *, ..."]]]
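The inspectability difference shows up with the standard library: `typing.get_args` on the Literal string form yields a single opaque string, so a consumer has to split and parse it itself. The naive comma split below is a simplified sketch; nptyping and numpydantic use their own parsers.

```python
from typing import Literal, get_args

shape_arg = Literal["1, 2-3, *, ..."]
(expr,) = get_args(shape_arg)  # one string, no per-dimension structure
print(expr)  # 1, 2-3, *, ...

# Naive tokenization; a real parser would also validate each token.
tokens = [part.strip() for part in expr.split(",")]
print(tokens)  # ['1', '2-3', '*', '...']
```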
Functional Form
The functional form should only be used within NDArraySchema()
or when it is otherwise the only form that satisfies static type checkers.
Annotated[np.ndarray, NDArraySchema(Shape(1, "2-3", "*", "..."))]
Shape Args
Full documentation of nptyping’s shape syntax is available in the nptyping docs; to keep these docs self-contained, the main points are:
Numerical Shape
A comma-separated list of integers.
For a 2-dimensional, 3 x 4-shaped array:
Shape["3, 4"]
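A purely numerical expression amounts to an exact comparison against the array's shape tuple. The hypothetical `matches_numeric` helper below sketches this; it does not handle wildcards, ranges, or labels.

```python
import numpy as np

def matches_numeric(expr: str, shape: tuple[int, ...]) -> bool:
    """Hypothetical helper: check a comma-separated list of sizes."""
    expected = tuple(int(part) for part in expr.split(","))
    return shape == expected

print(matches_numeric("3, 4", np.zeros((3, 4)).shape))  # True
print(matches_numeric("3, 4", np.zeros((4, 3)).shape))  # False
print(matches_numeric("3, 4", np.zeros((3, 4, 1)).shape))  # False
```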
Wildcards
A wildcard (*) indicates that a dimension can be any size.
For a 2-dimensional, 3 x any-shaped array:
Shape["3, *"]
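Per dimension, a wildcard simply skips the size comparison, while the number of dimensions must still match. The helpers below are a hypothetical sketch of that rule, not numpydantic's validator.

```python
def dim_matches(token: str, size: int) -> bool:
    """Hypothetical helper: '*' accepts any size, otherwise sizes must be equal."""
    token = token.strip()
    return token == "*" or int(token) == size

def matches(expr: str, shape: tuple[int, ...]) -> bool:
    tokens = expr.split(",")
    # The number of dimensions must match exactly (no '...' support here).
    return len(tokens) == len(shape) and all(
        dim_matches(tok, size) for tok, size in zip(tokens, shape)
    )

print(matches("3, *", (3, 10)))  # True
print(matches("3, *", (3, 1)))  # True
print(matches("3, *", (2, 10)))  # False
print(matches("3, *", (3,)))  # False: wrong number of dimensions
```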
Ranges
Dimension sizes can also be specified as ranges[1]. Ranges must have no whitespace, and may use integers or wildcards. Range specifications are inclusive on both ends.
For an array whose…
First dimension can be of length 2, 3, or 4
Second dimension is 2 or greater
Third dimension is 4 or less
Shape["2-4, 2-*, *-4"]
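The inclusive bounds and wildcard endpoints can be sketched with a small hypothetical helper, `in_range`, applied dimension by dimension to the example above. It assumes the no-whitespace rule, so each token contains exactly one "-".

```python
def in_range(token: str, size: int) -> bool:
    """Hypothetical helper for a single 'lo-hi' range token (inclusive).
    Either bound may be '*', meaning unbounded on that side."""
    lo, hi = token.strip().split("-")
    if lo != "*" and size < int(lo):
        return False
    if hi != "*" and size > int(hi):
        return False
    return True

tokens = "2-4, 2-*, *-4".split(",")
print(all(in_range(t, s) for t, s in zip(tokens, (3, 100, 1))))  # True
print(all(in_range(t, s) for t, s in zip(tokens, (5, 2, 4))))  # False: 5 > 4
print(all(in_range(t, s) for t, s in zip(tokens, (4, 1, 4))))  # False: 1 < 2
```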
Labels
Dimensions can be given labels, written after the size and separated from it by a space. In future versions, these labels will be propagated to the generated JSON Schema.
Shape["3 x, 4 y, 5 z"]
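Each labeled dimension is a size followed by a name. The hypothetical `parse_labeled` helper below sketches how such an expression could be split into sizes and labels; numpydantic's own parsing may differ.

```python
from typing import Optional

def parse_labeled(expr: str) -> list[tuple[str, Optional[str]]]:
    """Hypothetical helper: split 'size label' pairs; label is None if absent."""
    dims = []
    for part in expr.split(","):
        size, _, label = part.strip().partition(" ")
        dims.append((size, label or None))
    return dims

print(parse_labeled("3 x, 4 y, 5 z"))  # [('3', 'x'), ('4', 'y'), ('5', 'z')]
print(parse_labeled("3, 4 y"))  # [('3', None), ('4', 'y')]
```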
Arbitrary dimensions
After the specified dimensions, a trailing ... indicates that any number
of additional dimensions may follow:
Shape["3, 4, ..."]
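Following the description above, a trailing ... only constrains the leading dimensions. The hypothetical sketch below treats everything after the explicit sizes as unconstrained, and it assumes zero extra dimensions are also allowed; it is not numpydantic's shape-matching code.

```python
def matches_prefix(expr: str, shape: tuple[int, ...]) -> bool:
    """Hypothetical helper for expressions like '3, 4, ...'."""
    tokens = [t.strip() for t in expr.split(",")]
    if tokens[-1] != "...":
        # Without an ellipsis, the number of dimensions must match exactly.
        return len(tokens) == len(shape) and all(
            t == "*" or int(t) == s for t, s in zip(tokens, shape)
        )
    fixed = tokens[:-1]
    if len(shape) < len(fixed):
        return False
    # Only the leading dimensions are checked; trailing ones may be any size.
    return all(t == "*" or int(t) == s for t, s in zip(fixed, shape))

print(matches_prefix("3, 4, ...", (3, 4)))  # True
print(matches_prefix("3, 4, ...", (3, 4, 7, 2)))  # True
print(matches_prefix("3, 4, ...", (3, 5, 7)))  # False
print(matches_prefix("*, ...", (9, 9, 9)))  # True
```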
Any-Shaped
If the dtype is also Any, a bare NDArray is enough:
field: NDArray
If a dtype is passed, combine the '*' wildcard with '...':
field: NDArray[Shape['*, ...'], int]
Annotated type with NDArraySchema
Tip
See also: Typechecker Integration
If you don’t need compatibility with multiple array backends,
or want an array to statically typecheck as a single array backend type,
use the annotated schema form with NDArraySchema().
from typing import Annotated
import numpy as np
from pydantic import BaseModel
from numpydantic import NDArraySchema, Shape
class MyModel(BaseModel):
    field: Annotated[np.ndarray, NDArraySchema(Shape("{shape_expression}"), dtype)]
NDArraySchema() also validates that the given array is of the specified type,
rather than any array backend that matches the dtype and shape.
Caveats
Todo
numpydantic currently does not support structured dtypes or numpy.recarray
specifications like nptyping does. It will in future versions.
Todo
numpydantic also does not support the variable shape definition form like
Shape['Dim, Dim']
(two dimensions of any size, as long as they are equal),
because at the moment it appears impossible to express dynamic constraints
(i.e., minItems/maxItems that depend on the shape of another array)
in JSON Schema. A future minor version will allow them by generating a JSON
schema with a warning that the equal-shape constraint will not be represented.
See: https://github.com/orgs/json-schema-org/discussions/730
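To see why, consider how a fixed shape can be expressed: each dimension becomes a nested array schema whose minItems and maxItems are constant numbers. The hypothetical helper below builds such a schema for a list of sizes; it is an illustrative sketch, not the schema numpydantic actually generates. Nothing in it can refer to the size chosen for another dimension, which is what Shape['Dim, Dim'] would require.

```python
import json

def fixed_shape_schema(sizes: list[int], item_type: str = "number") -> dict:
    """Hypothetical helper: nested JSON Schema arrays for a fixed shape."""
    schema: dict = {"type": item_type}
    # Build from the innermost dimension outward; every bound is a constant.
    for size in reversed(sizes):
        schema = {"type": "array", "items": schema, "minItems": size, "maxItems": size}
    return schema

print(json.dumps(fixed_shape_schema([3, 4]), indent=2))
```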