Poetry configuration & custom serialization function support
Instead of using `pip` with `requirements*.txt` files and `setup.py`,
this project has changed to use `poetry` and its single `pyproject.toml`
config file. This simplifies environment & dependency management and
enables the project to be easily published on PyPI.

The `serialize` and `deserialize` functions now support user-defined
data-type serialization functions. Both functions accept an optional
mapping of `type -> handler`: if a data type is encountered that
**exactly matches** a key (type) then its value (serialization function)
is used instead of the usual built-in logic.

This new custom serialization feature has new tests that show its use
serializing multi-dimensional `numpy` arrays and `torch` tensors.
malcolmgreaves committed Jun 26, 2020
1 parent ba68bd6 commit cc46d5f
Showing 13 changed files with 1,125 additions and 69 deletions.
2 changes: 2 additions & 0 deletions .python-version
@@ -0,0 +1,2 @@
3.8.3
3.7.7
87 changes: 60 additions & 27 deletions README.md
@@ -1,4 +1,4 @@
# `core_utils`
# `pywise`

Contains functions that provide general utility and build upon the Python 3 standard library. It has no external dependencies.
- `serialization`: serialization & deserialization for `NamedTuple`-deriving & `@dataclass` decorated classes
@@ -12,39 +12,32 @@ Take a look at the end of this document for example use.


## Development Setup
Create a distinct environment for this code using conda via:
```
conda env create -y -n pycore -python=3.8
```
And activate it via:
```
conda activate pycore
```
And make the project's code available in the environment:
This project uses [`poetry`](https://python-poetry.org/) for virtualenv and dependency management. We recommend using [`brew`](https://brew.sh/) to install `poetry` system-wide.

To install the project's dependencies, perform:
```
pip install -e .
poetry install
```


## Testing
Install test and dev tool dependencies via:
Every command must be run within the `poetry`-managed environment.
For instance, to open a Python shell, you would execute:
```
pip install -r requirements-test.txt
poetry run python
```
Or with `pip install -e .[test]`.
Alternatively, you may activate the environment by performing `poetry shell` and directly invoke Python programs.


Run all tests via:
#### Testing
To run tests, execute:
```
pytest
poetry run pytest -v
```

Additionally, we use `tox` to perform tests across Python 3.8+ as well as 3.7+. To run both tests in parallel, do:
To run tests against all supported environments, use [`tox`](https://tox.readthedocs.io/en/latest/):
```
tox -p
poetry run tox -p
```


## Dev Tools
#### Dev Tools
This project uses `black` for code formatting, `flake8` for linting, and
`mypy` for type checking. Use the following commands to ensure code quality:
```
@@ -55,12 +48,13 @@ black .
mypy --ignore-missing-imports --follow-imports=silent --show-column-numbers --warn-unreachable .
# lints code
flake8 --max-line-length=100 --ignore=E501,W293,E303,W291,W503,E203,E731,E231 .
flake8 --max-line-length=100 --ignore=E501,W293,E303,W291,W503,E203,E731,E231,E721,E722,E741 .
```


## Examples
## Documentation via Examples

#### Nested @dataclass and NamedTuple
Let's say you have an address book that you want to write to and from JSON.
We'll define our data types for our `AddressBook`:

@@ -115,7 +109,7 @@ ab = AddressBook([
emergency_contact=Emergency("Superman", PhoneNumber(262,1249865,extension=1))
),
])
'''
```

We can convert our `AddressBook` data type into a JSON-formatted string using `serialize`:
```python
@@ -138,5 +132,44 @@ print(ab == new_ab)
# Any @dataclass decorated type is serializable.
```

NOTE: The `deserialize` function needs the type to deserialize the data into.
Note that the `deserialize` function needs the type to deserialize the data into. The deserialization
type-matching is _structural_: it will work so long as the data type's structure (of field names and
associated types) is compatible with the supplied data.
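
As a rough illustration of structural matching, the sketch below uses a hypothetical stand-in, `from_fields_sketch`, rather than this project's `deserialize`. It assumes only that field names are looked up in the supplied data; the real function also handles nested types, enums, and more. `Point` and `Pixel` are made-up types for the example.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Mapping, Type


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Pixel:
    x: int
    y: int


def from_fields_sketch(type_value: Type, data: Mapping[str, Any]) -> Any:
    """Hypothetical stand-in: build `type_value` from keys that match its field names."""
    if not is_dataclass(type_value):
        raise TypeError(f"{type_value} is not a dataclass")
    # Only the structure (field names) of the target type is consulted.
    return type_value(**{f.name: data[f.name] for f in fields(type_value)})


data = {"x": 1, "y": 2}

# The same data is compatible with both types because their structures agree.
point = from_fields_sketch(Point, data)
pixel = from_fields_sketch(Pixel, data)
print(point, pixel)
```

The data carries no record of which type produced it, which is why the target type has to be passed in explicitly.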


#### Custom Serialization
In the event that one desires to use `serialize` and `deserialize` with data types from third-party libraries (e.g. `numpy` arrays) or custom-defined `class`es that are not decorated with `@dataclass` or derive from `NamedTuple`, one may supply a `CustomFormat`.

`CustomFormat` is a mapping that associates precise types with custom serialization functions. When supplied to `serialize`, the values in the mapping accept an instance of the exact type and produce a serializable representation. In the `deserialize` function, they convert such a serialized representation into a bona fide instance of the type.

To illustrate their use, we'll define `CustomFormat` `dict`s that allow us to serialize `numpy` multi-dimensional arrays:
```python
import numpy as np
from core_utils.serialization import *


custom_serialization: CustomFormat = {
np.ndarray: lambda arr: arr.tolist()
}

custom_deserialization: CustomFormat = {
np.ndarray: lambda lst: np.array(lst)
}
```

Now, we may supply `custom_{serialization,deserialization}` to our functions. We'll use them to perform a "round-trip" serialization of a four-dimensional array of floating point numbers to and from a JSON-formatted `str`:
```python
import json

v_original = np.random.random((1,2,3,4))
s = serialize(v_original, custom=custom_serialization)
j = json.dumps(s)

d = json.loads(j)
v_deser = deserialize(np.ndarray, d, custom=custom_deserialization)

print((v_original == v_deser).all())
```

It's important to note that, when supplying a `CustomFormat`, the serialization functions take priority over the default behavior (except for `Any`, as it is _always_ considered a pass-through). Moreover, types must match **exactly** to the keys in the mapping. Thus, if using a generic type, you must supply separate key-value entries for each distinct type parameterization.
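
To make the exact-match rule concrete, here is a minimal sketch of the lookup, assuming it behaves like a `type(value) in custom` check. `serialize_sketch`, `Celsius`, and `PreciseCelsius` are hypothetical names used only for illustration; they are not part of this library, and the fallback branch is heavily simplified.

```python
from typing import Any, Callable, List, Mapping, Optional, Type

# Same shape as the CustomFormat alias described above.
CustomFormat = Mapping[Type, Callable[[Any], Any]]


class Celsius:
    """Hypothetical plain class (neither a dataclass nor a NamedTuple)."""

    def __init__(self, degrees: float) -> None:
        self.degrees = degrees


class PreciseCelsius(Celsius):
    """Hypothetical subclass: it is NOT matched by a `Celsius` key."""


def serialize_sketch(value: Any, custom: Optional[CustomFormat] = None) -> Any:
    # Exact type lookup: a subclass does not inherit its parent's handler.
    if custom is not None and type(value) in custom:
        return custom[type(value)](value)
    return value  # simplified fallback: anything unmatched is returned as-is


custom: CustomFormat = {Celsius: lambda c: {"degrees": c.degrees}}

exact = serialize_sketch(Celsius(21.5), custom)
subclass_value = PreciseCelsius(21.5)
unmatched = serialize_sketch(subclass_value, custom)

# Generic parameterizations are distinct keys too, which matters when a
# generic type is used as the lookup key during deserialization.
generic_custom: CustomFormat = {List[int]: lambda xs: [int(x) for x in xs]}
has_int_list = List[int] in generic_custom
has_float_list = List[float] in generic_custom

print(exact, unmatched is subclass_value, has_int_list, has_float_list)
```

In this sketch, handling `PreciseCelsius` or `List[float]` would require adding their own entries to the mapping.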

1 change: 0 additions & 1 deletion VERSION

This file was deleted.

2 changes: 1 addition & 1 deletion core_utils/common.py
@@ -1,5 +1,5 @@
from importlib import import_module
from typing import _GenericAlias, Any, Tuple, Optional, Type
from typing import _GenericAlias, Any, Tuple, Optional, Type # type: ignore


def type_name(t: type) -> str:
59 changes: 52 additions & 7 deletions core_utils/serialization.py
@@ -1,12 +1,23 @@
from enum import Enum
from typing import Any, Iterable, Type, Tuple, Set, Mapping, TypeVar, _GenericAlias
from typing import ( # type: ignore
Any,
Iterable,
Type,
Tuple,
Set,
Mapping,
TypeVar,
Callable,
Optional,
)
from dataclasses import dataclass, is_dataclass, Field

from core_utils.common import type_name, checkable_type

__all__ = [
"serialize",
"deserialize",
"CustomFormat",
"is_namedtuple",
"is_typed_namedtuple",
"MissingRequired",
@@ -21,16 +32,35 @@
See: https://github.com/python/mypy/issues/3915
"""

CustomFormat = Mapping[Type, Callable[[Any], Any]]
"""Defines a mapping of type to function that will either serialize or deserialize that type.
See uses in :func:`serialize` and :func:`deserialize`.
"""


def serialize(value: Any) -> Any:
def serialize(value: Any, custom: Optional[CustomFormat] = None) -> Any:
"""Attempts to convert the `value` into an equivalent `dict` structure.
NOTE: If the value is not a namedtuple, dict, enum, or iterable, then the value is returned as-is.
NOTE: If the value is not a namedtuple, dataclass, mapping, enum, or iterable, then the value is
returned as-is.
The :param:`custom` optional mapping provides callers with the ability to handle serialization
of complex types that are from an external source. E.g. To serialize `numpy` arrays, one may use:
```
custom = {numpy.ndarray: lambda a: a.tolist()}
```
NOTE: If :param:`custom` is present, its serialization functions are given priority.
NOTE: If using :param:`custom` for generic types, you *must* have unique instances for each possible
type parametrization.
"""
if is_namedtuple(value):
if custom is not None and type(value) in custom:
return custom[type(value)](value)

elif is_namedtuple(value):
return {k: serialize(raw_val) for k, raw_val in value._asdict().items()}

if is_dataclass(value):
elif is_dataclass(value):
return {k: serialize(v) for k, v in value.__dict__.items()}

elif isinstance(value, Mapping):
@@ -50,14 +80,29 @@ def serialize(value: Any) -> Any:
return value


def deserialize(type_value: Type, value: Any) -> Any:
def deserialize(
type_value: Type, value: Any, custom: Optional[CustomFormat] = None,
) -> Any:
"""Does final conversion of the `dict`-like `value` into an instance of `type_value`.
NOTE: If the input type `type_value` is a sequence, then deserialization is attempted on each
element. If it is a `dict`, then deserialization is attempted on each key and value. If this
specified type is a namedtuple or enum, then it will be appropriately handled.
specified type is a namedtuple, dataclass, or enum, then it will be appropriately handled.
Values without these explicit types are returned as-is.
The :param:`custom` optional mapping provides callers with the ability to handle deserialization
of complex types that are from an external source. E.g. To deserialize `numpy` arrays, one may use:
```
custom = {numpy.ndarray: lambda lst: numpy.array(lst)}
```
NOTE: If :param:`custom` is present, its deserialization functions are given priority.
NOTE: If using :param:`custom` for generic types, you *must* have unique instances for each possible
type parametrization.
"""

if custom is not None and type_value in custom:
return custom[type_value](value)

if type_value == Any:
return value
