Refactor ModelBuilder and RandomBuilder #971
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##           master     #971      +/-   ##
==========================================
+ Coverage   92.78%   92.82%   +0.03%
==========================================
  Files         108      108
  Lines        8182     8222      +40
==========================================
+ Hits         7592     7632      +40
  Misses        590      590
This is great - thanks 👍
ModelBuilder is something which started fairly simple, but as more and more edge cases were discovered it has become quite complex.
I agree that your solution is cleaner and easier to follow.
I left a few comments.
piccolo/testing/random_builder.py
random_value_callable = partial(
    cls.next_list,
    mapper[t.cast(Array, column).base_column.value_type],
)
I wonder if anything besides an Array could end up in this block.
I wonder if we can add list to mapper in a clean way?
The handling of list is quite complex, involving two aspects: the simple callable, such as RandomBuilder.next_bool, and additional logic that requires knowledge of the column to manufacture another callable. Since list relies on obtaining the type from the mapper, we cannot determine the exact type until the last moment.
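To make the two aspects concrete, here is a minimal sketch using stand-in classes. FakeColumn, FakeArray, MAPPER, and build_value are hypothetical names for illustration and are not part of Piccolo. Simple types map straight to a zero-argument callable, whereas list needs the column in hand to look up the element type before a callable can be manufactured with functools.partial.

```python
import random
import typing as t
from functools import partial


class FakeColumn:
    """Hypothetical stand-in for a Piccolo column."""

    def __init__(self, value_type: type) -> None:
        self.value_type = value_type


class FakeArray(FakeColumn):
    """Hypothetical stand-in for an Array column."""

    def __init__(self, base_column: FakeColumn) -> None:
        super().__init__(list)
        self.base_column = base_column


def next_bool() -> bool:
    return random.choice([True, False])


def next_int() -> int:
    return random.randint(0, 100)


def next_list(callable_: t.Callable[[], t.Any], length: int = 3) -> list:
    return [callable_() for _ in range(length)]


# Simple types: a plain lookup is enough.
MAPPER: t.Dict[type, t.Callable[[], t.Any]] = {bool: next_bool, int: next_int}


def build_value(column: FakeColumn) -> t.Any:
    if isinstance(column, FakeArray):
        # The element type is only known once we inspect the column itself.
        return partial(next_list, MAPPER[column.base_column.value_type])()
    return MAPPER[column.value_type]()
```

Calling build_value(FakeArray(FakeColumn(int))) returns a list of three random integers, while build_value(FakeColumn(bool)) goes through the plain lookup.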
You're right - arrays are always tricky edge cases.
Maybe we could add a check: if isinstance(column, Array)?
See my latest comment regarding this issue.
piccolo/testing/random_builder.py
class RandomBuilder:
    @classmethod
    def _build(cls, column: Column) -> t.Any:
What do you think about calling this something like get_value_for_column? Do you think it should be a private method?
I initially considered treating it as a private method, but also wanted to find a way to provide a mechanism for registering a new type.
That would be nice.
@dantownsend I completely agree with you; the task turned out to be more complex than I initially anticipated. Building the …
I think there's a lot of potential in this approach. Feel free to play around with other ideas, but this is a nice start.
I've just pushed another version. This version introduces a hook for users to register their own random type. I'll review your comments tomorrow as it's already midnight here in Asia.
The third version may be a bit easier to understand:
@classmethod
def unregister_types(cls) -> None:
    cls.__OTHER_MAPPER.clear()

precision, scale = column._meta.params.get("digits") or (4, 2)
local_mapper[Decimal] = partial(
    RandomBuilder.next_decimal, precision, scale
)
If RandomBuilder.next_decimal incorporates default values for precision and scale (like this PR), the logic can be simplified to:
if precision_scale := column._meta.params.get("digits"):
    local_mapper[Decimal] = partial(
        RandomBuilder.next_decimal, *precision_scale
    )
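As a rough illustration of that simplification, the sketch below is self-contained and makes assumptions: next_decimal here is a hypothetical stand-in with defaults of precision=4 and scale=2 (the real RandomBuilder.next_decimal signature may differ), and decimal_factory is an invented helper name. The point is that once the generator carries its own defaults, functools.partial is only needed when the column supplies digits.

```python
import decimal
import random
import typing as t
from functools import partial


def next_decimal(precision: int = 4, scale: int = 2) -> decimal.Decimal:
    """Hypothetical generator: up to `precision` digits, `scale` after the point."""
    upper = 10 ** precision - 1
    # scaleb shifts the decimal point left by `scale` places without rounding.
    return decimal.Decimal(random.randint(0, upper)).scaleb(-scale)


def decimal_factory(
    digits: t.Optional[t.Tuple[int, int]],
) -> t.Callable[[], decimal.Decimal]:
    """Return a zero-argument callable, binding digits only when provided."""
    if digits:
        return partial(next_decimal, *digits)
    # No digits on the column: rely on the generator's own defaults.
    return next_decimal
```

With this shape, decimal_factory((6, 3))() yields a value with three decimal places, and decimal_factory(None)() falls back to two.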
    cls.__DEFAULT_MAPPER[base_type]() for _ in range(length)
]
elif column._meta.choices:
    reg = cls.get_registry(column)
This can probably go into the else block, as we don't use it if the column has choices.
That's a valid observation. It seems that the presence of reg is a result of the requirements in the previous version, where it was needed for multiple elif-else blocks.
I'm not sure if this is a good idea.
@classmethod
def _randomize_attribute(cls, column: Column) -> t.Any:
    reg = cls.get_registry(column)
    random_value = reg.get(enum.Enum, reg[column.value_type])()
    if isinstance(column, (JSON, JSONB)):
        return json.dumps({"value": random_value})
    return random_value

@classmethod
def _get_local_mapper(cls, column: Column) -> t.Dict[t.Type, t.Callable]:
    ...
    if _choices := column._meta.choices:
        local_mapper[enum.Enum] = partial(
            RandomBuilder.next_enum, _choices)
    return local_mapper
if column.value_type == list:
    reg[list] = partial(
        RandomBuilder.next_list,
        reg[t.cast(Array, column).base_column.value_type],
I don't want to make things too insane, but multidimensional arrays are possible:
Array(Array(Integer()))
I wonder what would happen here in that situation?
Thank you for bringing this situation to my attention. I was curious about the behavior in our current codebase, so I conducted a quick test. It seems that the current implementation will throw a KeyError for this situation (please correct me if I'm mistaken).
This might be a bit off-topic, but I wanted to try out this behavior in the Piccolo playground, and it doesn't seem to be working. The code works fine if I just launch a terminal and enter the shell.
piccolo playground run --engine=sqlite3
In [1]: from piccolo.table import Table
In [2]: from piccolo.columns import Array, BigInt
In [3]: class MyTable(Table):
...: my_column = Array(Array(BigInt()))
...:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[3], line 1
----> 1 class MyTable(Table):
2 my_column = Array(Array(BigInt()))
Cell In[3], line 2, in MyTable()
1 class MyTable(Table):
----> 2 my_column = Array(Array(BigInt()))
NameError: name 'Array' is not defined
@jrycw This happens because the Array and BigInt columns are not imported into the Playground application. If you patch the local Piccolo installation and add these two columns, everything works fine. Maybe we should add all the possible columns to the imports, which would solve the problem.
@sinisaos Thank you very much for identifying the source of the issue and providing the solution. It's working now.
You're right that the current implementation of ModelBuilder doesn't handle multidimensional arrays, so I wouldn't worry about it if it's a tricky fix. Array columns now have two methods to help with this kind of thing: _get_dimensions and _get_inner_value_type.
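One way nested arrays might be handled, sketched here without importing Piccolo: if the number of dimensions and a generator for the innermost type are known (for example, obtained from something like _get_dimensions and _get_inner_value_type, whose exact signatures are not shown in this thread), a small recursive helper can build the nested list. The name next_nested_list and its parameters are hypothetical.

```python
import random
import typing as t


def next_nested_list(
    dimensions: int,
    inner_callable: t.Callable[[], t.Any],
    length: int = 2,
) -> t.Any:
    """Build a list nested `dimensions` levels deep (hypothetical helper)."""
    if dimensions == 0:
        # Innermost level: produce a single random value.
        return inner_callable()
    return [
        next_nested_list(dimensions - 1, inner_callable, length)
        for _ in range(length)
    ]


# Something like Array(Array(BigInt())) would correspond to two dimensions
# with an integer generator for the innermost type.
grid = next_nested_list(2, lambda: random.randint(0, 9))
```

Here grid is a 2 x 2 list of integers; a one-dimensional array is simply the dimensions=1 case, so the existing flat behaviour would be preserved.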
@classmethod
def register_type(cls, typ: t.Type, callable_: t.Callable) -> None:
    cls.__OTHER_MAPPER[typ] = callable_

@classmethod
def unregister_type(cls, typ: t.Type) -> None:
Being able to register custom types is cool, but I wonder about the main use cases.
You can specify defaults at the moment:
await ModelBuilder.build(MyTable, defaults={MyTable.some_column: "foo"})
I'm not against being able to override how types are handled, but we just need to articulate to users when it's appropriate vs using defaults.
Thank you for bringing up this question; it prompted me to reflect on the code and its implications.
The main difference between using defaults and register_type lies in their respective purposes:
- Using defaults is suitable when users generally find our provided random logic satisfactory, but require a hardcoded value for a specific column on a one-time basis.
- On the other hand, register_type is appropriate when users want a custom implementation for a type, effectively overwriting our default random logic for that specific type.
Here are three distinct use cases:
- (1) Types not provided by us: For instance, next_decimal is not available in the current release version, but users can implement their own logic for the type and inject it into ModelBuilder.
- (2) User preference for specific logic: In scenarios where we introduce new features, such as the shiny next_decimal logic (returning decimal.Decimal if column.value_type is decimal.Decimal), users may prefer the previous implementation or have specific requirements. With register_type, they have the flexibility to override the default behavior.
- (3) Unanticipated use cases: This aspect is particularly valuable for registration. For example, consider a user who initially builds a successful e-commerce platform in the UK using Piccolo. Later, they expand into Asia and encounter legal requirements necessitating the storage of customer names in local languages. If ModelBuilder does not support non-English characters, users can register their own implementations to address this issue.
A draft test for situations (1) and (2) might look like this:
class TableWithDecimal(Table):
    numeric = Numeric()
    numeric_with_digits = Numeric(digits=(4, 2))
    decimal = Decimal()
    decimal_with_digits = Decimal(digits=(4, 2))

class TestModelBuilder(unittest.TestCase):
    ...
    def test_registry_overwritten1(self):
        table = ModelBuilder.build_sync(TableWithDecimal)
        for key, value in table.to_dict().items():
            if key != "id":
                self.assertIsInstance(value, decimal.Decimal)

        def fake_next_decimal(column: Column) -> float:
            """will return `float` instead of `decimal.Decimal`"""
            precision, scale = column._meta.params["digits"] or (4, 2)
            return RandomBuilder.next_float(
                maximum=10 ** (precision - scale), scale=scale
            )

        ModelBuilder.register_type(decimal.Decimal, fake_next_decimal)
        overwritten_table = ModelBuilder.build_sync(TableWithDecimal)
        for key, value in overwritten_table.to_dict().items():
            if key != "id":
                self.assertIsInstance(value, float)
A draft test for situation (3) might look like this:
class TestModelBuilder(unittest.TestCase):
    ...
    def test_registry_overwritten2(self):
        choices = "一二三"  # Chinese characters

        def next_str(length: int = 3) -> str:
            # Chinese names often consist of three Chinese characters
            return "".join(random.choice(choices) for _ in range(length))

        ModelBuilder.register_type(str, next_str)
        manager1 = ModelBuilder.build_sync(Manager)
        self.assertTrue(all(char_ in choices for char_ in manager1.name))
        poster1 = ModelBuilder.build_sync(Poster)
        self.assertTrue(all(char_ in choices for char_ in poster1.content))

        ModelBuilder.unregister_type(str)
        manager2 = ModelBuilder.build_sync(Manager)
        self.assertTrue(all(char_ not in choices for char_ in manager2.name))
        poster2 = ModelBuilder.build_sync(Poster)
        self.assertTrue(all(char_ not in choices for char_ in poster2.content))
The scenario is as follows: Manager1 is a locally hired individual, while Manager2 is dispatched from the UK. Both are working on the poster using their native languages.
Finally, I realized I had overlooked the magic behavior of Python's name mangling rules. For instance:
>>> class ModelBuilder:
...     __OTHER_MAPPER = {}
...
>>> ModelBuilder.__OTHER_MAPPER
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'ModelBuilder' has no attribute '__OTHER_MAPPER'
>>> ModelBuilder._ModelBuilder__OTHER_MAPPER
{}
As a result, the previous test code might be a bit off. I need to use the following code for the setup and teardown phases for each test:
def setUp(self) -> None:
    ModelBuilder._ModelBuilder__OTHER_MAPPER.clear()  # type: ignore

def tearDown(self) -> None:
    ModelBuilder._ModelBuilder__OTHER_MAPPER.clear()  # type: ignore
Thanks for explaining the rationale behind it - it makes sense.
If I were to completely redesign ModelBuilder, I probably wouldn't have class methods. Instead of:
await ModelBuilder.build(MyTable)
I would have:
await ModelBuilder(some_option=True).build(MyTable)
That way we could configure ModelBuilder's behaviour more easily. For registering types we could have:
custom_builder = ModelBuilder(types={...})
await custom_builder.build(MyTable)
Alternatively, we could allow the types to be passed in via the build method instead:
await ModelBuilder.build(MyTable, types={...})
If register and unregister work globally, there are pros and cons. The main pro is you only need to set it up once (e.g. in a session fixture of Pytest). But if you were to somehow run your tests in parallel, it might be problematic.
What do you think?
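To illustrate the instance-based configuration floated above, here is a minimal sketch. SketchBuilder, its types argument, and value_for are hypothetical and only mimic the proposed shape; they are not Piccolo's ModelBuilder. Because overrides are copied onto the instance, two builders with different type mappings can coexist, which avoids the shared global state that could be a problem for parallel tests.

```python
import random
import string
import typing as t


class SketchBuilder:
    """Hypothetical instance-configured builder (not Piccolo's ModelBuilder)."""

    _DEFAULT_TYPES: t.Dict[type, t.Callable[[], t.Any]] = {
        int: lambda: random.randint(0, 100),
        str: lambda: "".join(random.choices(string.ascii_lowercase, k=5)),
    }

    def __init__(
        self, types: t.Optional[t.Dict[type, t.Callable[[], t.Any]]] = None
    ) -> None:
        # Overrides live on the instance, so the class-level defaults
        # are never mutated.
        self._types = {**self._DEFAULT_TYPES, **(types or {})}

    def value_for(self, value_type: type) -> t.Any:
        return self._types[value_type]()


custom_builder = SketchBuilder(types={str: lambda: "一二三"})
default_builder = SketchBuilder()
```

In this sketch custom_builder.value_for(str) always returns the overridden string, while default_builder keeps producing lowercase ASCII, regardless of the order in which the two are used.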
Thank you for sharing your thoughts with me. Personally, I am inclined towards the instance method approach. However, implementing this change might break the current interface. I propose a three-stage transition plan:
- First stage: Use descriptors to distinguish between calls from the class and calls from an instance. Initially, we move the current implementation to the class branch to maintain user experience. Simultaneously, we start implementing the new concept in the instance branch, issuing an experimental warning.
- Second stage: If the new concept gains appreciation from users or developers, we add a deprecation warning to the class branch.
- Third stage: Remove the class branch and clean up the code to ensure all methods are instance methods by the end.
During the first two stages, we'll keep the class branch unchanged and encourage users to try out the new syntax and the new features. If we reach the third stage, users who prefer the class branch might need to adjust their habits from using await ModelBuilder.build(...) to await ModelBuilder().build(...), or from ModelBuilder.build_sync(...) to ModelBuilder().build_sync(...).
The concept of descriptors is relatively straightforward, but it can sometimes feel too magical to grasp. I often need a refresher before coding if I haven't touched it for a long time. Fortunately, we don't need the complex __get__ and __set__ logic of a data descriptor. A simple non-data descriptor should suffice for our use case. With the help of this post, I've drafted a concept code as follows:
import asyncio
import inspect
import typing as t
from concurrent.futures import ThreadPoolExecutor


def run_sync(coroutine: t.Coroutine):
    try:
        # We try this first, as in most situations this will work.
        return asyncio.run(coroutine)
    except RuntimeError:
        # An event loop already exists.
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(asyncio.run, coroutine)
            return future.result()


class dichotomy:
    def __init__(self, f):
        self.f = f

    def __get__(self, instance, owner):
        # Bind to the instance when there is one, otherwise to the class.
        cls_or_inst = instance if instance is not None else owner
        # self.f is a function, so check with iscoroutinefunction
        # (iscoroutine only recognises coroutine objects).
        if inspect.iscoroutinefunction(self.f):
            async def newfunc(*args, **kwargs):
                return await self.f(cls_or_inst, *args, **kwargs)
        else:
            def newfunc(*args, **kwargs):
                return self.f(cls_or_inst, *args, **kwargs)
        return newfunc


class ModelBuilder:
    def __init__(self, *args, **kwargs):
        self._types = "..."  # Some information for instance method

    @dichotomy
    async def build(self_or_cls, *args, **kwargs):
        if inspect.isclass(self_or_cls):
            print("called as a class method from build")
            cls = self_or_cls
            await cls._build()
        else:
            print("called as an instance method from build")
            self = self_or_cls
            await self._build()

    @dichotomy
    def build_sync(self_or_cls, *args, **kwargs):
        return run_sync(self_or_cls.build())

    @dichotomy
    async def _build(self_or_cls, *args, **kwargs):
        if inspect.isclass(self_or_cls):
            print("called as a class method from _build", end="\n"*2)
            cls = self_or_cls  # noqa: F841
            # Current implementation remains here.
        else:
            print("called as an instance method from _build")
            self = self_or_cls
            # Some information can be retrieved.
            print(f'{self._types=}', end="\n"*2)
            # Our new logics


async def main():
    print('Async ModelBuilder.build: ')
    await ModelBuilder.build()
    print('Async ModelBuilder().build: ')
    await ModelBuilder().build()
    print('Sync ModelBuilder.build: ')
    ModelBuilder.build_sync()
    print('Sync ModelBuilder().build: ')
    ModelBuilder().build_sync()


if __name__ == '__main__':
    asyncio.run(main())
Async ModelBuilder.build:
called as a class method from build
called as a class method from _build
Async ModelBuilder().build:
called as an instance method from build
called as an instance method from _build
self._types='...'
Sync ModelBuilder.build:
called as a class method from build
called as a class method from _build
Sync ModelBuilder().build:
called as an instance method from build
called as an instance method from _build
self._types='...'
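Building on the concept code above, the second stage of the plan (a deprecation warning on the class branch) might be sketched as follows. This is only an illustration under assumptions: class_or_instance and SketchBuilder are hypothetical names, the method is synchronous for brevity, and the warning text is invented.

```python
import warnings


class class_or_instance:
    """Hypothetical non-data descriptor that warns on class-level access."""

    def __init__(self, f):
        self.f = f

    def __get__(self, instance, owner):
        if instance is None:
            # Class branch: still works, but nudges users to the new style.
            warnings.warn(
                f"Calling {self.f.__name__} on the class is deprecated; "
                "create an instance first.",
                DeprecationWarning,
                stacklevel=2,
            )
            target = owner
        else:
            target = instance

        def bound(*args, **kwargs):
            return self.f(target, *args, **kwargs)

        return bound


class SketchBuilder:
    @class_or_instance
    def build_sync(self_or_cls):
        # Report which branch was taken, in place of real build logic.
        return "class" if isinstance(self_or_cls, type) else "instance"
```

Because the descriptor defines only __get__, it stays a non-data descriptor, and the warning fires at attribute access time for SketchBuilder.build_sync but not for SketchBuilder().build_sync.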
Finally, I agree that making register and unregister work globally could make it challenging to verify test results in parallel scenarios. I might lean towards using instance methods for the registering issue again.
These are just rough ideas that came to mind. I'm open to further discussions and refinements.
Using descriptors is an interesting idea. I've used them sparingly before - as you say, they're very powerful, but can be confusing.
There's a lot of really good ideas in this PR, and I don't want to bog things down. I wonder if we could add this in a subsequent PR.
Certainly! Here are some options to consider for closing this PR:
- Closing the PR without merging any changes.
- Keeping the current code as is, while implementing the next_decimal functionality and updating related code.
- Using the latest commit of this PR while removing the option for users to register custom types.
- Merging the PR with its latest commit.
- Considering any other suggestions or alternatives.
I'm open to any of these choices. @dantownsend , what are your thoughts on this matter?
@dantownsend no worries at all. It's great to see the project progressing on various fronts. I'll make an effort to review it and share any opinions or feedback I may have.
I encountered ModelBuilder._randomize_attribute and initially found the numerous if-elif-else checks puzzling. However, further investigation revealed that certain factories need additional information from the column. I attempted to consolidate this logic into RandomBuilder._build, aiming for improved clarity. Nevertheless, the final code may not be as pristine as desired. I'm looking forward to hearing the community's thoughts on whether this refactoring improves the codebase.