Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "T" and "V" dtypes in from_dtype #4226

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tybug
Copy link
Member

@tybug tybug commented Jan 3, 2025

(probably) resolves #4039. I did look around for what it would take to support user-defined dtypes, but found...little documentation on the topic in the numpy docs 😅. I found along the way that we don't support the void dtype, so this pull adds support for that too.

I have some local work on accepting dtype-coercible strings in from_dtype (dtype: Union[np.dtype, str]) - would that change be welcome as well? nps.from_dtype(np.dtype("int8")) feels a bit cumbersome, but I also understand wanting to keep the api surface tight.

@tybug tybug force-pushed the numpy-dtype branch 2 times, most recently from 10ef133 to c683d41 Compare January 3, 2025 16:06
@tybug
Copy link
Member Author

tybug commented Jan 3, 2025

It seems possible that np.void was left intentionally unsupported in the code. Let me know if I've missed some consideration for this, or some reason it should not be implemented at all.

@Zac-HD
Copy link
Member

Zac-HD commented Jan 4, 2025

It seems possible that np.void was left intentionally unsupported in the code. Let me know if I've missed some consideration for this, or some reason it should not be implemented at all.

https://github.com/HypothesisWorks/hypothesis/pull/472/files#diff-8bf1182f853824fabcaef20e0c3fb9db9e2c512422633969813c00afcf0c3b69R36 suggests that I just didn't see a sensible semantics for it, and https://github.com/HypothesisWorks/hypothesis/pull/826/files#diff-f750b73af7d56838bbd646dd35542feafdec3b6f08243d1edd677d22965d8230R290-R292 that support for record/compound/nested dtypes was a concern - plausibly obviated by the dtypes overhaul in Numpy 2.0. I'd like to see tests here which explicitly pass all of record-dtypes, array-dtypes, nested combinations, etc.

If we provide a void_dtypes() strategy it should also generate every possible combination, which leads me to be pretty dubious about whether we should provide it. I also recall chatting to a Numpy maintainer at the SciPy sprints a couple of years ago who commented that they're not much used, and so I worry that exposing them prominently might lead to worse rather than better software overall.

I have some local work on accepting dtype-coercible strings in from_dtype (dtype: Union[np.dtype, str]) - would that change be welcome as well? nps.from_dtype(np.dtype("int8")) feels a bit cumbersome, but I also understand wanting to keep the api surface tight.

I'm generally against this; there are just too many things that can be coerced to dtypes - and e.g.: int, "int", and "int64" may or may not be the same on various systems - and so it seems like a confusion-minimizing choice to make the user handle whatever coercion they think should be happening before they hand it to us, where any debugging is firmly in their hands.

Comment on lines +226 to +230
result = st.binary(
**compat_kw(
"min_size", max_size=None if dtype.itemsize == 0 else dtype.itemsize
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should generate all itemsize bytes if that's not None, rather than variable-length?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally missed that Vn pads to n with \x00, thanks

Comment on lines +45 to +47
# (except for void, which does not have a default)
cond = lambda _: True if typ is np.void else lambda x: x != type(x)()
x = find_any(from_type(typ), cond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just skip the void case here; as you comment there's no notion of a default value and so this test is not meaningful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it in because this test doubles as a "can draw at all from from_dtype" - that's probably clearer if I split this into two (albeit mostly redundant) tests, and assume-away void in this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just stick a top-level if type is np.void:/else: statement in this test; it'll save a bit of time and still be clear.

Comment on lines +949 to +950
# Also allow generating undetermined-width dtypes like "S" / "S0"? Possibly with
# a new parameter allow_undetermined?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weakly against uncapped dtypes; they make sense as an aid to interactive use (eg notebooks) but aren't actually a dtype that an array can have.

Copy link
Member Author

@tybug tybug Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the flipside, they *are* a valid dtype, and I think we should avoid compromising completeness unless there's a reason to. Generating them seems harmless if you're passing directly to np.array(dtype=) (equivalent to the largest elem size), and if you're testing a function which accepts generic np.dtype then imo that's the place where you most want hypothesis to generate weird dtypes like this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see your point, but I can't think of an API that I like more than writing | st.just(np.dtype("S"))! Perhaps we can just document that trick, and explain why/when it's useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve support for new and user-defined Numpy dtypes (e.g. np.dtypes.StringDType)
2 participants