Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PyO3's abi3-py38 feature #8

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Support PyO3's abi3-py38 feature #8

wants to merge 4 commits into from

Conversation

termoshtt
Copy link
Member

@termoshtt termoshtt commented Jun 6, 2024

extract::<&str> is replaced by extract::<String>. This means we take additional memcpy, but this crate is not intended to avoid this cost.

@HenningHolmDE
Copy link
Contributor

@termoshtt I was just about to prepare a similar PR to allow using abi3-py38 when I saw this draft.

Can I help in any way so we could get this landed?

@termoshtt
Copy link
Member Author

This means we take additional memcpy

My main concern is here. Although it is acceptable for this crate, I thought that I need to seek more better way and kept this PR as draft. As a result, we use python-version specific packaging in our project and this PR is left as it is.

If you now do not know a way to avoid it, I can merge this PR.

@HenningHolmDE
Copy link
Contributor

I understand your concern and would also rather prevent the additional memcpy.

When digging into the issue yesterday I ended up with the feeling that PyUnicode_AsUTF8AndSize (thus, the abi3-py310 limitation) is the only way to access data from a PyUnicode without copying the buffer in the process. I might be wrong here, however.

PyO3 relies on PyUnicode_AsUTF8AndSize in the implementation of Borrowed<PyString>::to_str (which is basically called in the case of our PyString::extract::<&str>().
Reading through the code, I the implementation of Borrowed<PyString>::to_cow cought my eye: When available, it calls to_str to return a Cow::Borrowed, otherwise uses the copied data (through PyUnicode_AsUTF8String) to return a Cow::Owned.

My suggestion would therefore be something like this:

         if self.0.is_instance_of::<PyString>() {
-            return visitor.visit_str(&self.0.extract::<String>()?);
+            return visitor.visit_str(&self.0.downcast()?.to_cow()?);
         }

This way, we should be able to keep the current zero-copy implementation where possible and only add the addional memcpy when abi3-py38 is selected.

However, for avoiding the memcpy at all costs in a use case, using python-version specific packaging will still be the better way as PyUnicode_AsUTF8AndSize has been available in the non-limited API before 3.8.
In my use case, I would rather avoid building the binaries for all versions (or bumping the minimum supported Python version for our package to 3.10).

What do you think?

@HenningHolmDE
Copy link
Contributor

@termoshtt What are your thoughts on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants