-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PLAN] Tracking issue for 2.10.1 #4125
Comments
I've been turning over in my mind what we should do about #4105 — It leaves the 2.10 release in a doubt-full state, but unfortunately nobody provided even the slightest clue what the problem could be. I don't want to act on suspicions, as they could lead us astray badly and waste a lot of time. (Furthermore, if nobody cares enough to provide actionable clues, it puts the importance of the issue for them into doubt, too.) In the meantime we also got:
Considering all of this together, I believe it will be best to abandon the 2.10 line b/o the doubts about ABI compatibility, and |
I'm still strongly in favor of a 2.10.1. It will be be no worse than 2.10.0, and it needs to be done before 3.11 is released, which is combine up soon. Both of those "bigger" changes above really should sit in master for a bit before release. Also, I don't think an ABI bump with a "go back" option is all that useful. Distributions like conda-forge can't handle "selectable" ABI bumps. A version of pybind11 has a set ABI, and they can run a migration if it changes. Let's release a 2.10.1 later this week, then put in those larger changes, maybe go ahead and merge some of the other ABI bump ideas we've had in the past, then do a 2.11 properly. |
I think it is: it gives 2.10 more legitimacy than it deserves. Unsuspecting users will upgrade to 2.10.1, only to discover that they will want to 2.11, i.e. we're leading people into wasting effort. I'm happy to cut the 2.11 release, as soon as #4201 is merged. How much longer do you think you need to fix up the scikit_build_example? All the rest looks like it could be done with a few hours of effort at most, if we don't drag our feet, but I don't know about scikit.
If we explain well (a couple sentences will do here I think) we give people a choice. I don't like to force people's hands, unless there really is no other way, which is evidently (#4228) not the case here.
We cannot do 2.10 properly because of the doubts-without-clues, but I don't see why we cannot do 2.11 properly very soon, minus the scikit issue that's been lingering for 2 months already. But we could just document that that's broken since 2.10 and we still haven't been able to fix. |
I'd prefer not to leave a release series "broken" - some users will undoubtedly be stuck on 2.10 for a while. A 2.10.1 with better Python 3.11 support would be seen as a "safer" upgrade than 2.11.0. I've moved to 2.10 everywhere and have been very happy with it. The ABI problem doesn't affect all users. Even if we release 2.11 a couple of weeks later, I think we should leave the 2.10 series in the best possible state.
Only if people read it, and only if they can. Conda-forge, for example, doesn't have a choice and this will trigger a migration. Given 2.10 is the 'last' of a very long ABI series, I wouldn't be surprised if it's a mini Python 2.
"bigger" changes really should sit in master for a bit before release.
Probably this week. By next Monday, at the latest. |
By "broken" I'm referring to the Python 3.11 fixes, not the ABI incompatibility, which I'm not too worried about, at the most I think it's likely quite small. |
How about:
If they can upgrade to 2.10.1, with the option to pin internals version 4, why would they not want to take a 2.11 instead? Especially to get rid of #4226, which is a long-standing but nevertheless serious issue.
Why? Our testing is extremely thorough. We're not forcing anyone to upgrade. Let people choose their own pace. Holding developments back because of version numbering considerations looks to me like putting the cart in front of the horse: version numbers are meant to inform, e.g. to help staying organized in a progressing environment, not to stifle progress. Using my own situation as an example: the Eigen::Tensor work looks super useful. Nobody has been asking me for it, but I want it deployed Google-internally asap, I bet it will get picked up quickly when people see it. — Only for completeness: Releases don't matter to me directly (I merge to smart_holder, then deploy from there) but having a close match with releases helps keeping everybody on the same page internally and externally. Therefore I'm happy to put in the effort cutting 2.11. |
That's fine and about what I was thinking - I'm not against a 3.11 in the near future, I just want a 3.10.1 too. I don't think we are really thinking that differently.
It's never as thorough as real world usage, and we have quite a few projects running off of master, as evidenced by the quick issues and PRs whenever we break master. Usually breaking changes to master get noticed within a week. Our testing is on lots of systems, but only our unit testing, which isn't full coverage (in fact, we have prioritized a quick test suite over a thorough one). Google's test suite is great, but focused - some things it just doesn't catch. ABI breakages, for example, can't be caught by either. New features are often not caught by either (though a little rough edges on a new feature is better than breaking old code). Some things just aren't used at all by Google that are important to other projects. I'd recommend at least letting any large PRs sit in master for at least week before releasing. If it's entirely additive (#4201), I don't think it's as important, but #4226 is not additive.
I think they are also a bit of a mark of stability. If you live at head, then you want
#4228 is already merged? Is it something I need to keep out of the 2.10.1 release? I'm totally okay with branching, it's not that much work and much better than holding up progress. If you want to merge something (that's been reviewed), tag it with 2.11 and I'll work in the v2.10 branch. |
Unless I missed something, #4228 is the NVIDIA workaround? That's fine for a patch?... I finally understand what the problem is with scikit_build_example. Not totally sure what the best fix for it is, but I know what's going wrong. |
Yes, the enable_if version in that PR is great as a patch for the nvcc issue :) |
Cool, thanks! (typo? 2.11, 2.10.1? mainly mentioning in case someone looks here later) #4201 good another great push today and I hope we can merge it tomorrow or so. (I hope to merge it into smart_holder and deploy internally right after.) What's the best way to handle the 2.10 branch? Is see you marked master already as 2.11.0.dev1 (8d82f29). Do we need to do something special before merging #4201? |
@Henry Centos7 has been failing for a whole week now. I see EOL is still 1 yr 9 mo off (https://cloud.google.com/compute/docs/eol/centos-eol-guidance), but I'm thinking we should drop this ancient platform anyway (release date was 2014-07-07, before pybind11 even existed). The centos builds are regular troublemakers, nibbling away more than their fair share from our time. Could we re-host PGI 22.3 on a less troublesome OS, ideally a modern one? |
CetnOS 7 is not failing, and it's a major and incredibly important platform. It's the base for |
I see now, we have "centos:7" too. I only looked for "CentOS7" before. I never meant to suggest dropping something that's still working and doesn't limit anything at the moment. What could be a good host OS to pick for PGI? |
CentOS 7? Given the issue is Nvidia stopped signing the binary, I don't think it matters what OS it's on. I think we just have to update from 22.3, current is 22.9. |
I think that's a good idea. I'll bump it in a PR to 22.7 (which we currently have at NERSC's Perlmutter supercomputer). But I am also generally ok to bump it higher if needed to address compiler bugs. Update, oh you did this already in #4260 👍 |
One more minor update we could squeeze in: #4269 |
Link to my concerns about rushing out a 2.10 release: #4271 (comment) |
@henryiii this it to acknowledge that I have received your DM in response to my comment:
Thank you for letting me know your assessment of my wish in such clear terms. I'm open to changing my mind as I receive new information, but based on what I know at this minute, I still wish to distance myself form the 2.10.1 release. As I wrote before, I believe it distracts us — and many others — from making a clean move going to 2.11, resolving the ABI doubt, and flipping the default for the GIL management implementation. I'm also open to alternative suggestions for distancing myself from the 2.10.1 release. Simply removing my name from the list of contributors was meant to be a practically effortless way of doing that. I hope that my simple wish can be respected in one way or another. I will continue to look at PRs, related to the release or not, and comment if I feel I have something helpful to contribute, even though I don't agree with the goal of continuing the 2.10 release series. |
Even if we did skip the 2.10.1 release, I would insist the default for GIL management would not switch until after 2.11. We need at least one release to warn users that supported API is going to disappear, and give them time to react. Having a 2.10.1 gives us a way to a) let users know about the ABI uncertainty, b) let users respond to the upcoming API removal, and c) fix various issues with distribution, packaging, and compilers that are causing issues. |
I'm OK to keep the non-simple code for a year+ as long as it is opt-in and not opt-out (even though it makes the pybind11 code quite a bit more complicated b/o the internals pointer). Having to add a define is in all likelihood a much smaller effort than having to debug a complex production system running into one of the reported issues. |
I think we can drop it on the first release after the known projects adapt to using their own code. I just require it to be opt-out for one release (either 2.10.1 or 2.11.0, your choice). I have a 10K+ word essay on versioning caping, and a key takeaway is that in the Python ecosystem, following the example of CPython, we provide deprecation periods and do not try to break users immediately. |
That makes sense for deprecating features already replaced by new ones, but it makes no sense to apply the same logic to deprecating traps, especially in our situation, when we know of just 2 cases that need to make a trivial adjustment to buy safety for everybody else. Could you please give me a firm timeline for when the default will be flipped from unsafe to safe according to your plans? |
Do we have an example of anything it fixes? I setup and tested the issue it was supposed to fix and it didn't fix it. I'd like a concrete example of it fixing something before calling it "safe/unsafe" instead of "simple/custom". I think deprecation in 2.10.1, switch default in 2.11 is reasonable. I like that it makes the non-PyPy compatible code opt-in, rather than making something you can easily create that then doesn't work on PyPy. I would really like to see a) both known packages either add the flag or the custom workaround, and b) a concrete example of how to do this custom workaround manually that we can place in the upgrade guide.
It is not on me to find all possible uses. I can't see private code, I didn't check gitlab or bitbucket or even all of GitHub. There are over 100K uses of PYBIND11_MODULE on GitHub public code alone. Currently we know it might help one user (PyTorch) that has already worked around the issue, and it will break two significant packages. It won't fix currently broken code, it just will enable more flexible code in the future (assuming it does work). |
To add to my #4105 (comment) comment here: My previous resistance to continuing the 2.10 release series was based on the assumption that incrementing the internals version will resolve the ABI incompatibility doubts. I no longer believe that. I think that we're up against a different kind of ABI issue, a bad interaction between #2999 and #1895. I also think that we may be able to find a way to handle or at least mitigate the situation, at a minimum with a clear explanation in the documentation. With that it would make sense to continue the 2.10 series, after we have full, agreed-on clarity. |
One other potential blocker. Should be a quick fix once we figure out the right behavior: #4288 |
"something we haven't fixed but is present in the last release" is not a "blocker". If that was true, there would never be a release. Blockers are only issues that are not in a previous release. However, it's fine to include a quick fix if it makes it in and looks safe. But I object to it being called a blocker. :) |
@henryiii I suppose technically it could be a regression since the exception used to be in the ctor and therefore you could have a try catch around the error through which isn't viable anymore. Also I think part of this might be due to our more aggressive conversion to a std::string. |
In 2.10.0? Or after 2.10.0? It's only a blocker if it happened after 2.10.0. A blocker means the release is worse than the release before. After the release, a missed blocker becomes a regression. There is no requirement that we fix all regressions in the first point release. There can always be a second point release. Again, happy to get it fixed if it's easy. |
Under those terms, it's a regression. Still a pretty nasty one though. |
CI for #4297 (fixes #4288) is running, once that's merged let's pull the trigger on 2.10.1? @henryiii is there a chance that you could help with the cherry picks and changelog update? (I don't think I've ever done that, and I already lost a ton of time on #4216, with the deadlocks still hanging over my head.) |
Yes, I can do that, I should have time tomorrow or Tuesday. I've already done most of the cherry picks, I'll just need the last 1-2 PRs. |
Thanks a lot @henryiii! |
Known regressions regressionRegression in a recent release
:
Failed fixes:
Planned features:
PYBIND11_SIMPLE_GIL_MANAGEMENT
option (cmake, C++ define) #4216Fixes:
config.use_environment=1
+PYTHONPATH
test #4119The next release of PyPy will help too. :)
The text was updated successfully, but these errors were encountered: