[PLAN] Tracking issue for 2.10.1 #4125

henryiii · 2022-08-08T15:09:39Z

Known regressions regression Regression in a recent release :

Failed fixes:

fix(cmake): support vcpkg, try 2 #4123

Planned features:

Fixes:

The next release of PyPy will help too. :)

The text was updated successfully, but these errors were encountered:

rwgk · 2022-10-10T17:29:22Z

@henryiii @Skylion007 @ax3l

I've been turning over in my mind what we should do about #4105 — It leaves the 2.10 release in a doubt-full state, but unfortunately nobody provided even the slightest clue what the problem could be. I don't want to act on suspicions, as they could lead us astray badly and waste a lot of time. (Furthermore, if nobody cares enough to provide actionable clues, it puts the importance of the issue for them into doubt, too.)

In the meantime we also got:

Add Eigen::Tensor & Eigen::TensorMap support #4201
Add PYBIND11_SIMPLE_GIL_MANAGEMENT option (cmake, C++ define) #4216 (with Strip down gil.h to the simple version only. #4218 as the best resolution IMO)

Considering all of this together, I believe it will be best to abandon the 2.10 line b/o the doubts about ABI compatibility, and
go straight to 2.11, bumping the internals version (to resolve the doubts) but leave the #ifs of #4218 (internals.h) in place for now, so people can choose to go back to internals version 4 if it works for them and they are ok with the doubts.

henryiii · 2022-10-10T18:15:12Z

I'm still strongly in favor of a 2.10.1. It will be be no worse than 2.10.0, and it needs to be done before 3.11 is released, which is combine up soon. Both of those "bigger" changes above really should sit in master for a bit before release.

Also, I don't think an ABI bump with a "go back" option is all that useful. Distributions like conda-forge can't handle "selectable" ABI bumps. A version of pybind11 has a set ABI, and they can run a migration if it changes. Let's release a 2.10.1 later this week, then put in those larger changes, maybe go ahead and merge some of the other ABI bump ideas we've had in the past, then do a 2.11 properly.

rwgk · 2022-10-10T18:27:29Z

'm still strongly in favor of a 2.10.1. It will be be no worse than 2.10.0

I think it is: it gives 2.10 more legitimacy than it deserves. Unsuspecting users will upgrade to 2.10.1, only to discover that they will want to 2.11, i.e. we're leading people into wasting effort.

I'm happy to cut the 2.11 release, as soon as #4201 is merged.

How much longer do you think you need to fix up the scikit_build_example? All the rest looks like it could be done with a few hours of effort at most, if we don't drag our feet, but I don't know about scikit.

Also, I don't think an ABI bump with a "go back" option is all that useful.

If we explain well (a couple sentences will do here I think) we give people a choice. I don't like to force people's hands, unless there really is no other way, which is evidently (#4228) not the case here.

then do a 2.11 properly.

We cannot do 2.10 properly because of the doubts-without-clues, but I don't see why we cannot do 2.11 properly very soon, minus the scikit issue that's been lingering for 2 months already. But we could just document that that's broken since 2.10 and we still haven't been able to fix.

henryiii · 2022-10-10T18:36:49Z

I'd prefer not to leave a release series "broken" - some users will undoubtedly be stuck on 2.10 for a while. A 2.10.1 with better Python 3.11 support would be seen as a "safer" upgrade than 2.11.0. I've moved to 2.10 everywhere and have been very happy with it. The ABI problem doesn't affect all users.

Even if we release 2.11 a couple of weeks later, I think we should leave the 2.10 series in the best possible state.

If we explain well (a couple sentences will do here I think) we give people a choice.

Only if people read it, and only if they can. Conda-forge, for example, doesn't have a choice and this will trigger a migration. Given 2.10 is the 'last' of a very long ABI series, I wouldn't be surprised if it's a mini Python 2.

I'm happy to cut the 2.11 release, as soon as #4201 is merged.

"bigger" changes really should sit in master for a bit before release.

How much longer do you think you need to fix up the scikit_build_example?

Probably this week. By next Monday, at the latest.

henryiii · 2022-10-10T18:37:39Z

By "broken" I'm referring to the Python 3.11 fixes, not the ABI incompatibility, which I'm not too worried about, at the most I think it's likely quite small.

rwgk · 2022-10-10T19:09:42Z

How about:

You cut the 2.10.1 release when you get to it.
We branch 2.10 before merging Add Eigen::Tensor & Eigen::TensorMap support #4201 or Reproducer and fix for issue encountered in smart_holder update. #4228 (which ever comes first).
I cut the 2.11.0 release as soon as possible, to unblock pytorch ([BUG]: Fatal Python error: PyThreadState_Get with c++ thread pool #4226) and get Eigen::Tensor (Add Eigen::Tensor & Eigen::TensorMap support #4201) into production.

I'd prefer not to leave a release series "broken" - some users will undoubtedly be stuck on 2.10 for a while.

If they can upgrade to 2.10.1, with the option to pin internals version 4, why would they not want to take a 2.11 instead? Especially to get rid of #4226, which is a long-standing but nevertheless serious issue.

"bigger" changes really should sit in master for a bit before release.

Why? Our testing is extremely thorough. We're not forcing anyone to upgrade. Let people choose their own pace. Holding developments back because of version numbering considerations looks to me like putting the cart in front of the horse: version numbers are meant to inform, e.g. to help staying organized in a progressing environment, not to stifle progress.

Using my own situation as an example: the Eigen::Tensor work looks super useful. Nobody has been asking me for it, but I want it deployed Google-internally asap, I bet it will get picked up quickly when people see it. — Only for completeness: Releases don't matter to me directly (I merge to smart_holder, then deploy from there) but having a close match with releases helps keeping everybody on the same page internally and externally. Therefore I'm happy to put in the effort cutting 2.11.

henryiii · 2022-10-11T05:05:13Z

That's fine and about what I was thinking - I'm not against a 3.11 in the near future, I just want a 3.10.1 too. I don't think we are really thinking that differently.

Our testing is extremely thorough.

It's never as thorough as real world usage, and we have quite a few projects running off of master, as evidenced by the quick issues and PRs whenever we break master. Usually breaking changes to master get noticed within a week. Our testing is on lots of systems, but only our unit testing, which isn't full coverage (in fact, we have prioritized a quick test suite over a thorough one). Google's test suite is great, but focused - some things it just doesn't catch. ABI breakages, for example, can't be caught by either. New features are often not caught by either (though a little rough edges on a new feature is better than breaking old code). Some things just aren't used at all by Google that are important to other projects. I'd recommend at least letting any large PRs sit in master for at least week before releasing. If it's entirely additive (#4201), I don't think it's as important, but #4226 is not additive.

version numbers are meant to inform

I think they are also a bit of a mark of stability. master could have bugs - we try our best, we keep tests passing, but it is fresh, un-vetted code. stable points at the last release, and doesn't update otherwise - releases are "stable", not fresh, code. If you release immediately after a big PR is merged, that's not "stable". A version number is intended to be more stable. It's a point at which we say the code is "ready for use". And if we catch a mistake, we backport fixes and release a new patch. So far, everything we have caught is enough for a combined patch release, but not so critical we need to rush a patch. (Ideally, we should have probably released two patch releases, actually).

If you live at head, then you want master, and that's fine, but it means you have thorough CI checking things. Many projects would rather stable code and are happy if it's a little stale. Many projects also live at head which is way waiting a little bit is useful. :)

We branch before merging ...

#4228 is already merged? Is it something I need to keep out of the 2.10.1 release? I'm totally okay with branching, it's not that much work and much better than holding up progress. If you want to merge something (that's been reviewed), tag it with 2.11 and I'll work in the v2.10 branch.

henryiii · 2022-10-11T05:07:47Z

Unless I missed something, #4228 is the NVIDIA workaround? That's fine for a patch?...

I finally understand what the problem is with scikit_build_example. Not totally sure what the best fix for it is, but I know what's going wrong.

ax3l · 2022-10-11T05:26:36Z

Yes, the enable_if version in that PR is great as a patch for the nvcc issue :)

rwgk · 2022-10-11T23:20:11Z

That's fine and about what I was thinking - I'm not against a 3.11 in the near future, I just want a 3.10.1 too. I don't think we are really thinking that differently.

Cool, thanks! (typo? 2.11, 2.10.1? mainly mentioning in case someone looks here later)

#4201 good another great push today and I hope we can merge it tomorrow or so. (I hope to merge it into smart_holder and deploy internally right after.)

What's the best way to handle the 2.10 branch?

Is see you marked master already as 2.11.0.dev1 (8d82f29). Do we need to do something special before merging #4201?

rwgk · 2022-10-18T23:58:04Z

@henryiii the branch point for 2.10.1 is b926396 (that commit to be included in 2.10.1)

rwgk · 2022-10-19T03:26:56Z

@Henry Centos7 has been failing for a whole week now. I see EOL is still 1 yr 9 mo off (https://cloud.google.com/compute/docs/eol/centos-eol-guidance), but I'm thinking we should drop this ancient platform anyway (release date was 2014-07-07, before pybind11 even existed). The centos builds are regular troublemakers, nibbling away more than their fair share from our time. Could we re-host PGI 22.3 on a less troublesome OS, ideally a modern one?

henryiii · 2022-10-19T22:19:32Z

CetnOS 7 is not failing, and it's a major and incredibly important platform. It's the base for manylinux2014. As the last LTS release of CentOS 7, it's still common for a while longer elsewhere, too. What's failing is the PGI compilers on CentOS 7. I'm quite fine to move the PGI build over to whatever else they support. We'll still have a (still working) CentOS 7 build. But I rather think that's not the problem, it's the files they are hosting for PGI.

rwgk · 2022-10-20T00:08:53Z

CetnOS 7 is not failing, and it's a major and incredibly important platform.

I see now, we have "centos:7" too. I only looked for "CentOS7" before. I never meant to suggest dropping something that's still working and doesn't limit anything at the moment.

What could be a good host OS to pick for PGI?

henryiii · 2022-10-20T00:49:55Z

CentOS 7? Given the issue is Nvidia stopped signing the binary, I don't think it matters what OS it's on. I think we just have to update from 22.3, current is 22.9.

ax3l · 2022-10-20T20:02:31Z

I think that's a good idea. I'll bump it in a PR to 22.7 (which we currently have at NERSC's Perlmutter supercomputer). But I am also generally ok to bump it higher if needed to address compiler bugs. Update, oh you did this already in #4260 👍

Skylion007 · 2022-10-21T15:44:41Z

One more minor update we could squeeze in: #4269

rwgk · 2022-10-21T21:11:34Z

Link to my concerns about rushing out a 2.10 release: #4271 (comment)

rwgk · 2022-10-22T00:28:12Z

@henryiii this it to acknowledge that I have received your DM in response to my comment:

I don't want to stop you from making the release, but based on what I know at the moment, I'd definitely want to remove my name from README.rst on the v2.10 branch.

Thank you for letting me know your assessment of my wish in such clear terms.

I'm open to changing my mind as I receive new information, but based on what I know at this minute, I still wish to distance myself form the 2.10.1 release. As I wrote before, I believe it distracts us — and many others — from making a clean move going to 2.11, resolving the ABI doubt, and flipping the default for the GIL management implementation.

I'm also open to alternative suggestions for distancing myself from the 2.10.1 release. Simply removing my name from the list of contributors was meant to be a practically effortless way of doing that. I hope that my simple wish can be respected in one way or another.

I will continue to look at PRs, related to the release or not, and comment if I feel I have something helpful to contribute, even though I don't agree with the goal of continuing the 2.10 release series.

henryiii · 2022-10-22T00:32:53Z

Even if we did skip the 2.10.1 release, I would insist the default for GIL management would not switch until after 2.11. We need at least one release to warn users that supported API is going to disappear, and give them time to react. Having a 2.10.1 gives us a way to a) let users know about the ABI uncertainty, b) let users respond to the upcoming API removal, and c) fix various issues with distribution, packaging, and compilers that are causing issues.

rwgk · 2022-10-22T00:52:21Z

is going to disappear

I'm OK to keep the non-simple code for a year+ as long as it is opt-in and not opt-out (even though it makes the pybind11 code quite a bit more complicated b/o the internals pointer).

Having to add a define is in all likelihood a much smaller effort than having to debug a complex production system running into one of the reported issues.

henryiii · 2022-10-22T01:33:30Z

I think we can drop it on the first release after the known projects adapt to using their own code. I just require it to be opt-out for one release (either 2.10.1 or 2.11.0, your choice). I have a 10K+ word essay on versioning caping, and a key takeaway is that in the Python ecosystem, following the example of CPython, we provide deprecation periods and do not try to break users immediately.

rwgk · 2022-10-22T08:19:09Z

provide deprecation periods

That makes sense for deprecating features already replaced by new ones, but it makes no sense to apply the same logic to deprecating traps, especially in our situation, when we know of just 2 cases that need to make a trivial adjustment to buy safety for everybody else.

Could you please give me a firm timeline for when the default will be flipped from unsafe to safe according to your plans?

henryiii · 2022-10-22T14:12:17Z

but it makes no sense to apply the same logic to deprecating traps

Do we have an example of anything it fixes? I setup and tested the issue it was supposed to fix and it didn't fix it. I'd like a concrete example of it fixing something before calling it "safe/unsafe" instead of "simple/custom".

I think deprecation in 2.10.1, switch default in 2.11 is reasonable. I like that it makes the non-PyPy compatible code opt-in, rather than making something you can easily create that then doesn't work on PyPy. I would really like to see a) both known packages either add the flag or the custom workaround, and b) a concrete example of how to do this custom workaround manually that we can place in the upgrade guide.

when we know of just 2 cases

It is not on me to find all possible uses. I can't see private code, I didn't check gitlab or bitbucket or even all of GitHub. There are over 100K uses of PYBIND11_MODULE on GitHub public code alone. Currently we know it might help one user (PyTorch) that has already worked around the issue, and it will break two significant packages. It won't fix currently broken code, it just will enable more flexible code in the future (assuming it does work).

rwgk · 2022-10-24T08:07:46Z

To add to my #4105 (comment) comment here:

My previous resistance to continuing the 2.10 release series was based on the assumption that incrementing the internals version will resolve the ABI incompatibility doubts. I no longer believe that. I think that we're up against a different kind of ABI issue, a bad interaction between #2999 and #1895. I also think that we may be able to find a way to handle or at least mitigate the situation, at a minimum with a clear explanation in the documentation. With that it would make sense to continue the 2.10 series, after we have full, agreed-on clarity.

Skylion007 · 2022-10-26T17:12:24Z

One other potential blocker. Should be a quick fix once we figure out the right behavior: #4288

henryiii · 2022-10-26T17:14:57Z

"something we haven't fixed but is present in the last release" is not a "blocker". If that was true, there would never be a release. Blockers are only issues that are not in a previous release. However, it's fine to include a quick fix if it makes it in and looks safe. But I object to it being called a blocker. :)

Skylion007 · 2022-10-26T17:19:50Z

@henryiii I suppose technically it could be a regression since the exception used to be in the ctor and therefore you could have a try catch around the error through which isn't viable anymore. Also I think part of this might be due to our more aggressive conversion to a std::string.

henryiii · 2022-10-26T17:26:10Z

In 2.10.0? Or after 2.10.0? It's only a blocker if it happened after 2.10.0. A blocker means the release is worse than the release before. After the release, a missed blocker becomes a regression. There is no requirement that we fix all regressions in the first point release. There can always be a second point release.

Again, happy to get it fixed if it's easy.

Skylion007 · 2022-10-26T17:36:20Z

Under those terms, it's a regression. Still a pretty nasty one though.

rwgk · 2022-10-31T00:23:19Z

CI for #4297 (fixes #4288) is running, once that's merged let's pull the trigger on 2.10.1?

@henryiii is there a chance that you could help with the cherry picks and changelog update? (I don't think I've ever done that, and I already lost a ton of time on #4216, with the deadlocks still hanging over my head.)

henryiii · 2022-10-31T01:48:31Z

Yes, I can do that, I should have time tomorrow or Tuesday. I've already done most of the cherry picks, I'll just need the last 1-2 PRs.

rwgk · 2022-10-31T19:55:39Z

Thanks a lot @henryiii!

henryiii changed the title ~~[PLAN] Tracking issue for 3.10.1~~ [PLAN] Tracking issue for 2.10.1 Aug 8, 2022

henryiii mentioned this issue Sep 20, 2022

Fix for Windows compiler, missing object initializer #4188

Merged

henryiii mentioned this issue Oct 7, 2022

[BUG]: error: lvalue required as increment operand #4100

Closed

3 tasks

henryiii pinned this issue Oct 20, 2022

rwgk mentioned this issue Oct 21, 2022

Strip down gil.h to the simple version only. #4218

Closed

EthanSteinberg mentioned this issue Oct 22, 2022

Fix casts to void* #4273

Closed

EthanSteinberg mentioned this issue Oct 22, 2022

Fix casts to void* #4275

Merged

This was referenced Oct 24, 2022

[BUG]: ABI incompatibility between 2.9.2 and 2.10 #4105

Closed

docs: prepare for 2.10.1 release #4279

Merged

EthanSteinberg mentioned this issue Oct 29, 2022

Fix functional.h bug + introduce test to verify that it is fixed #4254

Merged

henryiii closed this as completed in #4279 Oct 31, 2022

henryiii unpinned this issue Oct 31, 2022

rwgk mentioned this issue Feb 10, 2023

FWD pybind11 google/pybind11clif#4125

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PLAN] Tracking issue for 2.10.1 #4125

[PLAN] Tracking issue for 2.10.1 #4125

henryiii commented Aug 8, 2022 •

edited

Loading

rwgk commented Oct 10, 2022

henryiii commented Oct 10, 2022

rwgk commented Oct 10, 2022 •

edited

Loading

henryiii commented Oct 10, 2022

henryiii commented Oct 10, 2022

rwgk commented Oct 10, 2022

henryiii commented Oct 11, 2022

henryiii commented Oct 11, 2022

ax3l commented Oct 11, 2022

rwgk commented Oct 11, 2022

rwgk commented Oct 18, 2022

rwgk commented Oct 19, 2022

henryiii commented Oct 19, 2022

rwgk commented Oct 20, 2022

henryiii commented Oct 20, 2022

ax3l commented Oct 20, 2022 •

edited

Loading

Skylion007 commented Oct 21, 2022

rwgk commented Oct 21, 2022

rwgk commented Oct 22, 2022

henryiii commented Oct 22, 2022

rwgk commented Oct 22, 2022

henryiii commented Oct 22, 2022

rwgk commented Oct 22, 2022

henryiii commented Oct 22, 2022

rwgk commented Oct 24, 2022

Skylion007 commented Oct 26, 2022

henryiii commented Oct 26, 2022

Skylion007 commented Oct 26, 2022 •

edited

Loading

henryiii commented Oct 26, 2022

Skylion007 commented Oct 26, 2022

rwgk commented Oct 31, 2022

henryiii commented Oct 31, 2022

rwgk commented Oct 31, 2022

[PLAN] Tracking issue for 2.10.1 #4125

[PLAN] Tracking issue for 2.10.1 #4125

Comments

henryiii commented Aug 8, 2022 • edited Loading

rwgk commented Oct 10, 2022

henryiii commented Oct 10, 2022

rwgk commented Oct 10, 2022 • edited Loading

henryiii commented Oct 10, 2022

henryiii commented Oct 10, 2022

rwgk commented Oct 10, 2022

henryiii commented Oct 11, 2022

henryiii commented Oct 11, 2022

ax3l commented Oct 11, 2022

rwgk commented Oct 11, 2022

rwgk commented Oct 18, 2022

rwgk commented Oct 19, 2022

henryiii commented Oct 19, 2022

rwgk commented Oct 20, 2022

henryiii commented Oct 20, 2022

ax3l commented Oct 20, 2022 • edited Loading

Skylion007 commented Oct 21, 2022

rwgk commented Oct 21, 2022

rwgk commented Oct 22, 2022

henryiii commented Oct 22, 2022

rwgk commented Oct 22, 2022

henryiii commented Oct 22, 2022

rwgk commented Oct 22, 2022

henryiii commented Oct 22, 2022

rwgk commented Oct 24, 2022

Skylion007 commented Oct 26, 2022

henryiii commented Oct 26, 2022

Skylion007 commented Oct 26, 2022 • edited Loading

henryiii commented Oct 26, 2022

Skylion007 commented Oct 26, 2022

rwgk commented Oct 31, 2022

henryiii commented Oct 31, 2022

rwgk commented Oct 31, 2022

henryiii commented Aug 8, 2022 •

edited

Loading

rwgk commented Oct 10, 2022 •

edited

Loading

ax3l commented Oct 20, 2022 •

edited

Loading

Skylion007 commented Oct 26, 2022 •

edited

Loading