Specialize Iterator for &mut I where I: Sized #82185
(rust-highfive has picked a reviewer for you, use r? to override)
Is the intermediate trait `SpecSizedIterator` a workaround? Couldn't it be specialized on the `Sized` bound for the plain …
Current std policy says to use private helper traits. https://std-dev-guide.rust-lang.org/code-considerations/using-unstable-lang/specialization.html
r? @cuviper (or someone else more familiar with iterator code)
Let's check the general perf effect: @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit a09399b1f1ba90d72fdc4eadf65d0ab459a96287 with merge 4bf6a12cc7384e44f61e55b762c4efb6612c5aca...
☀️ Try build successful - checks-actions
Queued 4bf6a12cc7384e44f61e55b762c4efb6612c5aca with parent d2b38d6, future comparison URL.
Finished benchmarking try commit (4bf6a12cc7384e44f61e55b762c4efb6612c5aca): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying … Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
@SkiFire13 those perf results don't look promising -- what's your take on it?
I'm not quite sure, but my guesses are: …
@SkiFire13 Ping from triage, what are the next steps here?
After the perf run I decided to focus on solving the regressing benchmarks, hoping that their cause is the same as that of the regressions in the perf run. I tried inlining more methods and changing …
If you click on the percentages you'll see a breakdown of what's consuming more time. It's mostly spending more time in LLVM, so one thing you could investigate is whether these changes caused more LLVM IR to be emitted for those benchmarked crates. `cargo llvm-lines` can be useful there. In the past, optimization attempts in …
Just run the experiment? Another idea would be measuring the shuffling of the default implementations into separate traits (…)
@SkiFire13 Ping from triage, any updates on this?
…&mut impl Iterator
…old implementations for &mut impl DoubleEndedIterator
…o common functions
Branch updated from 03cf754 to 17350de.
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 17350de with merge 5d8726c5e6e2c8a635a0178c21932f6634271e89...
☀️ Try build successful - checks-actions
Queued 5d8726c5e6e2c8a635a0178c21932f6634271e89 with parent 2bafe96, future comparison URL.
Finished benchmarking try commit (5d8726c5e6e2c8a635a0178c21932f6634271e89): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying … Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Still pretty bad results.
The `clap-rs-opt full` and `clap-rs-opt incr-patched: println` benchmarks are still spending additional time in LLVM, as they did in the previous perf run, and they're well outside the noise level in the perf charts. So that's probably unrelated to … Running that benchmark locally for master and for the branch and comparing the `llvm-lines` output might help. Maybe there are lots of generic instantiations of iterator code somewhere else. `deeply-nested-async-opt incr-unchanged` is interesting in a different way: it spends more time in Rust passes, not LLVM. That might be a genuine regression in iterator performance. Running that benchmark with …
…imulacrum
replace `vec::Drain` drop loops with `drop_in_place`
The `Drain::drop` implementation came up in rust-lang#82185 (comment) as potentially interfering with other optimization work due to its widespread use somewhere in `println!`.
`@rustbot` label T-libs-impl
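A minimal sketch of the two drop strategies named in that commit title. It does not reproduce `vec::Drain`'s real internals: it assumes the leftover elements sit in a slice of `ManuallyDrop<T>`, and `DropCounter`, `drop_with_loop` and `drop_with_drop_in_place` are hypothetical names. The loop drops elements one at a time, while `ptr::drop_in_place` on a slice pointer drops them all in a single call and leaves the per-element iteration to the compiler's drop glue.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr;

/// Hypothetical element type that counts how many times it has been dropped.
pub struct DropCounter<'a>(pub &'a Cell<usize>);

impl Drop for DropCounter<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Loop style (hypothetical helper): drop each leftover element individually.
///
/// # Safety
/// Every element must be initialized and must not be used or dropped again.
pub unsafe fn drop_with_loop<T>(rest: &mut [ManuallyDrop<T>]) {
    for slot in rest.iter_mut() {
        // SAFETY: the caller guarantees each slot is dropped exactly once.
        unsafe { ManuallyDrop::drop(slot) };
    }
}

/// `drop_in_place` style (hypothetical helper): drop the whole slice at once.
///
/// # Safety
/// Same contract as `drop_with_loop`.
pub unsafe fn drop_with_drop_in_place<T>(rest: &mut [ManuallyDrop<T>]) {
    // `ManuallyDrop<T>` is `#[repr(transparent)]`, so the elements can be
    // viewed as a `[T]` slice of the same length.
    let whole: *mut [T] = ptr::slice_from_raw_parts_mut(rest.as_mut_ptr().cast::<T>(), rest.len());
    // SAFETY: the caller guarantees the elements are live and never reused.
    unsafe { ptr::drop_in_place(whole) };
}
```

Both helpers run each element's destructor exactly once; the difference is only in how the drop is expressed, which is what the commit changes.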
@SkiFire13 any updates on this?
I left this PR in draft mode in case I got new ideas, but nothing came up and eventually I forgot about it.
This PR aims to improve the implementation of `Iterator` and `DoubleEndedIterator` for `&mut I where I: Iterator`. This is accomplished by forwarding `try_fold` to the implementation for `I` (since it takes `&mut self`) and implementing `fold` in terms of `try_fold` (since `fold` takes `self` by value, `<&mut I>::fold` cannot simply call `I::fold`).

Since `try_fold` has a `Self: Sized` bound and is therefore not available if `I: !Sized`, I needed to specialize the case `I: Sized` (that's also why I chose to keep the `+ Sized` bound, to be explicit about what I'm specializing). This means this optimization won't apply to `&mut dyn Iterator`.

While coding this PR I realized that it adds a considerable chunk of code for `&mut I where I: Iterator`, and it may make sense to move it to a separate file, just like other adapters. Tell me what you think about it.

Edit: Note that the following benchmark results are relatively old (they were done before the LLVM 12 update, for instance) and should probably be updated.
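To illustrate the forwarding strategy described at the top of this PR, here is a minimal stable-Rust sketch. It assumes free functions (`fold_sized` and `fold_unsized`, both hypothetical names) instead of the real `impl Iterator for &mut I`, which needs specialization plus the unstable `Try` trait to override `try_fold`. For a sized `I`, `fold` is built on `I::try_fold` with a closure that never short-circuits; for an unsized `I` such as `dyn Iterator`, `try_fold` is unavailable and the fallback is a plain `next` loop.

```rust
use std::convert::Infallible;

/// Hypothetical helper: `fold` through `&mut I`, built on `I::try_fold`.
/// `try_fold` only borrows the iterator (`&mut self`), so it can be called
/// through a mutable reference, but it requires `Self: Sized`.
pub fn fold_sized<I, B, F>(iter: &mut I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    // The closure never fails, so the error type is uninhabited.
    let result: Result<B, Infallible> = iter.try_fold(init, |acc, x| Ok(f(acc, x)));
    match result {
        Ok(acc) => acc,
        Err(never) => match never {},
    }
}

/// Hypothetical fallback for any `I`, including `dyn Iterator`:
/// only `next` can be called on an unsized iterator.
pub fn fold_unsized<I, B, F>(iter: &mut I, init: B, mut f: F) -> B
where
    I: Iterator + ?Sized,
    F: FnMut(B, I::Item) -> B,
{
    let mut acc = init;
    while let Some(x) = iter.next() {
        acc = f(acc, x);
    }
    acc
}
```

For example, after one `next()` call on `1..=10`, `fold_sized(&mut it, 0, |a, x| a + x)` sums the remaining items to 54 and leaves the range exhausted, while a `&mut dyn Iterator` can only go through `fold_unsized`.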
Here's a comparison (run with `cargo benchcmp master pr --threshold 2`) of the benchmarks (run with `py x.py bench --stage 0 library/core --test-args iter::`):

There are a lot of promising improvements, but surprisingly there are also a couple of strange regressions:
These benchmarks shouldn't have been impacted by my changes, because they never use `&mut impl Iterator`, but somehow they were. My guess is that this is the result of some LLVM shenanigans, because if I remove any benchmark containing `.by_ref()`, then most of them don't regress anymore. Since these look like codegen problems specific to the current benchmark, I'm not too worried about them.

These benchmarks, however, were directly impacted by my changes and give the same results even if I remove all the other benchmarks, so I guess they're genuine regressions.
All these regressions are present even if I copy the default `fold` implementation but directly call `I::next` instead of passing through `<&mut I>::next` (which should just call `I::next` anyway). This makes me think these are also codegen problems.

I would rather not have these regressions, but the other improvements probably outweigh them.
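The experiment described above (a `next`-based `fold` that calls `I::next` directly instead of going through `<&mut I>::next`) can be sketched as two loops of the same shape. This is a minimal illustration, not the std source: `fold_via_ref_next` and `fold_via_direct_next` are hypothetical names, and both simply use a `while let` loop over `next`.

```rust
/// Hypothetical: a `next`-based fold where every step goes through the
/// `&mut I` adapter, i.e. calls `<&mut I as Iterator>::next`.
pub fn fold_via_ref_next<I, B, F>(mut iter: &mut I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    let mut acc = init;
    while let Some(x) = <&mut I as Iterator>::next(&mut iter) {
        acc = f(acc, x);
    }
    acc
}

/// Hypothetical: the same loop, but calling `I::next` directly.
/// `<&mut I>::next` just forwards to `I::next`, so both functions compute
/// the same result; any difference is in the generated code, not behavior.
pub fn fold_via_direct_next<I, B, F>(iter: &mut I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    let mut acc = init;
    while let Some(x) = I::next(iter) {
        acc = f(acc, x);
    }
    acc
}
```

Because the two are semantically identical, a performance gap between them would point at codegen rather than at the extra layer of indirection itself.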