Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safer implementation of RepeatN #130887

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Safer implementation of RepeatN #130887

wants to merge 1 commit into from

Conversation

Soveu
Copy link
Contributor

@Soveu Soveu commented Sep 26, 2024

I've seen the "Use MaybeUninit for RepeatN" commit while reading This Week In Rust and immediately thought about something I've written some time ago - https://github.com/Soveu/repeat_finite/blob/master/src/lib.rs.

Using the fact, that Option will find niche in (T, NonZeroUsize), we can construct something that has the same size as (T, usize) while completely getting rid of MaybeUninit.
This leaves only unsafe on TrustedLen, which is pretty neat.

@rustbot
Copy link
Collaborator

rustbot commented Sep 26, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 26, 2024
@rust-log-analyzer

This comment has been minimized.

@Soveu
Copy link
Contributor Author

Soveu commented Sep 26, 2024

huh, I'm no llvm expert, buuut...

          8: ; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: readwrite) uwtable 
           9: define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr #0 { 
          10: start: 
          11:  %0 = load i64, ptr %it, align 8, !alias.scope !3, !noundef !6 
          12:  %1 = icmp eq i64 %0, 0 
          13:  br i1 %1, label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit", label %bb8.i 
          14:  
          15: bb8.i: ; preds = %start 
          16:  %_10.0.i = add i64 %0, -1 
          17:  %2 = icmp eq i64 %_10.0.i, 0 
          18:  br i1 %2, label %bb5.i, label %bb3.i 
          19:  
          20: bb3.i: ; preds = %bb8.i 
          21:  store i64 %_10.0.i, ptr %it, align 8, !alias.scope !3 
  next:29      !~~~~~~~~~~~~~~~~~                                     error: match on wrong line
          22:  %_14.i = getelementptr inbounds i8, ptr %it, i64 8 
          23:  %_14.val.i = load i16, ptr %_14.i, align 8, !alias.scope !3, !noundef !6 
          24:  br label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit" 
          25:  
          26: bb5.i: ; preds = %bb8.i 
          27:  %3 = getelementptr inbounds i8, ptr %it, i64 8 
          28:  %4 = load i16, ptr %3, align 8, !alias.scope !7 
          29:  store i64 0, ptr %it, align 8, !alias.scope !7 
          30:  br label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit" 
          31:  
          32: "_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit": ; preds = %start, %bb3.i, %bb5.i 
          33:  %_0.sroa.5.0.i = phi i16 [ %4, %bb5.i ], [ %_14.val.i, %bb3.i ], [ undef, %start ] 
          34:  %_0.sroa.0.0.i = phi i16 [ 1, %bb5.i ], [ 1, %bb3.i ], [ 0, %start ] 
          35:  %5 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0.i, 0 
          36:  %6 = insertvalue { i16, i16 } %5, i16 %_0.sroa.5.0.i, 1 
          37:  ret { i16, i16 } %6 
          38: } 

bb3.i and bb5.i if not for the store i64 they would look identical.

The other functions (vec_extend_via_repeat_n and array_repeat_not_copy) look ok.

@slanterns
Copy link
Contributor

Maybe you can ping scottmcm (test author) to see if the change in codegen result is acceptable (then adjust the test.)

@Dylan-DPC
Copy link
Member

@Soveu any updates on the failed test? thanks

@Dylan-DPC Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 28, 2024
@Soveu
Copy link
Contributor Author

Soveu commented Nov 29, 2024

@Soveu any updates on the failed test? thanks

I guess we don't want to introduce additional branches in the generated code, so for "production" code this implementation would be unfit. @scottmcm could comment on that.

@scottmcm
Copy link
Member

It is pretty neat, and in fact it's the implementation I originally wrote for this.

The biggest issue I had with it then, though, is that it makes the iterator be potentially-undef, but because of the #130141 bug (which reminds me to go push on rust-lang/rfcs#3712 ) we're stuck with that problem anyway, so there's probably a way you could make this work.


Looks like the problem is that it's having trouble figuring out that the store of zero and the store of count-1 can be merged, because otherwise there'd be no difference.

Basically, what this test is testing is that for !needs_drop types we don't need to consider the last iteration separately, because the difference between them is whether we need to call drop, but since we don't need to do that for !needs_drop it's not an issue, and thus we should be able to avoid that second branch. (We of course need the "is the iterator empty?" branch, but once we know it's not empty there should be no further branches needed for !needs_drop types.)

Nothing immediately jumps to mind for how to get LLVM to know that, though. Maybe try a separate if const { mem::needs_drop::<T>() } path that could avoid the take, and see if that helps? Or maybe there's a way to phrase this such that you compute the value to write in a local, then there's just one write of it back to self so LLVM just needs to merge the values instead of merging the stores?

I'd suggest copying the implementation to godbolt and playing around a bit there until you find an incantation that works.

@Soveu
Copy link
Contributor Author

Soveu commented Nov 29, 2024

I have tried some things when first writing this code, with no effect, but the idea with mem::needs_drop made me try again. I was able to make the branches very similar, making LLVM merge them :) Technically there is still a case, where clone is more expensive than move, but the type does not have drop(). Probably it doesn't happen often, but might need consideration.

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        return if core::mem::needs_drop::<A>() {
            self.take_element()
        } else {
            let tmp = inner.element.clone();
            self.inner = None;
            Some(tmp)
        };
    }

Godbolt: https://rust.godbolt.org/z/Kr198x635

LLVM output:

define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr personality ptr @rust_eh_personality {
start:
  %0 = getelementptr inbounds i8, ptr %it, i64 8
  %1 = load i64, ptr %0, align 8
  %2 = icmp eq i64 %1, 0
  br i1 %2, label %"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit", label %bb13.i

bb13.i:
  %count.i = add i64 %1, -1
  %3 = load i16, ptr %it, align 8
  store i64 %count.i, ptr %0, align 8
  br label %"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit"

"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit":
  %_0.sroa.6.0.i = phi i16 [ undef, %start ], [ %3, %bb13.i ]
  %_0.sroa.0.0.i = phi i16 [ 0, %start ], [ 1, %bb13.i ]
  %4 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0.i, 0
  %5 = insertvalue { i16, i16 } %4, i16 %_0.sroa.6.0.i, 1
  ret { i16, i16 } %5
}

declare noundef range(i32 0, 10) i32 @rust_eh_personality(i32 noundef, i32 noundef range(i32 1, 17), i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

LLVM output for the current std implementation:

define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr {
start:
  %_3 = load i64, ptr %it, align 8
  %_2.not = icmp eq i64 %_3, 0
  br i1 %_2.not, label %bb4, label %bb1

bb1:
  %_2.i = add i64 %_3, -1
  store i64 %_2.i, ptr %it, align 8
  %0 = getelementptr inbounds i8, ptr %it, i64 8
  %result.i = load i16, ptr %0, align 8
  br label %bb4

bb4:
  %_0.sroa.3.0 = phi i16 [ %result.i, %bb1 ], [ undef, %start ]
  %_0.sroa.0.0 = phi i16 [ 1, %bb1 ], [ 0, %start ]
  %1 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0, 0
  %2 = insertvalue { i16, i16 } %1, i16 %_0.sroa.3.0, 1
  ret { i16, i16 } %2
}

From what I see, %0 = getelementptr inbounds i8, ptr %it, i64 8 gets moved to the top

@Soveu
Copy link
Contributor Author

Soveu commented Nov 29, 2024

That is interesting, just switching variables made it work 🤔

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            // This must be in this order!
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        return self.take_element();
    }

@scottmcm
Copy link
Member

just switching variables made it work 🤔

Definitely odd, but that's why we have codegen tests :P

@rust-log-analyzer

This comment has been minimized.

@scottmcm
Copy link
Member

From what I see, %0 = getelementptr inbounds i8, ptr %it, i64 8 gets moved to the top

What that means is that it ended up getting a different layout -- originally it was storing the count at address, but apparently it flipped the order.

I don't have a strong feeling about which order is better, but if you want you could put #[repr(C)] // not stable just keeps the layout consistent for codegen tests on RepeatNInner to make it give you the order you want.

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 30, 2024
…epeatn, r=<try>

Use `iter::repeat_n` to implement `Vec::extend_with`

This replaces the `Vec::extend_with` manual implementation, which is used by `Vec::resize` and `Vec` `SpecFromElem`, with `iter::repeat_n`.

I've compared the codegen output between:

1. the current `Vec::resize` impl
2. this branch
3. this branch + rust-lang#130887

3 gives the closest codegen output to 1, with some output improvements. 2 doesn't look good: https://rust.godbolt.org/z/Yrc83EhjY.
May also help rust-lang#120050?
@Soveu
Copy link
Contributor Author

Soveu commented Nov 30, 2024

@scottmcm @Amanieu its ready

@Soveu
Copy link
Contributor Author

Soveu commented Nov 30, 2024

@rustbot label +S-waiting-on-review -S-waiting-on-author

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 30, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 1, 2024
…epeatn, r=<try>

Use `iter::repeat_n` to implement `Vec::extend_with`

This replaces the `Vec::extend_with` manual implementation, which is used by `Vec::resize` and `Vec` `SpecFromElem`, with `iter::repeat_n`.

I've compared the codegen output between:

1. the current `Vec::resize` impl
2. this branch
3. this branch + rust-lang#130887

3 gives the closest codegen output to 1, with some output improvements. 2 doesn't look good: https://rust.godbolt.org/z/Yrc83EhjY.
May also help rust-lang#120050?

---

WARNING: DO NOT MERGE - in order to run the perf run in rust-lang#133662 (comment) this PR currently also contains commits from rust-lang#130887
@paolobarbolini
Copy link
Contributor

FYI we're also playing around with this change at #133662 since Compiler Explorer showed that the two MRs combined made Vec::resize slightly better. We'll see how the perf run goes.

@Folyd
Copy link
Contributor

Folyd commented Dec 1, 2024

Nice, this reminds me of my article on Rust Magazine: VecDeque::resize() optimization
.

@tgross35 tgross35 changed the title 100% safe implementation of RepeatN Safer implementation of RepeatN Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants