Rust baseline compiled assembly difference #6

Open
rpjohnst opened this issue Aug 13, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@rpjohnst

From the README:

(It's not entirely clear why the Zig baseline implementation is twice as fast as the Rust implementation. The compiled assembly (godbolt) shows that Rust saves five registers on the stack while Zig only saves three, but why? For the purpose of this benchmark it shouldn't matter since we're only comparing against the baseline of each language.)

The difference is that the linked Zig program produces an internal LLVM function, which can call itself directly, while the Rust program produces a non-internal LLVM function, which calls itself through the GOT. If you mark the Rust function non-pub and call it from a pub function (like the Zig main), you will get essentially the same assembly: https://godbolt.org/z/x73v9zKb9
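
A minimal sketch of that change, assuming a hypothetical recursive `fib` in place of the benchmark's actual baseline function (the names `fib` and `fib_inner` are illustrative and not taken from the repository):

```rust
// Hypothetical stand-in for the baseline function. It is NOT `pub`, so the
// compiler is free to give it internal linkage, and the recursive calls can
// then target the function directly.
fn fib_inner(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fib_inner(n - 1) + fib_inner(n - 2)
    }
}

// Public entry point, playing the role of the Zig `main`: the only exported
// symbol, which simply forwards to the private recursive function.
pub fn fib(n: u64) -> u64 {
    fib_inner(n)
}
```

The recursion lives entirely in the private function. The public wrapper is only called once from outside, so any cost of an exported symbol is paid at the entry point rather than on every recursive call.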

@judofyr
Owner

judofyr commented Aug 13, 2024

Oooh, that's interesting! Let's see if this has an impact on the benchmark results as well. Initially I had the function in the same file as the benchmark, but I can't remember if it was pub or not. I'll see if I get different results by putting it directly inside the benchmark file and marking it non-pub.

I guess the overhead between pub and non-pub is very small since they still follow the optimized calling convention? Is there any documentation around this somewhere I could read up on?

@judofyr judofyr pinned this issue Aug 13, 2024
@rpjohnst
Author

rpjohnst commented Aug 13, 2024

It looks like even pub items can become internal when the final binary artifact is linked the right way. Godbolt probably just happens to be configured to produce an artifact type meant to be dynamically linked (or linkable). For example, on my machine (macOS) a pub function in a binary/executable crate type gets generated as internal.

I would not expect to find a lot of docs on this, because rustc seems to just choose the best linkage given the artifact type it's generating, and this can change wildly based on all kinds of factors: which OS, where in the dependency graph the function is compiled (which itself depends on whether it is generic, inlinable, the optimization level, flags like -Z share-generics, etc.), static vs. dynamic linking, and so on.

@judofyr judofyr added the help wanted Extra attention is needed label Aug 16, 2024