-
-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test with BOLT #789
Comments
I will test tomorrow to build a clang toolchain with mold as linker, and then instrument and optimize it. I just tried to BOLT mold itself, compiled it with
|
@rui314 we are also eagerly trying to combine MOLD + CLANG + PGO + LTO + BOLT... it's certainly a journey to get there for big apps. |
@ptr1337 Thanks! I'd start with a "hello world" program though. |
Here is a brain dump of things that might affect the compatibility with BOLT.
|
@gcflymoto Is there anything we can help? |
I think this could help you, I have just tested to bolt clang.
|
@ishitatsuyuki Did you want to take this? |
@rui314 Yes, I'm tentatively trying to get a Rust CI build to work, which uses BOLT to optimize LLVM (rust-lang/rust#94381). I'll report if it succeeds or not. |
OK, it failed:
I'll try to gather more notes for reproduction later, but let me know if you need something in particular. |
Instrument runs fine with llvm |
If we use LLVM 16, can we use BOLT with mold without doing anything special? |
I still haven't tested the optimization pass (the part that is not instrumentation) yet. |
I think I spoke too soon. Bisection resulted in an all-pass so this might be an environment dependent problem (build flags? silent memory errors?). |
This line is failing, even in main: https://github.com/llvm/llvm-project/blob/3ee58e2f355f8fdb8e0fe29dc366c8833fafa7d3/bolt/lib/Rewrite/RewriteInstance.cpp#L1885 Lesson to me is to always turn on LLVM assertions when debugging. Without assertions this turned into a heisenbug. I'll try to see why. |
I usually build LLVM (and mold) with |
Looks like this was the real problem:
Looks PLT related. |
(Would love Rui's thoughts on this since I don't really know how mold's PLT differs from the psABI. FYI, the BOLT PLT scanner is here). |
Here is where we create a PLT header and PLT entries: https://github.com/rui314/mold/blob/main/elf/arch-x86-64.cc#L34-L75 The biggest difference compared to the one in the psABI is that our PLT entry has only one entry point. The standard PLT entry looks like this:
Initially, the .got.plt entry for symbol Our PLT looks like this:
Initially, unlike the standard PLT entries, all .got.plt entries point to the beginning of the PLT section. We unconditionally set %r11 to the PLT index. If the PLT entry is called for the first time, the PLT header uses %r11's value to know which PLT entry to fix. If a PLT entry is already resolved, it's a dead-store to %r11, but it should be harmless. |
After a close look it seems that the PLT format is actually fine since BOLT sequentially looks for the first jmp, which has the same target. It looks like this conflicts with preexisting local symbols like |
urgh, the |
I'm dumb, it was literally implemented in a7f1e20 as a feature... |
So, the PLT-related assertion is a bug in BOLT, but we might want to add a flag to disable the synthesized symbols, to allow workarounding this issue. There seems to be an unrelated regression from
(the A repro suite is attached. |
Instead of adding a command line option to mold to suppress the linker-synthesized symbols, I'd fix BOLT. BOLT should not be affected by these symbols. |
I can't reproduce the symbol visibility issue with git head, so maybe it's already fixed? |
Ack. Will file a bug report later.
Bisect says it's gone after d2b7b04. But that commit doesn't really touch visibility, and it looks like the exact corrupted symbols differs between builds, so I wonder if there's some nondeterminism involved here. After switching to v1.5.1 which doesn't emit the $plt symbols, BOLT is now complaining values calculated from relocations doesn't match the ones inside the executable:
|
With #798 and a hack to disable I used Rust's PGO build pipeline, which has full ThinLTO+PGO+BOLT for its LLVM component. I had to do a few quick hacks, namely changing the base distribution to get some decent support of C++20. The diff is available at https://github.com/ishitatsuyuki/rust/tree/moldBOLT. The resulting binary should also be running fine since the pipeline uses the just-built compiler to compile the ecosystem tools (Cargo, rust-analyzer etc.), but I'll just verify with a icache heatmap on my Intel laptop later, just in case. |
Great work! It's impressive that you identified and resolved the issue so quickly. Thank you! |
Great work! I just tested to instrument clang, and it has worked. Gathering now the profile, then the optimization process will be done. So far it seems to be working well.
The only thing, which seems different then with lld to me is:
Same as this, seems to me a bit high:
Thanks! edit: Instrumentation, merging fdata and the optimization was successful!
|
@ishitatsuyuki You may want to test BOLT with a program that uses exceptions, as clang and (I believe) rustc don't use exceptions. That part might not be tested yet. |
BOLT doesn't need |
Oh cool, then we don't need to synthsize .rela.eh_frame. Exceptions could fail for other reasons, so it's nice to test it though. |
The |
@ishitatsuyuki I'd file that to BOLT even if it's harmless. |
The real-life example of BOLT seems really scarce, so I don't have a good idea what to test. I'll run BOLT's own test suite in LLVM with mold which has some CFI tests.
Ack. |
Maybe @maksfb has an idea as to what programs are suitable for testing BOLT? |
BOLT test suite has several runnable x86 tests for C++ exceptions. E.g. https://github.com/llvm/llvm-project/blob/main/bolt/test/runtime/X86/exceptions-run.test For large open-source project with C++ exceptions, I can recommend trying HHVM (hhvm.com). I've also heard that ScyllaDB uses BOLT, but I have no first-hand experience with it. |
I ran the BOLT test suite with mold ( Failed Tests (10):
Logs for X86/dynrelocs.s:
|
@rui314 Any idea where a GOT32 reloc can come from? You can see it in e.g. the emit-relocs test. I readelf'd all the input .o and .a files but they didn't seem to contain any. A quick grep of mold code don't show any uses that assigns the value of GOT32, so I don't have a good clue here. In bfd, the symbols seems to get assigned a PC64 reloc instead, although I'm not 100% sure it's pointing to the same thing. Finally there's a chance that GOT32 is just a bogus value we assign due to some bug. I see a bunch of these for .debug_lines too. |
It looks like mold creates GOT32 relocations for section symbols. We discard section symbols in input files and re-create new section symbols for output sections, as we want to keep only one section symbol for each section. I don't know why it lead to creating GOT32 relocations, though. But the relocation type looks bogus. |
OK, this is the culprit then 😅
(STT_SECTION = R_X86_64_GOT32) |
LOL |
With #833 |
There are still 9 failures but I think we're mostly done for it now. Reasons it's failing seems to be due to one of: 1. mold defaulting to LOCAL for executables (need |
Facebook/Meta's BOLT is becoming popular. I don't know if BOLT works on mold's output. We need to test it and fix issues if any.
The text was updated successfully, but these errors were encountered: