Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test with BOLT #789

Open
rui314 opened this issue Oct 17, 2022 · 44 comments
Open

Test with BOLT #789

rui314 opened this issue Oct 17, 2022 · 44 comments

Comments

@rui314
Copy link
Owner

rui314 commented Oct 17, 2022

Facebook/Meta's BOLT is becoming popular. I don't know if BOLT works on mold's output. We need to test it and fix issues if any.

@ptr1337
Copy link

ptr1337 commented Oct 17, 2022

I will test tomorrow to build a clang toolchain with mold as linker, and then instrument and optimize it.
I will report it here.

I just tried to BOLT mold itself, compiled it with --emit-relocs and -fno-reorder-blocks-and-partition.
When trying to create the instrumented binary, I get a Segmentation fault.

❯ ./bolt-anything.bash
Instrument binary with llvm-bolt
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x1000000, offset 0x1000000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
/home/ptr1337/llvm-bolt-scripts/bolt-anything.bash: line 41: 37593 Segmentation fault      ${BOLTPATH}/llvm-bolt --instrument --instrumentation-file-append-pid --instrumentation-file=${FDATA}/${BINARY}.fdata ${BINARYPATH}/${BINARY} -o ${BOLTBIN}/${BINARY}
Could not create instrumented binary

@gcflymoto
Copy link

@rui314 we are also eagerly trying to combine MOLD + CLANG + PGO + LTO + BOLT... it's certainly a journey to get there for big apps.

@rui314
Copy link
Owner Author

rui314 commented Oct 18, 2022

@ptr1337 Thanks! I'd start with a "hello world" program though.

@rui314
Copy link
Owner Author

rui314 commented Oct 18, 2022

Here is a brain dump of things that might affect the compatibility with BOLT.

  • Our .eh_frame section does not have a relocation section even if --emit-relocs is given. We synthesize .eh_frame, and we probably need to synthesize .rela.eh_frame too
  • Relocation sections emitted when --emit-relocs is given is always in the RELA format even for a target with REL type. This is not a bug but a feature, as it should make binary post-processing easier. I don't think it's causing an issue, as x86-64 and aarch64 are both RELA, though.
  • Our .plt and .plt.got machine instructions are different from the ones in the psABIs.

@rui314
Copy link
Owner Author

rui314 commented Oct 18, 2022

@gcflymoto Is there anything we can help?

@ptr1337
Copy link

ptr1337 commented Oct 18, 2022

@rui314

I think this could help you, I have just tested to bolt clang.

Instrument clang with llvm-bolt
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x4e00000, offset 0x4e00000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/home/ptr1337/llvm-bolt-scripts/build_stage3-bolt-without-sampling.bash: line 19: 69152 Aborted                 ${BOLTPATH}/llvm-bolt --instrument --instrumentation-file-append-pid --instrumentation-file=${TOPLEV}/stage3-without-sampling/intrumentdata

@rui314
Copy link
Owner Author

rui314 commented Oct 19, 2022

@ishitatsuyuki Did you want to take this?

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 19, 2022

@rui314 Yes, I'm tentatively trying to get a Rust CI build to work, which uses BOLT to optimize LLVM (rust-lang/rust#94381). I'll report if it succeeds or not.

@ishitatsuyuki
Copy link
Contributor

OK, it failed:

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: <unknown>
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x5000000, offset 0x5000000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x4a4f843) does not have any section
BOLT-WARNING: etext/1 (0x4a4f843) does not have any section
BOLT-WARNING: split function detected on input : _ZNSi6ignoreEl.cold/1. The support is limited in relocation mode.
 #0 0x0000000000958a1f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000000000956814 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fef06d83db0 __restore_rt (/lib64/libc.so.6+0x59db0)
 #3 0x00007fef06e0ffc0 __memmove_avx_unaligned_erms (/lib64/libc.so.6+0xe5fc0)
 #4 0x000000000099ded9 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) (.constprop.0) RewriteInstance.cpp:0:0
 #5 0x00000000009d959b llvm::bolt::RewriteInstance::analyzeRelocation(llvm::object::RelocationRef const&, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, bool&, unsigned long&, long&, unsigned long&, bool&) const (/rustroot/bin/llvm-bolt+0x9d959b)
 #6 0x00000000009d9e6b llvm::bolt::RewriteInstance::readRelocations(llvm::object::SectionRef const&) (/rustroot/bin/llvm-bolt+0x9d9e6b)
 #7 0x00000000009dbdcd llvm::bolt::RewriteInstance::processRelocations() (.part.0) RewriteInstance.cpp:0:0
 #8 0x00000000009fa56a llvm::bolt::RewriteInstance::discoverFileObjects() (/rustroot/bin/llvm-bolt+0x9fa56a)
 #9 0x00000000009fbf49 llvm::bolt::RewriteInstance::run() (/rustroot/bin/llvm-bolt+0x9fbf49)
#10 0x0000000000419bab main (/rustroot/bin/llvm-bolt+0x419bab)
#11 0x00007fef06d6ee50 __libc_start_call_main (/lib64/libc.so.6+0x44e50)
#12 0x00007fef06d6eefc __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x44efc)
#13 0x000000000047baa5 _start (/rustroot/bin/llvm-bolt+0x47baa5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-bolt -instrument /checkout/obj/build/x86_64-unknown-linux-gnu/llvm/build/lib/libLLVM-15-rust-1.66.0-nightly.so --instrumentation-file-append-pid -o /tmp/instrumented.so

I'll try to gather more notes for reproduction later, but let me know if you need something in particular.

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 19, 2022

Instrument runs fine with llvm main so it's some fun bug that got fixed between the 15.0 and 16.0 cycle...

@rui314
Copy link
Owner Author

rui314 commented Oct 19, 2022

If we use LLVM 16, can we use BOLT with mold without doing anything special?

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 19, 2022

I still haven't tested the optimization pass (the part that is not instrumentation) yet.

@ishitatsuyuki
Copy link
Contributor

I think I spoke too soon. Bisection resulted in an all-pass so this might be an environment dependent problem (build flags? silent memory errors?).

@ishitatsuyuki
Copy link
Contributor

This line is failing, even in main: https://github.com/llvm/llvm-project/blob/3ee58e2f355f8fdb8e0fe29dc366c8833fafa7d3/bolt/lib/Rewrite/RewriteInstance.cpp#L1885

Lesson to me is to always turn on LLVM assertions when debugging. Without assertions this turned into a heisenbug.

I'll try to see why.

@rui314
Copy link
Owner Author

rui314 commented Oct 19, 2022

I usually build LLVM (and mold) with -DCMAKE_RELEASE_TYPE=Debug unless I'm doing performance optimization.

@ishitatsuyuki
Copy link
Contributor

Looks like this was the real problem:

llvm-bolt: /home/ishitatsuyuki/llvm-project/bolt/lib/Core/BinaryContext.cpp:753: llvm::bolt::BinaryFunction *llvm::bolt::BinaryContext::createBinaryFunction(const std::string &, llvm::bolt::BinarySection &, uint64_t, uint64_t, uint64_t, uint16_t): Assertion `Result.second == true && "unexpected duplicate function"' failed.
 #0 0x000055839b75737a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 0x000055839b75752b PrintStackTraceSignalHandler(void*) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 0x000055839b755b76 llvm::sys::RunSignalHandlers() /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Signals.cpp:104:5
 #3 0x000055839b757c55 SignalHandler(int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007fded65a2520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007fded65f6a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #6 0x00007fded65f6a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #7 0x00007fded65f6a7c pthread_kill ./nptl/pthread_kill.c:89:10
 #8 0x00007fded65a2476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007fded65887f3 abort ./stdlib/abort.c:81:7
#10 0x00007fded658871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#11 0x00007fded6599e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#12 0x000055839d0e9105 llvm::bolt::BinaryContext::createBinaryFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, llvm::bolt::BinarySection&, unsigned long, unsigned long, unsigned long, unsigned short) /home/ishitatsuyuki/llvm-project/bolt/lib/Core/BinaryContext.cpp:754:32
#13 0x000055839b872746 llvm::bolt::RewriteInstance::createPLTBinaryFunction(unsigned long, unsigned long, unsigned long) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1268:6
#14 0x000055839b873002 llvm::bolt::RewriteInstance::disassemblePLTSectionX86(llvm::bolt::BinarySection&, unsigned long) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1386:3
#15 0x000055839b873220 llvm::bolt::RewriteInstance::disassemblePLT()::$_9::operator()(llvm::bolt::BinarySection&, unsigned long) const /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1394:3
#16 0x000055839b871838 llvm::bolt::RewriteInstance::disassemblePLT() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1401:5
#17 0x000055839b86c4dc llvm::bolt::RewriteInstance::discoverFileObjects() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1173:3
#18 0x000055839b868952 llvm::bolt::RewriteInstance::run() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:720:3
#19 0x000055839ac9b76b main /home/ishitatsuyuki/llvm-project/bolt/tools/driver/llvm-bolt.cpp:244:17
#20 0x00007fded6589d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#21 0x00007fded6589e40 call_init ./csu/../csu/libc-start.c:128:20
#22 0x00007fded6589e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#23 0x000055839ac9a6d5 _start (./build/bin/llvm-bolt+0x2326d5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./build/bin/llvm-bolt -instrument /home/ishitatsuyuki/rust/obj/build/x86_64-unknown-linux-gnu/llvm/build/lib/libLLVM-15-rust-1.66.0-nightly.so -o /tmp/out.so

Looks PLT related.

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 19, 2022

(Would love Rui's thoughts on this since I don't really know how mold's PLT differs from the psABI. FYI, the BOLT PLT scanner is here).

@rui314
Copy link
Owner Author

rui314 commented Oct 19, 2022

Here is where we create a PLT header and PLT entries: https://github.com/rui314/mold/blob/main/elf/arch-x86-64.cc#L34-L75

The biggest difference compared to the one in the psABI is that our PLT entry has only one entry point. The standard PLT entry looks like this:

jmp *GOTPLT_ETNRY_FOR_THE_SYMBOL
push PLT_INDEX
jmp PLT_HDR

Initially, the .got.plt entry for symbol foo is set to the middle of foo's PLT entry, so that when the PLT entry is executed for the first time, it jumps back to the push instruction. The push instruction sets a PLT index and jump to the beginning of the PLT header for lazy symbol resolution.

Our PLT looks like this:

endbr64
mov PLT_INDEX, %r11
jmp *GOTPLT_ENTRY_FOR_THE_SYMBOL

Initially, unlike the standard PLT entries, all .got.plt entries point to the beginning of the PLT section. We unconditionally set %r11 to the PLT index. If the PLT entry is called for the first time, the PLT header uses %r11's value to know which PLT entry to fix. If a PLT entry is already resolved, it's a dead-store to %r11, but it should be harmless.

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 19, 2022

After a close look it seems that the PLT format is actually fine since BOLT sequentially looks for the first jmp, which has the same target.

It looks like this conflicts with preexisting local symbols like foo$plt though, which the assertion is complaining about: I need to understand why these exist and what are these supposed to be.

@ishitatsuyuki
Copy link
Contributor

urgh, the foo$plt thing is a regression after 1.5.1. I need to bisect.

@ishitatsuyuki
Copy link
Contributor

I'm dumb, it was literally implemented in a7f1e20 as a feature...

@ishitatsuyuki
Copy link
Contributor

So, the PLT-related assertion is a bug in BOLT, but we might want to add a flag to disable the synthesized symbols, to allow workarounding this issue.

There seems to be an unrelated regression from a7f1e20 too: some symbol's visibility gets filled with garbage value like below:

Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fprintf@GLIBC_2.2.5 (2)
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strtol@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND stdin@GLIBC_2.2.5 (2)
     4: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND stderr@GLIBC_2.2.5 (2)
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fread@GLIBC_2.2.5 (2)
     6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND feof@GLIBC_2.2.5 (2)

Symbol table '.symtab' contains 62 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000200238     0 SECTION LOCAL  DEFAULT    1 .hash
     2: 0000000000200278     0 SECTION LOCAL  DEFAULT    2 .gnu.hash
     3: 0000000000200298     0 SECTION LOCAL  DEFAULT    3 .dynsym
     4: 0000000000200340     0 SECTION LOCAL  DEFAULT    4 .dynstr
     5: 00000000002003ba     0 SECTION LOCAL  DEFAULT    5 .gnu.version
     6: 00000000002003c8     0 SECTION LOCAL  DEFAULT    6 .gnu.version_r
     7: 00000000002003e8     0 SECTION LOCAL  DEFAULT    7 .rela.dyn
     8: 0000000000200418     0 SECTION LOCAL  DEFAULT    8 .rela.plt
     9: 0000000000200478     0 SECTION LOCAL  DEFAULT    9 .eh_frame
    10: 00000000002004e4     0 SECTION LOCAL  DEFAULT   10 .eh_frame_hdr
    11: 0000000000200500     0 SECTION LOCAL  DEFAULT   11 .rodata.cst
    12: 0000000000200520     0 SECTION LOCAL  DEFAULT   12 .rodata.str
    13: 00000000002015a0     0 SECTION LOCAL  DEFAULT   13 .plt
    14: 0000000000201600     0 SECTION LOCAL  DEFAULT   14 .text
    15: 0000000000202800     0 SECTION LOCAL  DEFAULT   15 .dynamic
    16: 0000000000202a20     0 SECTION LOCAL  DEFAULT   16 .got
    17: 0000000000202a30     0 SECTION LOCAL  DEFAULT   17 .relro_padding
    18: 0000000000203a30     0 SECTION LOCAL  DEFAULT   18 .got.plt
    19: 0000000000202a20     0 OBJECT  LOCAL  DEFAULT   16 stdin@got
    20: 0000000000202a28     0 OBJECT  LOCAL  DEFAULT   16 stderr@got
    21: 00000000002015c0     0 FUNC    LOCAL  DEFAULT   13 fprintf@plt
    22: 00000000002015d0     0 FUNC    LOCAL  DEFAULT   13 strtol@plt
    23: 00000000002015e0     0 FUNC    LOCAL  DEFAULT   13 fread@plt
    24: 00000000002015f0     0 FUNC    LOCAL  DEFAULT   13 feof@plt
    25: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS count.c
    26: 0000000000200510     0 NOTYPE  LOCAL  DEFAULT   11 .LCPI0_0
    27: 0000000000200500     0 NOTYPE  LOCAL  DEFAULT   11 .LCPI0_1
    28: 0000000000200576     0 OBJECT  LOCAL  DEFAULT   12 .L.str
    29: 0000000000200520     0 OBJECT  LOCAL  DEFAULT   12 .L.str.2
    30: 000000000020055a     0 OBJECT  LOCAL  DEFAULT   12 .L.str.3
    31: 0000000000200539     0 OBJECT  LOCAL  DEFAULT   12 .L.str.1
    32: 0000000000201600   508 FUNC    LOCAL  DEFAULT   14 main
    33: 0000000000200000     0 NOTYPE  LOCAL  DEFAULT    1 __ehdr_start
    34: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __init_array_start
    35: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __init_array_end
    36: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __fini_array_start
    37: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __fini_array_end
    38: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __preinit_array_start
    39: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __preinit_array_end
    40: 0000000000202800     0 NOTYPE  LOCAL  DEFAULT   15 _DYNAMIC
    41: 0000000000203a30     0 NOTYPE  LOCAL  DEFAULT   18 _GLOBAL_OFFSET_TABLE_
    42: 00000000002015a0     0 NOTYPE  LOCAL  DEFAULT   13 _PROCEDURE_LINKAGE_TABLE_
    43: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __bss_start
    44: 0000000000203a68     0 NOTYPE  LOCAL  DEFAULT   18 _end
    45: 00000000002017fc     0 NOTYPE  LOCAL  DEFAULT   14 _etext
    46: 0000000000203a68     0 NOTYPE  LOCAL  DEFAULT   18 _edata
    47: 0000000000200000     0 NOTYPE  LOCAL  DEFAULT    1 __executable_start
    48: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __rela_iplt_start
    49: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   21 __rela_iplt_end
    50: 00000000002004e4     0 NOTYPE  LOCAL  DEFAULT   10 __GNU_EH_FRAME_HDR
    51: 0000000000203a68     0 NOTYPE  LOCAL  DEFAULT   18 end
    52: 00000000002017fc     0 NOTYPE  LOCAL  DEFAULT   14 etext
    53: 0000000000203a68     0 NOTYPE  LOCAL  DEFAULT   18 edata
    54: 0000000000200000     0 NOTYPE  LOCAL  DEFAULT    1 __dso_handle
    55: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 _TLS_MODULE_BASE_
    56: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT [<other>: 20]   UND fprintf
    57: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT [<other>: 34]   UND strtol
    58: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND stdin
    59: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND stderr
    60: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fread
    61: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND feof

(the [<other>: xx] part)

A repro suite is attached.
repro.tar.zip

@rui314
Copy link
Owner Author

rui314 commented Oct 19, 2022

Instead of adding a command line option to mold to suppress the linker-synthesized symbols, I'd fix BOLT. BOLT should not be affected by these symbols.

@rui314
Copy link
Owner Author

rui314 commented Oct 19, 2022

I can't reproduce the symbol visibility issue with git head, so maybe it's already fixed?

@ishitatsuyuki
Copy link
Contributor

Instead of adding a command line option to mold to suppress the linker-synthesized symbols, I'd fix BOLT. BOLT should not be affected by these symbols.

Ack. Will file a bug report later.

I can't reproduce the symbol visibility issue with git head, so maybe it's already fixed?

Bisect says it's gone after d2b7b04. But that commit doesn't really touch visibility, and it looks like the exact corrupted symbols differs between builds, so I wonder if there's some nondeterminism involved here.


After switching to v1.5.1 which doesn't emit the $plt symbols, BOLT is now complaining values calculated from relocations doesn't match the ones inside the executable:

llvm-bolt: /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1972: bool llvm::bolt::RewriteInstance::analyzeRelocation(const llvm::object::RelocationRef &, uint64_t, std::string &, bool &, uint64_t &, int64_t &, uint64_t &, bool &) const: Assertion `verifyExtractedValue() && "mismatched extracted relocation value"' failed.

readelf also complains about invalid symbol indices, so I'm suspecting --emit-relocs isn't working quite as expected. I'll report when I have a reduced reproduction sample.

readelf: Error:  bad symbol index: 0002f885 in reloc
readelf: Error:  bad symbol index: 0002f886 in reloc
readelf: Error:  bad symbol index: 0002f887 in reloc
readelf: Error:  bad symbol index: 0002f888 in reloc
readelf: Error:  bad symbol index: 0002f889 in reloc
readelf: Error:  bad symbol index: 0002f888 in reloc
readelf: Error:  bad symbol index: 0002f88a in reloc
readelf: Error:  bad symbol index: 0002f888 in reloc
readelf: Error:  bad symbol index: 0002f88b in reloc
...

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 20, 2022

1124322 helps: BOLT now progress through a few symbols before hitting assert, and readelf errors are down to 5 entries. However something still seems to be wrong.

Edit: I probably know why:

for (i64 i = 1, j = 0; i < file->first_global; i++)
seems to need an update too.

@ishitatsuyuki
Copy link
Contributor

With #798 and a hack to disable $plt-like symbols (in https://github.com/ishitatsuyuki/mold/tree/bolt) BOLT is working flawlessly 🎉 🎉 🎉

I used Rust's PGO build pipeline, which has full ThinLTO+PGO+BOLT for its LLVM component. I had to do a few quick hacks, namely changing the base distribution to get some decent support of C++20. The diff is available at https://github.com/ishitatsuyuki/rust/tree/moldBOLT.

The resulting binary should also be running fine since the pipeline uses the just-built compiler to compile the ecosystem tools (Cargo, rust-analyzer etc.), but I'll just verify with a icache heatmap on my Intel laptop later, just in case.

@rui314
Copy link
Owner Author

rui314 commented Oct 20, 2022

Great work! It's impressive that you identified and resolved the issue so quickly. Thank you!

@ptr1337
Copy link

ptr1337 commented Oct 20, 2022

@rui314 @ishitatsuyuki

Great work! I just tested to instrument clang, and it has worked. Gathering now the profile, then the optimization process will be done. So far it seems to be working well.

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x4e00000, offset 0x4e00000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling -align-macro-fusion=all since no profile was specified
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
BOLT-WARNING: Failed to analyze 470109 relocations
BOLT-WARNING: 6 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INFO: 0 out of 97833 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: the input contains 17357 (dynamic count : 0) opportunities for macro-fusion optimization that are going to be fixed
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 28238
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 94233
BOLT-INSTRUMENTER: Number of function descriptors: 94233
BOLT-INSTRUMENTER: Number of branch counters: 904843
BOLT-INSTRUMENTER: Number of ST leaf node counters: 443070
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 1347913
BOLT-INSTRUMENTER: Total size of counters: 10783304 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 11389554 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 96355088 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /home/ptr1337/toolchain/llvm/stage3-without-sampling/intrumentdata/clang-15.fdata
BOLT-INFO: 1134971 instructions were shortened
BOLT-INFO: removed 109 empty blocks
BOLT-INFO: merged 1 duplicate CFG edge
BOLT-INFO: UCE removed 838 blocks and 51318 bytes of code.
BOLT-INFO: SCTC: patched 0 tail calls (0 forward) tail calls (0 backward) from a total of 0 while removing 0 double jumps and removing 0 basic blocks totalling 0 bytes of code. CTCs total execution count is 0 and the number of times CTCs are taken is 0.
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0xc3944c0
BOLT-INFO: clear procedure is 0xc38f360
BOLT-INFO: setting _end to 0xc514c9c

The only thing, which seems different then with lld to me is:

BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section

Same as this, seems to me a bit high:

BOLT-WARNING: Failed to analyze 470109 relocations

Thanks!

edit:

Instrumentation, merging fdata and the optimization was successful!
Here the log:

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 39f01240e76678fa6385d36e7a96670677a26d65
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-WARNING: _etext/1 (0x490955c) does not have any section
BOLT-WARNING: etext/1 (0x490955c) does not have any section
BOLT-WARNING: Failed to analyze 470109 relocations
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-WARNING: 1 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INFO: 10210 out of 97833 functions in the binary (10.4%) have non-empty execution profile
BOLT-INFO: 649 functions with profile could not be optimized
BOLT-WARNING: 5 (0.0% of all profiled) functions have invalid (possibly stale) profile. Use -report-stale to see the list.
BOLT-WARNING: 460262 out of 243184719344 samples in the binary (0.0%) belong to functions with invalid (possibly stale) profile.
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: the input contains 4674 (dynamic count : 3071330706) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 354969 instructions were shortened
BOLT-INFO: removed 851 empty blocks
BOLT-INFO: ICF folded 1478 out of 98109 functions in 4 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 166.21 KB of code space. Folded functions were called 3122835745 times based on profile.
BOLT-INFO: 31975 PLT calls in the binary were optimized.
BOLT-INFO: basic block reordering modified layout of 4922 functions (48.21% of profiled, 5.09% of total)
BOLT-INFO: UCE removed 1 blocks and 6 bytes of code.
BOLT-INFO: splitting separates 4610217 hot bytes from 3731975 cold bytes (55.26% of split functions is hot).
BOLT-INFO: 129 Functions were reordered by LoopInversionPass
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

        138489556757 : executed forward branches
         20942286787 : taken forward branches
         46989998144 : executed backward branches
         28605377103 : taken backward branches
          9641011040 : executed unconditional branches
         14576806007 : all function calls
          7293536017 : indirect calls
          3769079744 : PLT calls
       1222503556165 : executed instructions
        298324511206 : executed load instructions
        120845030651 : executed store instructions
                   0 : taken jump table branches
                   0 : taken unknown indirect branches
        195120565941 : total branches
         59188674930 : taken branches
        135931891011 : non-taken conditional branches
         49547663890 : taken conditional branches
        185479554901 : all conditional branches

        124400786448 : executed forward branches (-10.2%)
          8624897609 : taken forward branches (-58.8%)
         61078768453 : executed backward branches (+30.0%)
         28657226155 : taken backward branches (+0.2%)
          6073827710 : executed unconditional branches (-37.0%)
         10807726263 : all function calls (-25.9%)
          7293536017 : indirect calls (=)
                   0 : PLT calls (-100.0%)
       1209826144728 : executed instructions (-1.0%)
        298324511206 : executed load instructions (=)
        120845030651 : executed store instructions (=)
                   0 : taken jump table branches (=)
                   0 : taken unknown indirect branches (=)
        191553382611 : total branches (-1.8%)
         43355951474 : taken branches (-26.7%)
        148197431137 : non-taken conditional branches (+9.0%)
         37282123764 : taken conditional branches (-24.8%)
        185479554901 : all conditional branches (=)

BOLT-INFO: SCTC: patched 39 tail calls (37 forward) tail calls (2 backward) from a total of 39 while removing 4 double jumps and removing 31 basic blocks totalling 155 bytes of code. CTCs total execution count is 2168704 and the number of times CTCs are taken is 1081892.
BOLT-INFO: setting _end to 0x619ea10
BOLT-INFO: setting __hot_start to 0x4e00000
BOLT-INFO: setting __hot_end to 0x57411d6

@rui314
Copy link
Owner Author

rui314 commented Oct 21, 2022

@ishitatsuyuki You may want to test BOLT with a program that uses exceptions, as clang and (I believe) rustc don't use exceptions. That part might not be tested yet.

@maksfb
Copy link

maksfb commented Oct 21, 2022

Our .eh_frame section does not have a relocation section even if --emit-relocs is given. We synthesize .eh_frame, and we probably need to synthesize .rela.eh_frame too

BOLT doesn't need .rela.eh_frame as it creates internal relocations while parsing the section, i.e. kinda synthesizes .rela.eh_frame internally.

@rui314
Copy link
Owner Author

rui314 commented Oct 21, 2022

BOLT doesn't need .rela.eh_frame as it creates internal relocations while parsing the section, i.e. kinda synthesizes .rela.eh_frame internally.

Oh cool, then we don't need to synthsize .rela.eh_frame. Exceptions could fail for other reasons, so it's nice to test it though.

@ishitatsuyuki
Copy link
Contributor

The BOLT-WARNING: Failed to analyze 470109 relocations is because BOLT don't know how to read the GOT32 relocations; it's probably relaxed by bfd but not mold. It seems harmless so far but I haven't checked in detail.

@rui314
Copy link
Owner Author

rui314 commented Oct 21, 2022

@ishitatsuyuki I'd file that to BOLT even if it's harmless.

@ishitatsuyuki
Copy link
Contributor

@ishitatsuyuki You may want to test BOLT with a program that uses exceptions, as clang and (I believe) rustc don't use exceptions. That part might not be tested yet.

The real-life example of BOLT seems really scarce, so I don't have a good idea what to test. I'll run BOLT's own test suite in LLVM with mold which has some CFI tests.

@ishitatsuyuki I'd file that to BOLT even if it's harmless.

Ack.

@rui314
Copy link
Owner Author

rui314 commented Oct 21, 2022

Maybe @maksfb has an idea as to what programs are suitable for testing BOLT?

@maksfb
Copy link

maksfb commented Oct 21, 2022

BOLT test suite has several runnable x86 tests for C++ exceptions. E.g. https://github.com/llvm/llvm-project/blob/main/bolt/test/runtime/X86/exceptions-run.test

For large open-source project with C++ exceptions, I can recommend trying HHVM (hhvm.com). I've also heard that ScyllaDB uses BOLT, but I have no first-hand experience with it.

@ishitatsuyuki
Copy link
Contributor

I ran the BOLT test suite with mold (-DBOLT_LLD_EXE=/path/to/mold):

Failed Tests (10):

  • BOLT :: X86/data-to-data-pcrel.s
    BOLT expect R_X86_64_GOT32 to be relaxed to R_X86_64_PC64 but mold does not do it yet.
  • BOLT :: X86/double-rel.s
    Caused by mold turning all symbols in a -nostdlib (and static) executable to LOCAL.
  • BOLT :: X86/dummy-eh-frame-bug.s
    Not sure.
  • BOLT :: X86/dynrelocs.s
    BOLT assert; cause unknown. Logs below.
  • BOLT :: X86/fptr-addend-pcrel.s
    Caused by mold turning all symbols in a -nostdlib (and static) executable to LOCAL.
  • BOLT :: X86/gdbindex.test
    Hardcoded debug info offsets, I think this is actually fine.
  • BOLT :: X86/gotpcrelx.s
    Caused by mold turning all symbols in a -nostdlib (and static) executable to LOCAL.
  • BOLT :: X86/shrinkwrapping-lea.s
    Caused by GOT relocation stuff?
  • BOLT :: X86/shrinkwrapping-mov.s
    Caused by GOT relocation stuff?
  • BOLT :: X86/symtab-secondary-entries.test
    Caused by mold turning all symbols in a -nostdlib (and static) executable to LOCAL.

Logs for X86/dynrelocs.s:

BOLT-WARNING: _etext/1 (0x20139d) does not have any section
BOLT-WARNING: etext/1 (0x20139d) does not have any section
llvm-bolt: /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2595: void llvm::bolt::RewriteInstance::handleRelocation(const llvm::object::SectionRef &, const llvm::object::RelocationRef &): Assertion `(IsAArch64 || IsSectionRelocation || BD->nameStartsWith(SymbolName) || BD->nameStartsWith("PG" + SymbolName) || (BD->nameStartsWith("ANONYMOUS") && (BD->getSectionName().startswith(".plt") || BD->getSectionName().endswith(".plt")))) && "BOLT symbol names of all non-section relocations must match up " "with symbol names referenced in the relocation"' failed.
 #0 0x000056123b12837a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 0x000056123b12852b PrintStackTraceSignalHandler(void*) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 0x000056123b126b76 llvm::sys::RunSignalHandlers() /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Signals.cpp:104:5
 #3 0x000056123b128c55 SignalHandler(int) /home/ishitatsuyuki/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007ff3d0334520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007ff3d0388a7c __pthread_kill_implementation ./nptl/./nptl/pthread_kill.c:44:76
 #6 0x00007ff3d0388a7c __pthread_kill_internal ./nptl/./nptl/pthread_kill.c:78:10
 #7 0x00007ff3d0388a7c pthread_kill ./nptl/./nptl/pthread_kill.c:89:10
 #8 0x00007ff3d0334476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007ff3d031a7f3 abort ./stdlib/./stdlib/abort.c:81:7
#10 0x00007ff3d031a71b _nl_load_domain ./intl/./intl/loadmsgcat.c:1177:9
#11 0x00007ff3d032be96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#12 0x000056123b1f03b4 llvm::bolt::RewriteInstance::handleRelocation(llvm::object::SectionRef const&, llvm::object::RelocationRef const&) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2588:7
#13 0x000056123b1ed645 llvm::bolt::RewriteInstance::readRelocations(llvm::object::SectionRef const&) /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2329:33
#14 0x000056123b1ea978 llvm::bolt::RewriteInstance::processRelocations() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:2007:3
#15 0x000056123b1e507f llvm::bolt::RewriteInstance::discoverFileObjects() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:1240:1
#16 0x000056123b1e0ec2 llvm::bolt::RewriteInstance::run() /home/ishitatsuyuki/llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp:720:3
#17 0x000056123a7340fb main /home/ishitatsuyuki/llvm-project/bolt/tools/driver/llvm-bolt.cpp:244:17
#18 0x00007ff3d031bd90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#19 0x00007ff3d031be40 call_init ./csu/../csu/libc-start.c:128:20
#20 0x00007ff3d031be40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#21 0x000056123a733065 _start (/home/ishitatsuyuki/llvm-project/build/bin/llvm-bolt+0x260065)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/ishitatsuyuki/llvm-project/build/bin/llvm-bolt -data /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.fdata /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.exe -relocs -o /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.out -lite=0 -jump-tables=move
/home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.script: line 8: 749269 Aborted                 (core dumped) /home/ishitatsuyuki/llvm-project/build/bin/llvm-bolt -data /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.fdata /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.exe -relocs -o /home/ishitatsuyuki/llvm-project/build/tools/bolt/test/X86/Output/dynrelocs.s.tmp.out -lite=0 -jump-tables=move

Testing Time: 22.61s
  Unsupported:  26
  Passed     : 194
  Failed     :  10

@ishitatsuyuki
Copy link
Contributor

@rui314 Any idea where a GOT32 reloc can come from? You can see it in e.g. the emit-relocs test.

I readelf'd all the input .o and .a files but they didn't seem to contain any. A quick grep of mold code don't show any uses that assigns the value of GOT32, so I don't have a good clue here.

In bfd, the symbols seems to get assigned a PC64 reloc instead, although I'm not 100% sure it's pointing to the same thing.

Finally there's a chance that GOT32 is just a bogus value we assign due to some bug. I see a bunch of these for .debug_lines too.

@rui314
Copy link
Owner Author

rui314 commented Oct 26, 2022

It looks like mold creates GOT32 relocations for section symbols. We discard section symbols in input files and re-create new section symbols for output sections, as we want to keep only one section symbol for each section.

I don't know why it lead to creating GOT32 relocations, though. But the relocation type looks bogus.

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 26, 2022

OK, this is the culprit then 😅

buf[j].r_type = STT_SECTION;

(STT_SECTION = R_X86_64_GOT32)

@rui314
Copy link
Owner Author

rui314 commented Oct 26, 2022

LOL

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Oct 26, 2022

With #833 BOLT :: X86/data-to-data-pcrel.s is fixed. The remaining 9 still fails and there still seems to be "relocation parse failed" messages (Update: gone after rebuild). I'll check what BOLT is complaning about this time.

@ishitatsuyuki
Copy link
Contributor

There are still 9 failures but I think we're mostly done for it now. Reasons it's failing seems to be due to one of: 1. mold defaulting to LOCAL for executables (need --export-dynamic), 2. mold generating more symbols that have the same address as others (probably harmless, just a different name), and 3. straight out hardcoded expectations. I'll test HHVM next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants