Adding tokenizers-cpp to our project made the binary size go up from a (stripped) baseline of 1.8MB to 8.4MB in a release build on Linux x86_64. This was just with the stable Rust toolchain and all defaults.
I'm no Rust expert, but I applied a number of the options (the ones I could get to work) from this page https://github.com/johnthagen/min-sized-rust and was able to trivially get the binary size down to 4.2MB. Given that Rust links statically by default and pulls in a lot of standard library code for data manipulation, this doesn't completely shock me (but it is still quite large compared to our baseline).
Recording here the things I quickly tried to achieve that:
set(TOKENIZERS_CPP_RUST_FLAGS "-Zlocation-detail=none") in CMakeLists.txt (feature request: consider making these more configurable from the including project)
Use the build-std approach from the page linked above by adding this to the cargo command line in CMakeLists.txt: -Z build-std=std,panic_abort -Z build-std-features="optimize_for_size" (and using a nightly toolchain)
I wasn't being super principled, but iirc 1 and 2 combined shaved off ~500KB or so. LTO and build-std gave 2-3MB, and the remaining options accounted for the rest.
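For reference, the stable-toolchain options on the min-sized-rust page are Cargo profile settings. A minimal sketch, assuming they are added to the `[profile.release]` section of the Cargo.toml for the Rust crate that tokenizers-cpp builds. The exact subset behind the numbers above is not recorded here, so treat this list as an assumption rather than the configuration that was measured:

```toml
# Sketch only: size-oriented release profile, per the min-sized-rust page.
# Assumed location: the Cargo.toml of the Rust crate built by tokenizers-cpp.
[profile.release]
opt-level = "z"     # optimize for size instead of speed
lto = true          # enable link-time optimization
codegen-units = 1   # one codegen unit: slower builds, more room for optimization
panic = "abort"     # abort on panic instead of unwinding
strip = true        # ask Cargo to strip symbols from the output
```

These keys all work on a stable toolchain. The `-Zlocation-detail=none` flag and the `-Z build-std` options above are separate and still need nightly.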
It would be good to be able to customize these things easily in the including project. Might call for some additional CMake goo and such.