Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store expansion locations separately from tokens #623

Merged
merged 7 commits into from
Feb 9, 2024

Conversation

ehaas
Copy link
Collaborator

@ehaas ehaas commented Feb 9, 2024

The main insight here is that most tokens don't have expansion locations; and that expansion locations are (currently) only used for diagnostic messages, so "slow" lookups are not a problem. Since tokens are always inserted in-order by the preprocessor (sorted by construction), we can store token index + expansion locations in a MultiArrayList and binary search for a token index when we want the corresponding expansion locations.

sqlite benchmark:

Benchmark 1 (47 runs): source/arocc/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           213ms ±  575us     211ms …  214ms          0 ( 0%)        0%
  peak_rss           62.5MB ± 58.0KB    62.4MB … 62.6MB          0 ( 0%)        0%
  cpu_cycles          789M  ± 2.09M      784M  …  793M           0 ( 0%)        0%
  instructions       1.71G  ± 2.24      1.71G  … 1.71G           1 ( 2%)        0%
  cache_references   2.06M  ± 24.4K     2.01M  … 2.11M           0 ( 0%)        0%
  cache_misses        545K  ± 11.0K      514K  …  571K           4 ( 9%)        0%
  branch_misses      7.34M  ± 10.2K     7.32M  … 7.36M           0 ( 0%)        0%
Benchmark 2 (48 runs): source/expansion-tokens/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           212ms ±  838us     211ms …  216ms          2 ( 4%)          -  0.3% ±  0.1%
  peak_rss           58.4MB ± 54.2KB    58.4MB … 58.5MB          0 ( 0%)        ⚡-  6.5% ±  0.0%
  cpu_cycles          790M  ± 2.88M      785M  …  805M           4 ( 8%)          +  0.2% ±  0.1%
  instructions       1.71G  ± 2.18      1.71G  … 1.71G           0 ( 0%)          +  0.2% ±  0.0%
  cache_references   2.02M  ± 25.4K     1.97M  … 2.07M           0 ( 0%)        ⚡-  2.1% ±  0.5%
  cache_misses        557K  ± 10.9K      536K  …  590K           2 ( 4%)        💩+  2.1% ±  0.8%
  branch_misses      7.14M  ± 9.93K     7.12M  … 7.17M           1 ( 2%)        ⚡-  2.8% ±  0.1%

I wasn't able to use poop with zig2.c because of some <quadmath.h> shenanigans (which I don't fully understand, but that's a separate issue). However using /usr/bin/time I was able to measure the max RSS

Master branch: Maximum resident set size (kbytes): 1377964
This PR: Maximum resident set size (kbytes): 1195260
(Difference of 187,088,896 bytes)

Copy link
Owner

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Another quick memory optimization would be to add a clearBuffers function that could be called at the end of preprocessSources.

Merge at will.

@ehaas
Copy link
Collaborator Author

ehaas commented Feb 9, 2024

Updated benchmark with clearBuffers

Benchmark 1 (48 runs): source/arocc/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           212ms ±  729us     211ms …  215ms          2 ( 4%)        0%
  peak_rss           62.5MB ± 53.6KB    62.4MB … 62.6MB          0 ( 0%)        0%
  cpu_cycles          788M  ± 1.73M      785M  …  793M           1 ( 2%)        0%
  instructions       1.71G  ± 1.87      1.71G  … 1.71G           0 ( 0%)        0%
  cache_references   2.05M  ± 23.9K     2.02M  … 2.10M           0 ( 0%)        0%
  cache_misses        545K  ± 8.88K      524K  …  575K           3 ( 6%)        0%
  branch_misses      7.34M  ± 14.8K     7.32M  … 7.40M           3 ( 6%)        0%
Benchmark 2 (47 runs): source/expansion-tokens/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           213ms ±  642us     212ms …  215ms          1 ( 2%)          +  0.2% ±  0.1%
  peak_rss           58.2MB ± 49.3KB    58.2MB … 58.3MB          0 ( 0%)        ⚡-  6.8% ±  0.0%
  cpu_cycles          795M  ± 2.07M      792M  …  804M           2 ( 4%)          +  0.8% ±  0.1%
  instructions       1.71G  ± 1.79      1.71G  … 1.71G           1 ( 2%)          +  0.2% ±  0.0%
  cache_references   2.02M  ± 21.8K     1.97M  … 2.06M           0 ( 0%)        ⚡-  1.5% ±  0.5%
  cache_misses        547K  ± 9.08K      529K  …  575K           2 ( 4%)          +  0.4% ±  0.7%
  branch_misses      7.25M  ± 9.97K     7.23M  … 7.29M           1 ( 2%)        ⚡-  1.2% ±  0.1%

zig2.c: Maximum resident set size (kbytes): 1093240 (previously 1195260) so saves another 100 megs, not bad.

@ehaas ehaas merged commit c72f45c into Vexu:master Feb 9, 2024
3 checks passed
@ehaas ehaas deleted the expansion-tokens branch February 9, 2024 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants