Store expansion locations separately from tokens #623

ehaas · 2024-02-09T07:45:25Z

The main insight here is that most tokens don't have expansion locations; and that expansion locations are (currently) only used for diagnostic messages, so "slow" lookups are not a problem. Since tokens are always inserted in-order by the preprocessor (sorted by construction), we can store token index + expansion locations in a MultiArrayList and binary search for a token index when we want the corresponding expansion locations.

sqlite benchmark:

Benchmark 1 (47 runs): source/arocc/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           213ms ±  575us     211ms …  214ms          0 ( 0%)        0%
  peak_rss           62.5MB ± 58.0KB    62.4MB … 62.6MB          0 ( 0%)        0%
  cpu_cycles          789M  ± 2.09M      784M  …  793M           0 ( 0%)        0%
  instructions       1.71G  ± 2.24      1.71G  … 1.71G           1 ( 2%)        0%
  cache_references   2.06M  ± 24.4K     2.01M  … 2.11M           0 ( 0%)        0%
  cache_misses        545K  ± 11.0K      514K  …  571K           4 ( 9%)        0%
  branch_misses      7.34M  ± 10.2K     7.32M  … 7.36M           0 ( 0%)        0%
Benchmark 2 (48 runs): source/expansion-tokens/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           212ms ±  838us     211ms …  216ms          2 ( 4%)          -  0.3% ±  0.1%
  peak_rss           58.4MB ± 54.2KB    58.4MB … 58.5MB          0 ( 0%)        ⚡-  6.5% ±  0.0%
  cpu_cycles          790M  ± 2.88M      785M  …  805M           4 ( 8%)          +  0.2% ±  0.1%
  instructions       1.71G  ± 2.18      1.71G  … 1.71G           0 ( 0%)          +  0.2% ±  0.0%
  cache_references   2.02M  ± 25.4K     1.97M  … 2.07M           0 ( 0%)        ⚡-  2.1% ±  0.5%
  cache_misses        557K  ± 10.9K      536K  …  590K           2 ( 4%)        💩+  2.1% ±  0.8%
  branch_misses      7.14M  ± 9.93K     7.12M  … 7.17M           1 ( 2%)        ⚡-  2.8% ±  0.1%

I wasn't able to use poop with zig2.c because of some <quadmath.h> shenanigans (which I don't fully understand, but that's a separate issue). However using /usr/bin/time I was able to measure the max RSS

Master branch: Maximum resident set size (kbytes): 1377964
This PR: Maximum resident set size (kbytes): 1195260
(Difference of 187,088,896 bytes)

…kens

Vexu

Looks good! Another quick memory optimization would be to add a clearBuffers function that could be called at the end of preprocessSources.

Merge at will.

ehaas · 2024-02-09T17:23:36Z

Updated benchmark with clearBuffers

Benchmark 1 (48 runs): source/arocc/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           212ms ±  729us     211ms …  215ms          2 ( 4%)        0%
  peak_rss           62.5MB ± 53.6KB    62.4MB … 62.6MB          0 ( 0%)        0%
  cpu_cycles          788M  ± 1.73M      785M  …  793M           1 ( 2%)        0%
  instructions       1.71G  ± 1.87      1.71G  … 1.71G           0 ( 0%)        0%
  cache_references   2.05M  ± 23.9K     2.02M  … 2.10M           0 ( 0%)        0%
  cache_misses        545K  ± 8.88K      524K  …  575K           3 ( 6%)        0%
  branch_misses      7.34M  ± 14.8K     7.32M  … 7.40M           3 ( 6%)        0%
Benchmark 2 (47 runs): source/expansion-tokens/zig-out/bin/arocc -Wno-unreachable-code -fsyntax-only -I. sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           213ms ±  642us     212ms …  215ms          1 ( 2%)          +  0.2% ±  0.1%
  peak_rss           58.2MB ± 49.3KB    58.2MB … 58.3MB          0 ( 0%)        ⚡-  6.8% ±  0.0%
  cpu_cycles          795M  ± 2.07M      792M  …  804M           2 ( 4%)          +  0.8% ±  0.1%
  instructions       1.71G  ± 1.79      1.71G  … 1.71G           1 ( 2%)          +  0.2% ±  0.0%
  cache_references   2.02M  ± 21.8K     1.97M  … 2.06M           0 ( 0%)        ⚡-  1.5% ±  0.5%
  cache_misses        547K  ± 9.08K      529K  …  575K           2 ( 4%)          +  0.4% ±  0.7%
  branch_misses      7.25M  ± 9.97K     7.23M  … 7.29M           1 ( 2%)        ⚡-  1.2% ±  0.1%

zig2.c: Maximum resident set size (kbytes): 1093240 (previously 1195260) so saves another 100 megs, not bad.

ehaas added 6 commits February 7, 2024 21:43

Tree: store expansion locations separately from tokens

6831422

Preprocessor: store token and expansion len together when deleting to…

992da7d

…kens

Preprocessor: Add expansion locs before tokens

0a279bf

Preprocessor: rename expansion_locs to expansion_entries

be63453

Preprocessor: use TokenIndex instead ofu32 for ExpansionEntry index

ff6fd83

tests: use addToken instead of directly appending to preprocessor tokens

58c2ba4

Vexu approved these changes Feb 9, 2024

View reviewed changes

Preprocessor: clear temporary buffers after preprocessing

f21caa8

ehaas merged commit c72f45c into Vexu:master Feb 9, 2024
3 checks passed

ehaas deleted the expansion-tokens branch February 9, 2024 17:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store expansion locations separately from tokens #623

Store expansion locations separately from tokens #623

ehaas commented Feb 9, 2024

Vexu left a comment

ehaas commented Feb 9, 2024

Store expansion locations separately from tokens #623

Store expansion locations separately from tokens #623

Conversation

ehaas commented Feb 9, 2024

Vexu left a comment

Choose a reason for hiding this comment

ehaas commented Feb 9, 2024