Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SymbolStack: Use hash map instead of array for scopes #573

Merged
merged 5 commits into from
Nov 24, 2023

Conversation

ehaas
Copy link
Collaborator

@ehaas ehaas commented Nov 21, 2023

Benchmark 1 (16 runs): /home/ehaas/source/arocc/zig-out/bin/arocc -I. -fsyntax-only sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           332ms ± 1.00ms     331ms …  335ms          1 ( 6%)        0%
  peak_rss           62.1MB ± 44.2KB    62.1MB … 62.2MB          0 ( 0%)        0%
  cpu_cycles         1.27G  ± 2.18M     1.27G  … 1.27G           0 ( 0%)        0%
  instructions       3.35G  ± 1.91      3.35G  … 3.35G           1 ( 6%)        0%
  cache_references   2.05M  ± 23.7K     2.02M  … 2.08M           0 ( 0%)        0%
  cache_misses        562K  ± 11.1K      541K  …  596K           2 (13%)        0%
  branch_misses      11.4M  ± 19.8K     11.4M  … 11.4M           2 (13%)        0%
Benchmark 2 (24 runs): /home/ehaas/source/hashmap-symbols/zig-out/bin/arocc -I. -fsyntax-only sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           210ms ±  559us     209ms …  211ms          1 ( 4%)        ⚡- 36.7% ±  0.2%
  peak_rss           62.3MB ± 57.2KB    62.3MB … 62.4MB          0 ( 0%)        +  0.3% ±  0.1%
  cpu_cycles          775M  ± 2.36M      771M  …  780M           2 ( 8%)        ⚡- 39.0% ±  0.1%
  instructions       1.66G  ± 5.47      1.66G  … 1.66G           2 ( 8%)        ⚡- 50.3% ±  0.0%
  cache_references   2.03M  ± 29.4K     2.00M  … 2.08M           0 ( 0%)        -  0.8% ±  0.9%
  cache_misses        545K  ± 7.67K      530K  …  561K           1 ( 4%)        ⚡-  3.1% ±  1.1%
  branch_misses      7.62M  ± 8.28K     7.60M  … 7.63M           0 ( 0%)        ⚡- 33.1% ±  0.1%

On a preprocessed zig2.c (with 1cdf4d5 reverted to prevent a parse error) I get the following:

GCC 13.2
time gcc -fsyntax-only -Wfatal-errors -undef -std=c99 preprocessed.c

real	1m4.331s
user	1m3.934s
sys	0m0.390s

With this branch I get

time arocc -fsyntax-only -Wfatal-errors -undef -std=c99 preprocessed.c

real	0m6.572s
user	0m6.043s
sys	0m0.521s

We have a lot of spurious "non-void function does not return a value" warnings and some incorrect integer overflows but I don't think fixing those will make arocc 10x slower :)

As a future enhancement we could look at reusing scope hash maps instead of allocating and freeing them all the time.

Closes #321

@ehaas
Copy link
Collaborator Author

ehaas commented Nov 21, 2023

Did this just to test it out, I might want to clean up the API a bit but the results look promising

@Vexu
Copy link
Owner

Vexu commented Nov 21, 2023

As a future enhancement we could look at reusing scope hash maps instead of allocating and freeing them all the time.

With ArrayHashMap.shrinkRetainingCapacity you should be able to use the same scoping system as was used previously.

@ehaas
Copy link
Collaborator Author

ehaas commented Nov 21, 2023

Is there an advantage to using ArrayHashMap + shrinkRetainingCapacity vs HashMap + clearRetainingCapacity? I can try both and measure it, unless there's some obvious reason why ArrayHashMap will be better.

@Vexu
Copy link
Owner

Vexu commented Nov 21, 2023

ArrayHashMap preserves insertion order.

@ehaas
Copy link
Collaborator Author

ehaas commented Nov 21, 2023

Unless I'm misunderstanding your suggestion, does that matter? We don't need to iterate over the keys/values, we only ever do lookups. I thought you were suggesting that in popScope we should just call shrinkRetainingCapacity (or clearRetainingCapacity) instead of deinit on the popped scope. And then deinit them all when we're finished. I think we'd just need to track the maximum number of scopes that were added, so we know how many to deinit at the end.

@Vexu
Copy link
Owner

Vexu commented Nov 21, 2023

I meant that the current syms MultiArrayList would be replaced by the ArrayHashMaps and scopes would remain indexes into them.

@ehaas
Copy link
Collaborator Author

ehaas commented Nov 22, 2023

Thinking about this some more - given that C allows variable shadowing, how would this work? e.g. if the code being parsed is:

int x = 5;
void foo(void) {
    float x = 1.0f;
}

We encounter the first x and insert it into the ArrayHashMap let's say at index 10. foo gets inserted at index 11, then we push a new scope and say that it starts at index 12. We encounter the second x and look that up, see that it's at index 10, so it's not in the current scope and we need to insert x again (at index 12). But since it's the same name as before we can't do that, right?

@Vexu
Copy link
Owner

Vexu commented Nov 22, 2023

Hmm, I think it would probably be possible with a custom context that accounts for the scope index but that can be a future improvement.

@ehaas
Copy link
Collaborator Author

ehaas commented Nov 22, 2023

Sounds good. When I get a chance I'm going to update it to use a multiarray of hashmaps and use clearRetainingCapacity when popping since that seems like an easy win.

@ehaas ehaas force-pushed the hashmap-symbols branch 3 times, most recently from 8651068 to 047890c Compare November 23, 2023 12:58
@ehaas ehaas marked this pull request as ready for review November 23, 2023 13:05
@ehaas
Copy link
Collaborator Author

ehaas commented Nov 23, 2023

Using clearRetainingCapacity made no difference in my benchmarks (sqlite3.c and zig2.c) in ReleaseFast, which was surprising to me.

Copy link
Owner

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using clearRetainingCapacity made no difference in my benchmarks (sqlite3.c and zig2.c) in ReleaseFast, which was surprising to me.

Huh, I would not have expected that, I guess it would be because clearRetainingCapacity memsets the entire capacity of the hash map.

Use whichever version you prefer more but the current one needs some changes.


pub fn deinit(s: *SymbolStack, gpa: Allocator) void {
s.syms.deinit(gpa);
std.debug.assert(s.innermost == std.math.maxInt(@TypeOf(s.innermost)));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you instead change get to use innermost - 1 and remove the wrapping from pop?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been coding on a plane with 2s latency so it's possible I messed up the benchmark; I'll repeat it when I get home.

Updated SymbolStack to remove the wrapping and renamed the field to active_len since I think that more accurately describes its role after the change.

Previously we weren't checking if a name already existed as a different
kind of symbol when defining typdefs, and weren't checking if typedefs
existed when defining other kinds of symbols which use var scope.
@ehaas
Copy link
Collaborator Author

ehaas commented Nov 24, 2023

Ok I ran the benchmarks again:

First is the baseline original hashmap implementation.
Second is with retaining the allocations
Third is with the .both update

Top set is zig2.c, bottom is sqlite3.c

Benchmark 1 (3 runs): /home/ehaas/source/arocc/zig-out/bin/arocc-baseline -fsyntax-only -Wfatal-errors -undef -std=c99 processed.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          6.56s  ± 4.31ms    6.55s  … 6.56s           0 ( 0%)        0%
  peak_rss           1.23GB ± 53.3KB    1.23GB … 1.23GB          0 ( 0%)        0%
  cpu_cycles         25.0G  ± 8.71M     25.0G  … 25.0G           0 ( 0%)        0%
  instructions       53.1G  ± 19.5      53.1G  … 53.1G           0 ( 0%)        0%
  cache_references   59.8M  ±  763K     59.3M  … 60.6M           0 ( 0%)        0%
  cache_misses       26.7M  ±  149K     26.6M  … 26.9M           0 ( 0%)        0%
  branch_misses       249M  ±  153K      249M  …  250M           0 ( 0%)        0%
Benchmark 2 (3 runs): /home/ehaas/source/arocc/zig-out/bin/arocc-clear-retaining -fsyntax-only -Wfatal-errors -undef -std=c99 processed.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          6.42s  ± 6.88ms    6.41s  … 6.43s           0 ( 0%)        ⚡-  2.1% ±  0.2%
  peak_rss           1.23GB ± 12.5KB    1.23GB … 1.23GB          0 ( 0%)          -  0.0% ±  0.0%
  cpu_cycles         24.4G  ± 30.0M     24.4G  … 24.4G           0 ( 0%)        ⚡-  2.2% ±  0.2%
  instructions       52.8G  ± 21.3      52.8G  … 52.8G           0 ( 0%)          -  0.4% ±  0.0%
  cache_references   59.7M  ±  811K     59.2M  … 60.6M           0 ( 0%)          -  0.1% ±  3.0%
  cache_misses       26.5M  ±  106K     26.4M  … 26.6M           0 ( 0%)          -  1.0% ±  1.1%
  branch_misses       246M  ± 64.3K      246M  …  246M           0 ( 0%)        ⚡-  1.5% ±  0.1%
Benchmark 3 (3 runs): /home/ehaas/source/arocc/zig-out/bin/arocc-small-opt -fsyntax-only -Wfatal-errors -undef -std=c99 processed.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          6.50s  ± 11.4ms    6.49s  … 6.51s           0 ( 0%)          -  0.9% ±  0.3%
  peak_rss           1.23GB ± 36.9KB    1.23GB … 1.23GB          0 ( 0%)          -  0.0% ±  0.0%
  cpu_cycles         24.7G  ± 40.7M     24.7G  … 24.8G           0 ( 0%)          -  0.9% ±  0.3%
  instructions       53.1G  ± 32.2      53.1G  … 53.1G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references   60.2M  ± 1.32M     58.9M  … 61.5M           0 ( 0%)          +  0.6% ±  4.1%
  cache_misses       26.7M  ±  131K     26.5M  … 26.8M           0 ( 0%)          -  0.1% ±  1.2%
Benchmark 1 (24 runs): /home/ehaas/source/arocc/zig-out/bin/arocc-baseline -I. -fsyntax-only sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           215ms ±  531us     214ms …  216ms          0 ( 0%)        0%
  peak_rss           62.3MB ± 48.9KB    62.2MB … 62.3MB          0 ( 0%)        0%
  cpu_cycles          794M  ± 1.77M      792M  …  798M           0 ( 0%)        0%
  instructions       1.67G  ± 9.35      1.67G  … 1.67G           1 ( 4%)        0%
  cache_references   2.06M  ± 22.9K     2.03M  … 2.11M           5 (21%)        0%
  cache_misses        544K  ± 7.78K      530K  …  558K           0 ( 0%)        0%
  branch_misses      7.76M  ± 9.45K     7.74M  … 7.78M           0 ( 0%)        0%
Benchmark 2 (24 runs): /home/ehaas/source/arocc/zig-out/bin/arocc-clear-retaining -I. -fsyntax-only sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           212ms ±  548us     211ms …  213ms          0 ( 0%)        ⚡-  1.7% ±  0.1%
  peak_rss           62.3MB ± 59.3KB    62.2MB … 62.4MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          779M  ± 2.05M      775M  …  784M           1 ( 4%)        ⚡-  1.9% ±  0.1%
  instructions       1.67G  ± 2.32      1.67G  … 1.67G           0 ( 0%)          -  0.2% ±  0.0%
  cache_references   2.06M  ± 24.6K     2.02M  … 2.10M           0 ( 0%)          +  0.3% ±  0.7%
  cache_misses        546K  ± 7.24K      536K  …  570K           1 ( 4%)          +  0.5% ±  0.8%
  branch_misses      7.72M  ± 7.19K     7.71M  … 7.74M           0 ( 0%)          -  0.4% ±  0.1%
Benchmark 3 (24 runs): /home/ehaas/source/arocc/zig-out/bin/arocc-small-opt -I. -fsyntax-only sqlite3.c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           213ms ±  604us     212ms …  214ms          0 ( 0%)          -  1.0% ±  0.2%
  peak_rss           62.3MB ± 54.7KB    62.2MB … 62.4MB          0 ( 0%)          +  0.0% ±  0.0%
  cpu_cycles          785M  ± 1.60M      782M  …  788M           0 ( 0%)        ⚡-  1.1% ±  0.1%
  instructions       1.67G  ± 7.53      1.67G  … 1.67G           1 ( 4%)          -  0.0% ±  0.0%
  cache_references   2.05M  ± 22.1K     2.03M  … 2.09M           0 ( 0%)          -  0.1% ±  0.6%
  cache_misses        549K  ± 7.08K      537K  …  562K           0 ( 0%)          +  1.0% ±  0.8%
  branch_misses      7.83M  ± 7.56K     7.82M  … 7.85M           0 ( 0%)          +  1.0% ±  0.1%

@Vexu Vexu merged commit 4e463c9 into Vexu:master Nov 24, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quadratic behavior in Parser
2 participants