Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc performance tweaks #4400

Merged
merged 2 commits into from
Oct 25, 2024
Merged

gc performance tweaks #4400

merged 2 commits into from
Oct 25, 2024

Conversation

dgryski
Copy link
Member

@dgryski dgryski commented Aug 8, 2024

No description provided.

@dgryski dgryski force-pushed the dgryski/gc-perf-tweaks branch from 926b918 to 4a7376b Compare September 17, 2024 19:10
@dgryski dgryski marked this pull request as ready for review September 17, 2024 19:10
@deadprogram deadprogram requested a review from aykevl September 19, 2024 08:18
@dgryski
Copy link
Member Author

dgryski commented Sep 20, 2024

Decided to test a GC benchmark I was using to test the findHead optimization, and these changes turn out to be a regression :(

   4.2  +
        |
        |                                        |
        |                                        |
   4.0  +                                        |
        |                              +-------------------+
        |                              |         *         |
   3.8  +                              +-------------------+
        |                                        |
        |
        |
   3.6  +                   *
        |
        |         +---------*---------+
   3.4  +         +-------------------+
        |
        |
   3.2  +--------------------------------------------------------------
                        old.times            new.times

File       N   Mean  Stddev
old.times  20  3.45  0.05    (control)
new.times  20  3.86  0.09    (3.86 > 3.45 ± 0.05, p = .000)

@dgryski dgryski force-pushed the dgryski/gc-perf-tweaks branch from 9d9b614 to 229d5b7 Compare September 20, 2024 23:16
@dgryski
Copy link
Member Author

dgryski commented Sep 20, 2024

I think the culprit is e288226 which my guess is since we're moving the nextAlloc pointer back to the start after a GC, means we have more memory to shuffle through to find free space. I think what we end up saving is heap growth. I might pull that commit out until we have a better grasp on the tradeoffs.

Every time we overflow the stack, we have to do a full rescan of the heap.  Making this larger
means fewer overflows and thus fewer secondary+ heap scans.
@dgryski dgryski force-pushed the dgryski/gc-perf-tweaks branch from 229d5b7 to 8fff286 Compare October 24, 2024 18:02
@dgryski
Copy link
Member Author

dgryski commented Oct 24, 2024

With the layout fix in, this seems to be making things universally slower. Going to see if any of these commits are worth keeping, especially with the findHead optimization coming.

@dgryski
Copy link
Member Author

dgryski commented Oct 24, 2024

A bit more investigation shows that the first commit to increase the stack space helps with garbage collection. The next two are a bit trickier. The first one (that adds the maxAlloc block that says where to stop sweeping) slows things down, probably because there's just additional bookkeeping without making things faster. The next commit that restarts the nextAlloc pointer back at the start when we run a garbage collection helps, but only if we have the previous commit. Otherwise the allocator keeps handing out memory and so we never have a clean space at the end of the heap that we gain any time from not scanning. It's only when we keep trying to hand out early blocks that we gain anything, but (again, in my benchmarking for tinyjsonbench) the last two commits don't make the gc faster than just having the first commit that increases the stack space. So ... possibly there is another benchmark we can play with, but at the moment the json one isn't it.

@dgryski
Copy link
Member Author

dgryski commented Oct 24, 2024

Another gc heavy benchmark: the binarytrees program from the benchmarks game.

~/go/src/github.com/dgryski/trifles/binarytrees $ for i in stack-8.wasm all-tweaks.wasm dev.wasm; do echo $i; time wasmtime $i 18; done
stack-8.wasm
stretch tree of depth 19	 check: 1048575
262144	 trees of depth 4	 check: 8126464
65536	 trees of depth 6	 check: 8323072
16384	 trees of depth 8	 check: 8372224
4096	 trees of depth 10	 check: 8384512
1024	 trees of depth 12	 check: 8387584
256	 trees of depth 14	 check: 8388352
64	 trees of depth 16	 check: 8388544
16	 trees of depth 18	 check: 8388592
long lived tree of depth 18	 check: 524287

real	0m10.961s
user	0m11.447s
sys	0m0.081s
all-tweaks.wasm
stretch tree of depth 19	 check: 1048575
262144	 trees of depth 4	 check: 8126464
65536	 trees of depth 6	 check: 8323072
16384	 trees of depth 8	 check: 8372224
4096	 trees of depth 10	 check: 8384512
1024	 trees of depth 12	 check: 8387584
256	 trees of depth 14	 check: 8388352
64	 trees of depth 16	 check: 8388544
16	 trees of depth 18	 check: 8388592
long lived tree of depth 18	 check: 524287

real	0m21.512s
user	0m21.950s
sys	0m0.095s
dev.wasm
stretch tree of depth 19	 check: 1048575
262144	 trees of depth 4	 check: 8126464
65536	 trees of depth 6	 check: 8323072
16384	 trees of depth 8	 check: 8372224
4096	 trees of depth 10	 check: 8384512
1024	 trees of depth 12	 check: 8387584
256	 trees of depth 14	 check: 8388352
64	 trees of depth 16	 check: 8388544
16	 trees of depth 18	 check: 8388592
long lived tree of depth 18	 check: 524287

real	0m14.288s
user	0m14.661s
sys	0m0.086s

So stack-8 is just the commit that increases the stack depth (which is helpful here for the heavily recursive data structures), all-tweaks is all the commits on this PR, and dev is the current dev branch. I think it's clear that the only one we should consider is the change to increase the stack depth. So that's what I'll trim this down to.

@dgryski dgryski force-pushed the dgryski/gc-perf-tweaks branch from 8fff286 to 84b9ec3 Compare October 24, 2024 21:42
@dgryski
Copy link
Member Author

dgryski commented Oct 24, 2024

PTAL

@aykevl
Copy link
Member

aykevl commented Oct 25, 2024

This changes the stack size used by the stack variable from 64 bytes (or 256 bytes on 64-bit) to 128 bytes (or 512 bytes on 64-bit).
I think that's fine, and the performance benefits are certainly nice. But the increased stack space could be a regression on small systems that don't have a lot of stack space to begin with.

@aykevl aykevl merged commit 9a6397b into tinygo-org:dev Oct 25, 2024
17 checks passed
@dgryski
Copy link
Member Author

dgryski commented Oct 25, 2024

We could make this depend on sizeof(uintptr) for smaller systems.

@aykevl
Copy link
Member

aykevl commented Oct 25, 2024

We could make this depend on sizeof(uintptr) for smaller systems.

sizeof(uintptr) is the same for WebAssembly and most microcontrollers: namely 4 (32-bit).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants