Comparison with Spice #1188

Open
mo8it opened this issue Aug 12, 2024 · 5 comments

Comments

mo8it commented Aug 12, 2024

See the README in https://github.com/judofyr/spice

Why is the overhead of Rayon that huge? Are there any plans to reduce it?

cuviper (Member) commented Aug 12, 2024

This micro-benchmark is kind of a worst case for rayon: it splits into jobs maximally, with very little actual work in each one. Some quick perf analysis shows that we're spending most of the time between the crossbeam deques and our sleep code, which doesn't surprise me.
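To make the shape of this worst case concrete, here is a minimal, dependency-free sketch (not the actual benchmark code): a sum over a complete binary tree in which every node forks. To stay runnable without crates, a hypothetical join built on std::thread::scope stands in for rayon::join. A thread spawn costs far more than rayon's deque push, so the sketch counts join points rather than timing them. The tree shape, the all-ones values and the cutoff parameter are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Binary tree in the spirit of the benchmark (assumed: complete, every value 1).
struct Node {
    value: i64,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn build(depth: u32) -> Option<Box<Node>> {
    if depth == 0 {
        return None;
    }
    Some(Box::new(Node { value: 1, left: build(depth - 1), right: build(depth - 1) }))
}

// Hypothetical stand-in for rayon::join: runs `b` on a scoped thread while `a` runs here.
fn join<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA,
    B: FnOnce() -> RB + Send,
    RB: Send,
{
    thread::scope(|s| {
        let handle = s.spawn(b);
        let ra = a();
        (ra, handle.join().unwrap())
    })
}

// Forks at every node shallower than `cutoff`; below it, plain sequential recursion.
// `cutoff = u32::MAX` reproduces the "one join per node" worst case.
fn sum(node: &Option<Box<Node>>, depth: u32, cutoff: u32, forks: &AtomicUsize) -> i64 {
    let Some(n) = node else { return 0 };
    if depth < cutoff {
        forks.fetch_add(1, Ordering::Relaxed);
        let (l, r) = join(
            || sum(&n.left, depth + 1, cutoff, forks),
            || sum(&n.right, depth + 1, cutoff, forks),
        );
        n.value + l + r
    } else {
        n.value + sum(&n.left, depth + 1, cutoff, forks) + sum(&n.right, depth + 1, cutoff, forks)
    }
}

fn main() {
    let tree = build(6);
    for cutoff in [u32::MAX, 2] {
        let forks = AtomicUsize::new(0);
        let total = sum(&tree, 0, cutoff, &forks);
        println!("cutoff {cutoff}: sum = {total}, joins = {}", forks.load(Ordering::Relaxed));
    }
}
```

On a depth-6 tree (63 nodes), forking at every node creates 63 join points for 63 units of trivial work, while a cutoff of 2 creates only 3; the sum is 63 either way. A fixed cutoff is the bluntest way to coarsen granularity, and a good value depends on the input.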

A possibly nicer implementation of that benchmark could use rayon::iter::split so that it avoids recursing quite so deeply. That is more complicated to write, though, and I fear it still won't compare favorably in the small-input case.
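A rough sketch of what a rayon::iter::split version might look like. As far as I understand the rayon API, rayon::iter::split(data, splitter) takes a splitter of shape Fn(D) -> (D, Option<D>) and yields the resulting pieces as a parallel iterator, calling the splitter only when it decides more parallelism is useful. To keep the sketch runnable without rayon, split_all is a hypothetical sequential driver that simply keeps splitting until the splitter returns None. The size field on Node, the Piece type and GRAIN = 100 are also assumptions.

```rust
// Tree node that also records its subtree size so the splitter can judge piece sizes.
// (The size field is an assumption; it is not part of the original benchmark.)
struct Node {
    value: i64,
    size: usize,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn build(depth: u32) -> Option<Box<Node>> {
    if depth == 0 {
        return None;
    }
    let left = build(depth - 1);
    let right = build(depth - 1);
    let size = 1 + left.as_ref().map_or(0, |n| n.size) + right.as_ref().map_or(0, |n| n.size);
    Some(Box::new(Node { value: 1, size, left, right }))
}

// A unit of work: one subtree plus values already split off from its ancestors.
struct Piece<'a> {
    root: &'a Node,
    carry: i64,
}

// Pieces smaller than this are not split further (hypothetical threshold).
const GRAIN: usize = 100;

// Splitter with the shape rayon::iter::split expects: Fn(D) -> (D, Option<D>).
// The parent's value moves into the left piece's carry, so nothing is counted twice.
fn split_piece<'a>(p: Piece<'a>) -> (Piece<'a>, Option<Piece<'a>>) {
    let root = p.root;
    match (&root.left, &root.right) {
        (Some(l), Some(r)) if root.size >= GRAIN => (
            Piece { root: l, carry: p.carry + root.value },
            Some(Piece { root: r, carry: 0 }),
        ),
        _ => (p, None),
    }
}

// Hypothetical sequential stand-in for rayon's adaptive splitting:
// split until the splitter refuses, collecting the leaf pieces.
fn split_all<D>(data: D, splitter: &impl Fn(D) -> (D, Option<D>), out: &mut Vec<D>) {
    match splitter(data) {
        (a, Some(b)) => {
            split_all(a, splitter, out);
            split_all(b, splitter, out);
        }
        (a, None) => out.push(a),
    }
}

// Plain sequential sum used inside each piece.
fn seq_sum(node: &Node) -> i64 {
    node.value + node.left.as_deref().map_or(0, seq_sum) + node.right.as_deref().map_or(0, seq_sum)
}

fn main() {
    let tree = build(10).expect("non-empty tree");
    let mut pieces = Vec::new();
    split_all(Piece { root: &tree, carry: 0 }, &split_piece, &mut pieces);
    let total: i64 = pieces.iter().map(|p| p.carry + seq_sum(p.root)).sum();
    println!("{} pieces, sum = {}", pieces.len(), total);
}
```

For a depth-10 tree (1,023 nodes) and GRAIN = 100 this produces 16 pieces of 63 nodes each, each summed sequentially, instead of one job per node. With rayon available, the driver would presumably become something like rayon::iter::split(Piece { root: &tree, carry: 0 }, split_piece).map(|p| p.carry + seq_sum(p.root)).sum::<i64>(); that variant is untested here, and rayon may stop splitting before the splitter refuses.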

We have no current plans to work on the overhead, but of course I'd love to see any improvements that folks can come up with. I wonder whether it would even be possible to rewrite the rayon-core internals with that heartbeat scheduling approach.

cuviper (Member) commented Sep 19, 2024

I just saw this Spice port on Hacker News: https://github.com/dragostis/chili

kdy1 commented Jan 3, 2025

I tried chili in swc-project/swc#9829 but it was way slower than rayon.
[Screenshot, 2025-01-03 7:01:07 PM]

[Screenshot, 2025-01-03 7:00:49 PM]

My guesses for the cause:

  • I used Scope::global() at each level of visit_par, but it seems to access a global variable unconditionally.
  • Some visit_par calls may still be too slow for chili? Just a guess, but since my use case is an AST, high-level nodes still have a long runtime.
  • Unlike Spice, chili seems to split based on the number of spawned tasks, and I spawned one task per node. Maybe that's the core cause, but I'm not sure.

cc @dragostis for visibility

dragostis commented

@kdy1, my initial hunch is that there might be contention on the Scope::global lock. The current implementation relies on Scope::global not being called in a hot loop.
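The pattern at issue, sketched without chili: fetching a global context at every level of a recursive visitor, versus fetching it once and passing it down. Handle, REGISTRY, LOCKS and global_handle are hypothetical names, not chili's API. The counter only tallies lock acquisitions on a single thread; with many workers, each of those acquisitions is a point where they can serialize.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// AST-like node with any number of children.
struct Node {
    children: Vec<Node>,
}

fn build(depth: u32, fanout: usize) -> Node {
    let children = if depth == 0 {
        Vec::new()
    } else {
        (0..fanout).map(|_| build(depth - 1, fanout)).collect()
    };
    Node { children }
}

// Hypothetical per-traversal context, standing in for a scheduler scope.
struct Handle {
    visited: usize,
}

// Hypothetical global state behind a lock, plus a counter of lock acquisitions.
static REGISTRY: Mutex<usize> = Mutex::new(0);
static LOCKS: AtomicUsize = AtomicUsize::new(0);

// Every call takes the global lock briefly, like fetching a global scope.
fn global_handle() -> Handle {
    LOCKS.fetch_add(1, Ordering::Relaxed);
    *REGISTRY.lock().unwrap() += 1;
    Handle { visited: 0 }
}

// Anti-pattern: fetch the global context at every level of the recursion.
fn visit_per_node(node: &Node) -> usize {
    let _handle = global_handle();
    1 + node.children.iter().map(visit_per_node).sum::<usize>()
}

// Fix: fetch the context once at the top and pass it down.
fn visit_with(h: &mut Handle, node: &Node) -> usize {
    h.visited += 1;
    1 + node.children.iter().map(|c| visit_with(h, c)).sum::<usize>()
}

fn main() {
    let root = build(3, 3); // 1 + 3 + 9 + 27 = 40 nodes

    LOCKS.store(0, Ordering::Relaxed);
    let n = visit_per_node(&root);
    println!("per node: {n} nodes, {} lock acquisitions", LOCKS.load(Ordering::Relaxed));

    LOCKS.store(0, Ordering::Relaxed);
    let mut h = global_handle();
    let n = visit_with(&mut h, &root);
    println!("hoisted: {n} nodes ({} visited), {} lock acquisitions", h.visited, LOCKS.load(Ordering::Relaxed));
}
```

On this 40-node tree, the per-node version takes the lock 40 times and the hoisted version takes it once.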

The heartbeat algorithm relies on the context's lock being acquired only rarely, i.e. when a heartbeat happens. That's the moment when workers exchange work. Apart from that, the only contention is on a single relaxed atomic, which happens here, but according to your stack this isn't the issue.
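A heavily simplified, std-only sketch of the heartbeat idea: every join normally costs one relaxed atomic load and runs both halves sequentially, and only the join that observes a pending heartbeat (raised by a timer thread) promotes its second half to run in parallel. Real heartbeat schedulers such as Spice promote the oldest pending join and hand it to an already-running worker; this sketch instead spawns a scoped OS thread at whichever join sees the flag. Heartbeat, join, sum_with_heartbeat and the 100-microsecond interval are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

struct Node {
    value: i64,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn build(depth: u32) -> Option<Box<Node>> {
    if depth == 0 {
        return None;
    }
    Some(Box::new(Node { value: 1, left: build(depth - 1), right: build(depth - 1) }))
}

// Hypothetical shared state: a flag raised by the timer, plus a promotion counter.
struct Heartbeat {
    due: AtomicBool,
    promoted: AtomicUsize,
}

// Sequential by default; only the join that observes a pending heartbeat forks.
fn join<A, B, RA, RB>(hb: &Heartbeat, a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA,
    B: FnOnce() -> RB + Send,
    RB: Send,
{
    // Fast path: one relaxed load, no lock and no queue traffic.
    if hb.due.load(Ordering::Relaxed) && hb.due.swap(false, Ordering::Relaxed) {
        hb.promoted.fetch_add(1, Ordering::Relaxed);
        // Slow path: promote `b` to a parallel task (here, simply a scoped thread).
        thread::scope(|s| {
            let handle = s.spawn(b);
            let ra = a();
            (ra, handle.join().unwrap())
        })
    } else {
        (a(), b())
    }
}

fn sum(hb: &Heartbeat, node: &Option<Box<Node>>) -> i64 {
    let Some(n) = node else { return 0 };
    let (l, r) = join(hb, || sum(hb, &n.left), || sum(hb, &n.right));
    n.value + l + r
}

// Runs the sum while a timer thread raises the heartbeat flag every `interval`.
fn sum_with_heartbeat(root: &Option<Box<Node>>, interval: Duration) -> (i64, usize) {
    let hb = Heartbeat { due: AtomicBool::new(false), promoted: AtomicUsize::new(0) };
    let done = AtomicBool::new(false);
    let total = thread::scope(|s| {
        s.spawn(|| {
            while !done.load(Ordering::Relaxed) {
                thread::sleep(interval);
                hb.due.store(true, Ordering::Relaxed);
            }
        });
        let total = sum(&hb, root);
        done.store(true, Ordering::Relaxed);
        total
    });
    (total, hb.promoted.load(Ordering::Relaxed))
}

fn main() {
    let tree = build(16);
    let (total, promoted) = sum_with_heartbeat(&tree, Duration::from_micros(100));
    println!("sum = {total}, promoted joins = {promoted}");
}
```

The number of promoted joins varies from run to run, but the sum does not (65,535 for a depth-16 tree of ones). The design point is that the common path never touches a lock or a deque; coordination cost is paid only once per heartbeat.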

kdy1 commented Jan 6, 2025

@dragostis Yep, that was the problem. Thank you so much for such a great library!

I managed to get chili to work (using unsafe + transmute), and it seems to have near-zero overhead. The problem is that the API is quite limited compared to rayon's, but that's a solvable issue.

I'll try this trick to optimize the ES linter and minifier after trying to create a safe interface for chili.

FYI, with rayon, _cvwait takes 10% of the total CPU time, compared to 6.6% in _cvwait plus 1.1% in semaphore_wait_trap with chili.

