Comparison with Spice #1188

mo8it · 2024-08-12T13:04:21Z

See the README in https://github.com/judofyr/spice

Why is the overhead of Rayon that huge? Are there any plans to reduce it?

cuviper · 2024-08-12T20:11:28Z

This micro-benchmark is kind of a worst case for rayon -- maximally splitting into jobs with very little actual work involved in each. Some quick perf analysis shows that we're spending most of the time between crossbeam deques and our sleep code, which isn't surprising to me.

A possibly nicer implementation of that could use rayon::iter::split, so it can avoid recursing quite so deeply. That is more complicated to write though, and I fear it's still not going to be favorable in the small input case.

We don't have current plans to work on the overhead, but of course I'd love to see any improvements that folks can come up with. I wonder if it would be possible to even rewrite rayon-core internals with that heartbeat scheduling approach.

cuviper · 2024-09-19T19:20:05Z

I just saw this Spice port on HN: https://github.com/dragostis/chili

kdy1 · 2025-01-03T10:41:29Z

I tried chili in swc-project/swc#9829 but it was way slower than rayon.

My guesses for the cause:

1. I used Scope::global() for each level of visit_par, but it seems to access a global variable unconditionally.
2. Some of the visit_par calls are still too slow for chili? Just a guess, but as my use case is an AST, high-level nodes still have long runtimes.
3. Unlike spice, chili seems to split based on the number of spawned tasks, and I spawned one task per node. Maybe that's the core cause, but I'm not sure.

cc @dragostis for visibility

dragostis · 2025-01-03T14:09:15Z

@kdy1, my initial hunch is that there might be contention on the Scope::global lock. The current implementation relies on Scope::global not being called in a hot loop.

The heartbeat algorithm relies on the context's lock being acquired not too often, i.e. when the heartbeat happens. That's the moment when workers exchange work. Apart from that, the only contention is on one relaxed atomic that happens here, but this isn't the issue according to your stack.
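To make the heartbeat idea above concrete, here is a minimal std-only sketch. It is not chili's or Spice's actual implementation: the `Heartbeat` type, its `join` method, and the timer thread are hypothetical names invented for illustration, and a real scheduler would hand work to an existing worker rather than spawn a thread. The point it tries to show is only that the common path of `join` costs one relaxed atomic load, and work is shared solely when a beat has fired.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

// A binary tree like the one summed in the Spice README benchmark.
struct Node {
    value: u64,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Builds a complete tree with 2^depth - 1 nodes, each holding 1.
fn build(depth: u32) -> Option<Box<Node>> {
    if depth == 0 {
        return None;
    }
    Some(Box::new(Node {
        value: 1,
        left: build(depth - 1),
        right: build(depth - 1),
    }))
}

// Hypothetical type: a flag that a timer thread raises periodically.
struct Heartbeat {
    flag: AtomicBool,
}

impl Heartbeat {
    // Assumption: a toy `join`. Without a pending beat, both closures run
    // inline. With one, `b` is handed to another thread (here, a fresh
    // scoped thread stands in for "share work with a worker").
    fn join<A, B, RA, RB>(&self, a: A, b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA,
        B: FnOnce() -> RB + Send,
        RB: Send,
    {
        // Cheap relaxed load first; the swap lets only one caller claim a beat.
        if self.flag.load(Ordering::Relaxed) && self.flag.swap(false, Ordering::Relaxed) {
            thread::scope(|s| {
                let handle = s.spawn(b);
                let ra = a();
                (ra, handle.join().unwrap())
            })
        } else {
            (a(), b())
        }
    }
}

// One `join` per node: maximal splitting with very little work per job.
fn sum(node: &Option<Box<Node>>, hb: &Heartbeat) -> u64 {
    match node {
        None => 0,
        Some(n) => {
            let (l, r) = hb.join(|| sum(&n.left, hb), || sum(&n.right, hb));
            n.value + l + r
        }
    }
}

// Sums a depth-16 tree while a timer thread raises the flag every 100 µs.
fn run_demo() -> u64 {
    let tree = build(16);
    let hb = Heartbeat { flag: AtomicBool::new(false) };
    let done = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            while !done.load(Ordering::Relaxed) {
                thread::sleep(Duration::from_micros(100));
                hb.flag.store(true, Ordering::Relaxed);
            }
        });
        let total = sum(&tree, &hb);
        done.store(true, Ordering::Relaxed);
        total
    })
}

fn main() {
    println!("sum = {}", run_demo());
}
```

In this sketch the result does not depend on how many beats fire; the beats only decide where the right-hand closure runs. Anything that takes a lock or touches shared global state on every `join` call would defeat the design, which is consistent with the hunch about calling Scope::global in a hot loop.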

kdy1 · 2025-01-06T04:59:57Z

@dragostis Yep, that was the problem. Thank you so much for such a great library!

I managed to get chili to work (using unsafe + transmute), and it seems to have near-zero overhead. The problem is that the API is quite limited compared to rayon, but that's a resolvable issue.

I'll try this trick to optimize the ES linter and minifier after trying to create a safe interface for chili.
FYI, if I use rayon, _cvwait takes 10% of the total CPU, compared to 6.6% for _cvwait + 1.1% for semaphore_wait_trap with chili.

cuviper mentioned this issue Sep 19, 2024

Best way to swap out Rayon for chili dragostis/chili#10

Closed
