
Added support for target_feature atomics on wasm32 #239

Open · wants to merge 10 commits into main
Conversation

SarthakSingh31

The target_feature `atomics` enables multithreading on wasm32. The devices in wgpu are not Send or Sync when target_feature `atomics` is enabled, as they cannot be shared across threads.

This PR enables cubecl to run wgpu on a dedicated thread and communicate with that thread using channels.

We are trying to use burn to run inference on wasm32, but we need atomics support to do multi-threaded CPU-bound compute as well.
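A minimal sketch of the dedicated-thread pattern described above, using std channels. The request enum and function names here are hypothetical stand-ins for the PR's actual cubecl/wgpu commands:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical request type standing in for real wgpu/cubecl commands.
enum Request {
    // Double the value and send the result back on the reply channel.
    Double(u32, mpsc::Sender<u32>),
    Shutdown,
}

// Spawn a dedicated thread that owns the (conceptually !Send) resource
// and serves requests arriving over a channel.
fn spawn_server() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        for req in rx {
            match req {
                Request::Double(x, reply) => {
                    let _ = reply.send(x * 2);
                }
                Request::Shutdown => break,
            }
        }
    });
    tx
}

// Convenience wrapper: submit a request and block on the reply.
fn double_on_server(tx: &mpsc::Sender<Request>, x: u32) -> u32 {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request::Double(x, reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let tx = spawn_server();
    assert_eq!(double_on_server(&tx, 21), 42);
    let _ = tx.send(Request::Shutdown);
}
```

The key property is that only channel endpoints cross thread boundaries; the non-Send device state never leaves the thread that created it.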

Member

@nathanielsimard nathanielsimard left a comment


My general comment is that there are a few too many wasm feature flags around the codebase. I'm currently refactoring the wgpu server into components, where it would be easier to add a new implementation tailored for wasm.

@ArthurBrussee
Contributor

Hiya! If I can barge in as well as I looked into this previously, some thoughts.

  • There's already a client talking (potentially with a channel) to a separate server. Would it be possible to handle things there rather than bifurcating the server? Having a separate wgpu-wasm server to maintain would not be ideal. For the memory management you shouldn't need many changes; the server talks to it, but the client doesn't.
  • Can you keep the Arcs? I know they're technically wasteful and clippy warns about them if things aren't Send/Sync, but otherwise public APIs have their types switched based on a flag, which would be very hard to use.
  • Is there more direct access to web workers than via Rayon? I know they've done some of the heavy lifting, but pulling in all of Rayon for this would be a shame.
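The flag-switched public type the second bullet warns about looks roughly like this (`Shared` is a hypothetical name, not an actual cubecl type):

```rust
// The same source compiles on every target, but the concrete public type
// differs per target, which leaks into every signature mentioning
// Shared<T> and makes downstream generic code painful. Keeping Arc
// everywhere avoids this, at a small cost on single-threaded wasm.
#[cfg(not(all(target_arch = "wasm32", target_feature = "atomics")))]
pub type Shared<T> = std::sync::Arc<T>;
#[cfg(all(target_arch = "wasm32", target_feature = "atomics"))]
pub type Shared<T> = std::rc::Rc<T>;

fn main() {
    let s: Shared<u32> = Shared::new(7);
    assert_eq!(*s, 7);
}
```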

@SarthakSingh31
Author

SarthakSingh31 commented Nov 13, 2024

@ArthurBrussee Thank you for your review!

I have been thinking more about this, and I think a thread-local approach would be better than creating a server with channels to it. I am running into a lot of issues trying to wait for the response from the channel.

I am thinking of creating a single instance of a server per thread per WgpuDevice when it is requested. Then I would create a custom channel implementation to use the thread-local server mutably. The channel is going to be !Send and !Sync, so it is always valid for its thread.

The big downside of doing this is that I would have to relax the bounds on ComputeServer, ComputeStorage, ComputeStorage::Resource, and ComputeChannel to not require Send or Sync. What do you think the implications of this would be on the burn crate? Do they require these traits to have those bounds?

This approach would no longer require Rayon.
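A sketch of the thread-local-server idea under discussion: one server instance per thread, plus a channel handle that opts out of Send/Sync so it can never cross threads. All names here are illustrative, not cubecl's actual types:

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

// Hypothetical stand-in for the compute server; in cubecl this would hold
// wgpu resources that are !Send when target_feature atomics is enabled.
struct Server {
    counter: u32,
}

thread_local! {
    // One lazily-created server instance per thread.
    static SERVER: RefCell<Server> = RefCell::new(Server { counter: 0 });
}

// A channel handle that is !Send and !Sync: the raw-pointer PhantomData
// opts out of both auto traits, so the handle can never cross threads and
// is therefore always valid for the thread that created it.
pub struct LocalChannel {
    _not_send: PhantomData<*const ()>,
}

impl LocalChannel {
    pub fn new() -> Self {
        LocalChannel { _not_send: PhantomData }
    }

    // Every call borrows this thread's server mutably; no locking is
    // needed because the borrow can never be contended across threads.
    pub fn submit(&self, n: u32) -> u32 {
        SERVER.with(|s| {
            let mut s = s.borrow_mut();
            s.counter += n;
            s.counter
        })
    }
}

fn main() {
    let chan = LocalChannel::new();
    assert_eq!(chan.submit(2), 2);
    assert_eq!(chan.submit(3), 5);
}
```

This is also why the trait bounds come up: a `LocalChannel` like this cannot implement a `ComputeChannel` trait that requires Send + Sync.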

@ArthurBrussee
Contributor

Hey! Sorry for the late reply; I see you're already getting somewhere with this implementation, which is great :)

Not having ComputeStorage::Resource be Send (at least on atomic WASM) makes sense, that's quite an inherent limitation until wgpu::Buffers are Send.

Unfortunately though, the compute server's read() future really needs to be Send, at least on native platforms. It's quite important to be able to send the future to another thread, where it can wait on results while work otherwise keeps running. Maybe it could also be !Send only on atomic WASM; I don't know how annoying that would get.
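One way to express "Send on native, relaxed on atomic wasm" is to pick the trait's supertraits per target. `ReadFuture` is a hypothetical name, not cubecl's actual API:

```rust
use std::future::Future;

// Native targets: the future must be Send so callers can move it to
// another thread and wait on results there.
#[cfg(not(all(target_arch = "wasm32", target_feature = "atomics")))]
pub trait ReadFuture: Future + Send {}
#[cfg(not(all(target_arch = "wasm32", target_feature = "atomics")))]
impl<F: Future + Send> ReadFuture for F {}

// Atomic wasm: drop the Send requirement, since wgpu resources captured
// by the future are !Send there.
#[cfg(all(target_arch = "wasm32", target_feature = "atomics"))]
pub trait ReadFuture: Future {}
#[cfg(all(target_arch = "wasm32", target_feature = "atomics"))]
impl<F: Future> ReadFuture for F {}

// A consumer written against the alias compiles on both targets.
fn accepts(_f: impl ReadFuture) {}

fn main() {
    // `async` blocks without !Send captures are Send, so this satisfies
    // the stricter native bound.
    accepts(async { 1 + 1 });
}
```

The cost, as noted above, is that downstream code cannot rely on Send when the wasm variant is active, which can be annoying to thread through generic APIs.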

Last comment: you're relying on blocking on a future, which I guess actually works now on a worker, which is sick :D However, you still can't do so on the main thread afaik, and I'm not sure if there's a good way to detect whether you're running in a worker or not. Still requiring people to manually call the async init function is annoying, but it might have to be like that. Hopefully we also don't need to duplicate the initialization function then!

Very exciting though, I would love to run my stuff in a web worker and finally not worry about it messing with the UI haha. Though I will need to figure out how to update a canvas from a web worker...
