
Added support for target_feature atomics on wasm32 #239

Open · wants to merge 10 commits into main
Conversation

SarthakSingh31

The target_feature `atomics` enables multithreading on wasm32. The devices in wgpu are not Send or Sync when target_feature `atomics` is enabled, as they cannot be shared across threads.

This PR enables cubecl to run wgpu on a dedicated thread and communicate with that thread using channels.

We are trying to use burn to run inference on wasm32, but we need atomics support to do multi-threaded CPU-bound compute as well.
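A minimal sketch of the dedicated-thread pattern described above, using std channels. The request enum and function names here are hypothetical stand-ins for the PR's actual cubecl/wgpu commands:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical request type standing in for real wgpu/cubecl commands.
enum Request {
    // Double the value and send the result back on the reply channel.
    Double(u32, mpsc::Sender<u32>),
    Shutdown,
}

// Spawn a dedicated thread that owns the (conceptually !Send) resource
// and serves requests arriving over a channel.
fn spawn_server() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        for req in rx {
            match req {
                Request::Double(x, reply) => {
                    let _ = reply.send(x * 2);
                }
                Request::Shutdown => break,
            }
        }
    });
    tx
}

// Convenience wrapper: submit a request and block on the reply.
fn double_on_server(tx: &mpsc::Sender<Request>, x: u32) -> u32 {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request::Double(x, reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let tx = spawn_server();
    assert_eq!(double_on_server(&tx, 21), 42);
    let _ = tx.send(Request::Shutdown);
}
```

The key property is that only channel endpoints cross thread boundaries; the non-Send device state never leaves the thread that created it.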

Member

@nathanielsimard nathanielsimard left a comment


My general comment is that there are a few too many wasm feature flags around the codebase. I'm currently refactoring the wgpu server into components, where it would be easier to add a new implementation tailored for wasm.

@ArthurBrussee
Contributor

Hiya! If I can barge in as well as I looked into this previously, some thoughts.

  • There's already a client talking (potentially with a channel) to a separate server. Would it be possible to handle things there rather than bifurcating the server? Having a separate wgpu-wasm server to maintain would not be ideal. For the memory management you shouldn't need many changes; the server talks to it, but the client doesn't.
  • Can you keep the Arcs? I know they're technically wasteful and clippy warns about them if things aren't Send/Sync, but otherwise public APIs have their types switched based on a flag, which would be very hard to use.
  • Is there more direct access to web workers than via Rayon? I know they've done some of the heavy lifting, but pulling in all of Rayon for this would be a shame.
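The flag-switched public type the second bullet warns about looks roughly like this (`Shared` is a hypothetical name, not an actual cubecl type):

```rust
// The same source compiles on every target, but the concrete public type
// differs per target, which leaks into every signature mentioning
// Shared<T> and makes downstream generic code painful. Keeping Arc
// everywhere avoids this, at a small cost on single-threaded wasm.
#[cfg(not(all(target_arch = "wasm32", target_feature = "atomics")))]
pub type Shared<T> = std::sync::Arc<T>;
#[cfg(all(target_arch = "wasm32", target_feature = "atomics"))]
pub type Shared<T> = std::rc::Rc<T>;

fn main() {
    let s: Shared<u32> = Shared::new(7);
    assert_eq!(*s, 7);
}
```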

@SarthakSingh31
Author

SarthakSingh31 commented Nov 13, 2024

@ArthurBrussee Thank you for your review!

I have been thinking more about this, and I think a thread-local approach would be better than creating a server with channels to it. I am running into a lot of issues trying to wait for the response from the channel.

I am thinking of creating a single instance of a server per thread per WgpuDevice when it is requested. Then I would create a custom channel implementation to use the thread-local server mutably. The channel is going to be !Send and !Sync, so it is always valid for its thread.

The big downside of doing this is that I would have to relax the bounds on ComputeServer, ComputeStorage, ComputeStorage::Resource, and ComputeChannel to not require Send or Sync. What do you think the implications of this would be on the burn crate? Do they require these traits to have those bounds?

This approach would no longer require Rayon.
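A sketch of the thread-local-server idea under discussion: one server instance per thread, plus a channel handle that opts out of Send/Sync so it can never cross threads. All names here are illustrative, not cubecl's actual types:

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

// Hypothetical stand-in for the compute server; in cubecl this would hold
// wgpu resources that are !Send when target_feature atomics is enabled.
struct Server {
    counter: u32,
}

thread_local! {
    // One lazily-created server instance per thread.
    static SERVER: RefCell<Server> = RefCell::new(Server { counter: 0 });
}

// A channel handle that is !Send and !Sync: the raw-pointer PhantomData
// opts out of both auto traits, so the handle can never cross threads and
// is therefore always valid for the thread that created it.
pub struct LocalChannel {
    _not_send: PhantomData<*const ()>,
}

impl LocalChannel {
    pub fn new() -> Self {
        LocalChannel { _not_send: PhantomData }
    }

    // Every call borrows this thread's server mutably; no locking is
    // needed because the borrow can never be contended across threads.
    pub fn submit(&self, n: u32) -> u32 {
        SERVER.with(|s| {
            let mut s = s.borrow_mut();
            s.counter += n;
            s.counter
        })
    }
}

fn main() {
    let chan = LocalChannel::new();
    assert_eq!(chan.submit(2), 2);
    assert_eq!(chan.submit(3), 5);
}
```

This is also why the trait bounds come up: a `LocalChannel` like this cannot implement a `ComputeChannel` trait that requires Send + Sync.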

@ArthurBrussee
Contributor

Hey! Sorry for the late reply; I see you're already getting somewhere with this implementation, which is great :)

Not having ComputeStorage::Resource be Send (at least on atomic WASM) makes sense, that's quite an inherent limitation until wgpu::Buffers are Send.

Unfortunately though, the compute server's read() future really needs to be Send, at least on native platforms. It's quite important to be able to send the future to another thread, where it can wait on results while work otherwise keeps running. Maybe it could also be !Send only on atomic WASM; I don't know how annoying that would get.
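One way to express "Send on native, relaxed on atomic wasm" is to pick the trait's supertraits per target. `ReadFuture` is a hypothetical name, not cubecl's actual API:

```rust
use std::future::Future;

// Native targets: the future must be Send so callers can move it to
// another thread and wait on results there.
#[cfg(not(all(target_arch = "wasm32", target_feature = "atomics")))]
pub trait ReadFuture: Future + Send {}
#[cfg(not(all(target_arch = "wasm32", target_feature = "atomics")))]
impl<F: Future + Send> ReadFuture for F {}

// Atomic wasm: drop the Send requirement, since wgpu resources captured
// by the future are !Send there.
#[cfg(all(target_arch = "wasm32", target_feature = "atomics"))]
pub trait ReadFuture: Future {}
#[cfg(all(target_arch = "wasm32", target_feature = "atomics"))]
impl<F: Future> ReadFuture for F {}

// A consumer written against the alias compiles on both targets.
fn accepts(_f: impl ReadFuture) {}

fn main() {
    // `async` blocks without !Send captures are Send, so this satisfies
    // the stricter native bound.
    accepts(async { 1 + 1 });
}
```

The cost, as noted above, is that downstream code cannot rely on Send when the wasm variant is active, which can be annoying to thread through generic APIs.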

Last comment: you're relying on blocking on a future, which I guess actually works now on a worker, which is sick :D However, you still can't do so on the main thread afaik, and I'm not sure if there's a good way to detect whether you're running in a worker or not. Still requiring people to manually call the async init function is annoying, but it might have to be like that. Hopefully we also don't need to duplicate the initialization function then!

Very exciting though, I would love to run my stuff in a web worker and finally not worry about it messing with the UI haha. Though I will need to figure out how to update a canvas from a web worker...
