Should I expect this to support "zero-copy" data loading in some way? #1109
Comments
I figured this was too general to address here and that I should think more about exactly which APIs the fundamental issue stems from. I don't think streams alone are it, although something the Streams API comes up with may help alleviate current concerns.
Your question is absolutely valid, and the Streams API can and should play an important role in this.
The short answer: we're not there yet. Although the specification for readable byte streams has existed for a while, the first implementation has only started shipping very recently with Chrome 89. And right now, they aren't yet integrated into the rest of the Web platform:
I agree that this should work. You should be able to "reserve" a portion of your WASM memory to hold the received data, create a
Unfortunately, that doesn't work. I don't know if there's any intention to make this work, or if it's even possible to support this? I suppose things could get complicated very quickly, for example if the WebAssembly memory needs to grow while a readable byte stream is still
Right now, the best you can do is allocate a separate
(By the way: if you happen to be using Rust for your "streaming to WebAssembly" use case, you may be interested in wasm-streams. 😉 No support for readable byte streams just yet, but I may have a go at it now that they're available in Chrome. 😄)
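To illustrate the memory growth concern raised above, here is a minimal sketch (no stream involved) of what happens to existing views when a `WebAssembly.Memory` grows. The helper name `growAndInspect` is made up for this example.

```javascript
// Hypothetical helper: shows that growing a WebAssembly.Memory detaches the
// ArrayBuffer that older views were created on.
function growAndInspect() {
  const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });

  // A view on the first 16 bytes, e.g. a region reserved for incoming data.
  const before = new Uint8Array(memory.buffer, 0, 16);
  before[0] = 42;

  // Growing replaces memory.buffer with a new ArrayBuffer and detaches the
  // old one, so `before` becomes a zero-length view.
  memory.grow(1);

  // The contents survive, but only through a freshly created view.
  const after = new Uint8Array(memory.buffer, 0, 16);

  return {
    staleViewLength: before.byteLength,
    firstByte: after[0],
    totalBytes: memory.buffer.byteLength,
  };
}
```

Anything that holds on to a view across a call into the module (such as a pending read) would have to cope with that view going stale, which is presumably part of why this gets complicated.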
As someone who wrote a WebAssembly (WASM) module to process data that in practice may be wholly contained in very large files, I am facing a problem where the most practical solution would seem to be embracing the Streams API, so that the user agent does not have to allocate as much memory as an entire [large] file and instead provides the data in successive chunks for WASM code to consume. Copying each chunk from the array buffer returned by the read into WASM module memory would, however, seem unavoidable.
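For reference, the copy-per-chunk approach described above might look roughly like this. It is only a sketch: `copyBlobIntoMemory` and `offset` are hypothetical names, and it assumes the module has already reserved enough space at `offset` to hold the whole blob.

```javascript
// Hypothetical helper: stream a Blob into WebAssembly memory chunk by chunk.
// Assumes the region starting at `offset` is large enough; otherwise the
// Uint8Array constructor below throws a RangeError.
async function copyBlobIntoMemory(blob, memory, offset) {
  const reader = blob.stream().getReader();
  let written = 0;
  for (;;) {
    // Each read hands back a chunk in a buffer allocated by the user agent.
    const { value, done } = await reader.read();
    if (done) break;
    // Re-create the target view on every iteration: memory.buffer is
    // replaced if the memory grows in between, detaching older views.
    const target = new Uint8Array(memory.buffer, offset + written, value.byteLength);
    target.set(value); // the copy this issue would like to avoid
    written += value.byteLength;
  }
  return written;
}
```

Only one chunk is alive at a time, so peak memory stays low, but every byte is still copied once from the chunk into the module's memory.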
Some point to the WASM working group as the one that should amend its programming model to facilitate efficient data processing in these kinds of cases, for example by allowing WASM modules to access multiple WASM memories (on the horizon for WASM, evidently), or by allowing the script host invoking WASM to juggle such memories in and out of reach of WASM, including outright constructing memory objects out of existing `ArrayBuffer` buffers and then offering these as memory to WASM code.
One counterargument that has been mentioned is that WASM has requirements with regard to alignment and sizing of its memory objects, which go beyond the requirements imposed by the user agent on, say, `ArrayBuffer` objects, making the aforementioned feature requests impractical.
The thing is, I agree with the above counterargument -- I think WASM memory is an object of a class best suited to be controlled from inside the module; after all, relinquishing ownership of its memories brings with it additional complexity for future WASM design, and none of the rest of the script host -- meaning JavaScript -- benefits directly unless it uses WASM.
Where does the Streams API come in here, and why am I bringing this up here?
Well, apart from WASM, it stands to reason that JavaScript applications would also benefit from APIs that use views, as opposed to mandating that a new `ArrayBuffer` object be returned every time a data loading operation is done.
Does the Streams API facilitate this -- loading data into views, to save on copy operations? I am not sure. Having learned about "BYOB" readers, it would seem these are the solution here, but why can't I do this then:
Maybe I have misunderstood BYOB in the context of streams, but what I think would be beneficial is being able to read data from opaque blobs (among other opaque sources) into a much more tangible array buffer that already exists, to save on a future copy operation in the script -- using the `read(view)` of the obtained reader above would be just the thing, wouldn't it? Except it doesn't work -- apparently streams vended by blobs are not "byte streams". Forgive my ignorance, and the spec may have penetrated too deep into practical application here -- but shouldn't the above be a perfect use case for zero-copy loading of file data into memory available to both the script and any WASM module it may run (which could use `Memory.prototype.buffer` to make a view on the memory and hand it to a BYOB reader's `read` call)?
But perhaps the Streams API is the wrong API to change or add to in order to make scenarios like the above work?
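For comparison, here is a rough sketch of how `read(view)` behaves on a stream that is a readable byte stream. The stream is built by hand here because, as noted above, streams vended by blobs are not byte streams. `byteStreamOf` and `readIntoBuffer` are hypothetical helpers, and this hand-made source still copies internally, so it shows the API shape rather than a true zero-copy path.

```javascript
// Hypothetical byte source: a readable byte stream that emits `bytes` once.
// Note that enqueue() transfers the chunk's buffer away from the caller.
function byteStreamOf(bytes) {
  return new ReadableStream({
    type: "bytes",
    start(controller) {
      controller.enqueue(bytes);
      controller.close();
    },
  });
}

// Hypothetical helper: fill a caller-supplied ArrayBuffer via a BYOB reader.
async function readIntoBuffer(stream, buffer) {
  const reader = stream.getReader({ mode: "byob" });
  let filled = 0;
  while (filled < buffer.byteLength) {
    // read(view) transfers view.buffer: the ArrayBuffer passed in is
    // detached, and the returned view sits on a new ArrayBuffer object.
    const { value, done } = await reader.read(new Uint8Array(buffer, filled));
    if (value) buffer = value.buffer; // keep hold of the transferred buffer
    if (done) break;
    filled += value.byteLength;
  }
  reader.releaseLock();
  return new Uint8Array(buffer, 0, filled);
}
```

The transfer step is worth noticing: the caller gets its buffer back as a different `ArrayBuffer` object after every read, which is one place where handing over a view on memory that some other owner controls becomes awkward.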
I've read about a dozen issues related to the same "zero copy" umbrella feature request peeking in through the details (zero copy -- an order of magnitude less overhead), but these either focus on WebAssembly -- as if without it there isn't much need to shift to relying on views, where possible -- or appear to chase a rabbit hole of OOP abstractions since around 2014.
What part do you think this specification will play in shifting an entire portfolio of current approaches, which create a new array buffer for every data loading operation, into something fundamentally relying on views? We don't even necessarily have to consider multi-threading beyond what it already relies upon -- object transfer. If we can transfer the same buffer between threads to make a safe programming model on the Web, I don't see what complications multiple views on the same buffer add to that.
I hope I am making sense with this -- I guess I am frustrated that there are so many APIs that rely on buffers, yet there is next to nothing to avoid excessive, fundamentally unnecessary copying, and neither WebAssembly nor threads appear, in my limited understanding, to be standing in the way. Yes, we have `TypedArray.prototype.set`, which is a little gem buried deep in the APIs. The Streams API, to my understanding, was motivated by the need for better consumption of big data -- to that end, zero-copy operations where possible are a continuation of the same direction, so perhaps this is the API to amend?
Of course I might as well ask the authors of the File API whether they can add a `load(view)` method to the `Blob` class, but I honestly don't know which thread is best to pull. Fixing it in one place probably makes fixing it elsewhere unnecessary and saves effort.