Add support for device selection #162
Conversation
…nstraints. Introduce MLTensor that allows asynchronous readback.
index.bs
Outdated
// Prefer a compute-specific processor such as a neural engine or NPU.
"compute",
// Prefer a traditional software-based device such as a CPU.
"software"
I would prefer naming it "cpu". It would be a clear device name that the Wasm libs could select to avoid unnecessary data copying across devices, as the "Support CPU - WebAssembly" scenario of the op-level execution use case describes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the API name prefix, most API naming conventions generally discourage the use of abbreviations as identifier names. If we were to call it "cpu", we would also need a "gpu" value to be consistent, and then what about AI accelerators? "npu", "vpu", "tpu"? I think that would start a never-ending naming discussion.
I suppose the "cpu" and "gpu" abbreviations would be well known by developers because they are already widely used by native ML APIs, e.g. NNAPI DeviceTypeCode
defines ANEURALNETWORKS_DEVICE_CPU
and ANEURALNETWORKS_DEVICE_GPU
, ML Compute MLCDeviceType
defines gpu
and cpu
and oneDNN dnnl_engine_kind_t
also defines dnnl_cpu
and dnnl_gpu
.
For AI accelerators, we can probably just call them "accelerator" (in the context of WebNN). This term is also used by native APIs: NNAPI DeviceTypeCode defines ANEURALNETWORKS_DEVICE_ACCELERATOR for a "Dedicated accelerator for Machine Learning workloads.", and OpenCL cl_device_type defines CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_ACCELERATOR for heterogeneous hardware.
Actually, today's modern CPUs already have ML-specific instructions, such as Vector Neural Network Instructions (VNNI) on x86 and Scalable Vector Extension 2 (SVE2) on Arm, and are adding even more advanced instructions, e.g., for matrix multiplication. My major concern is that "software" does not reflect these new capabilities that would accelerate ML workloads for web apps.
I'm open to naming suggestions if others have input on this. In my line of work designing the Windows API at Microsoft, we generally stay away from abbreviations because their meaning and perceived freshness tend to change over time. They are also harder for people from different backgrounds to understand. (The Windows API is well known for its longevity.)
My 2 cents worth: "compute" and "software" sound too generic, and I wouldn't be able to guess what they meant. On the other hand, I do think most people can guess what "cpu" or "gpu" mean. I agree custom neural accelerators don't have as obvious a choice. "npu" or "accelerator" or "custom" sound fine to me.
+1 on cpu and gpu
I couldn't find any references or recommendations regarding the use of abbreviations in Web API definitions. Does anyone know where to look? @anssiko If abbreviations are allowed, should they be lowercase or uppercase, etc.?
The DirectX API in general avoids abbreviations, and terms like "graphics" or even "software" have been used for quite some time, e.g.:
https://docs.microsoft.com/en-us/windows/win32/direct3d11/atoc-dx-graphics-direct3d-11
https://docs.microsoft.com/en-us/windows/win32/api/d3dcommon/ne-d3dcommon-d3d_driver_type
@wchao1115 Web Platform Design Principles > Naming principles https://w3ctag.github.io/design-principles/#naming-is-hard is probably the best resource. Naming is always a topic of active discussion in API design, and there are often no absolute rights or wrongs.
Given there's a Web API called WebGPU, I suspect "gpu" would pass if you prefer it that way. Similarly, we can probably agree that "cpu" is familiar to web developers of today.
That said, I'd challenge the group to think this from the perspective of future-proofing. How do the possible future architectures fit into the device buckets we choose? We could make this extensible with new buckets in the future, or we could make the initial ones abstract enough to accommodate future devices. I think there's always a trade-off whatever we choose. What I think we want to achieve is to be able to provide web developers with knobs they can use to give hints to the implementation so their workloads can be executed in a manner that yields the best user experience.
Thanks @anssiko. I'll take this advice from the design principles: "Please consult widely on names in your APIs."
The hardest one is probably the accelerator type. Let's discuss this naming in our next CG call.
index.bs
Outdated
console.log(`values: ${outputs.c.data}`);
const outputs = {'c': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, outputs);
const data = await outputs.c.data();
I am trying to figure out the right behavior of MLTensor for the pre-allocated outputs scenario. Will outputs.c.data() allocate and return a new ArrayBufferView instead of downloading the result into the pre-allocated ArrayBufferView? And if user code captures the returned MLTensor, what's the relationship between the returned MLTensor and the given MLTensor? e.g.,
const buffer = new Float32Array(sizeOfShape([3, 3]));
const outputs = {'c': new MLTensor(buffer)};
let returns = graph.compute(inputs, outputs);
const buffer1 = await returns.c.data();
const buffer2 = await outputs.c.data();
// is buffer === buffer1 === buffer2?
As the outputs are given, I would suggest that compute not return the results. Calling data() on the given output MLTensor should download the results into the pre-allocated ArrayBufferView and return it (the same object). e.g.,
const preallocatedBuffer = new Float32Array(sizeOfShape([3, 3]));
const outputs = {'c': new MLTensor(preallocatedBuffer)};
graph.compute(inputs, outputs); // compute does not return results
const returnedBuffer = await outputs.c.data();
// returnedBuffer === preallocatedBuffer
index.bs
Outdated
console.log(`outputs include ${Object.keys(outputs)}`);
// Compute d.
outputs = await graph.compute(inputs, {d});
outputs = graph.compute(inputs, {d});
For the optional output scenario, user code only needs to specify the required output names. The previous MLOutput had all optional fields, so it could be an empty dictionary not bound to any resources. However, with the new MLTensor, it looks like user code has to create it from a resource, which is not necessary for this usage.
Yes, in this case {d} is a dictionary with a single key associated with a null (MLTensor) value. Am I missing something here?
I may have made a mistake for the record<DOMString, MLOutput> case; it should be {'d': {}}. However, for record<DOMString, MLTensor>, I am afraid {'d': null} is invalid.
As the outputs will be returned by compute, we could probably optimize this usage by allowing the user to specify only the output names in an array, e.g.,
outputs = graph.compute(inputs, ['d']);
Syntax-wise, are you saying that the following is also invalid?
const d = builder.matmul(a, b);
const e = builder.add(d, c);
--> (?) const graph = await builder.build({d, e});
Can you tell me exactly how record<K,V> should be used, and how we should write the line in question above? I've never found a usage sample for it in the WebIDL spec.
Is it?
await builder.build({'d': d, 'e': e});
--> (?) const graph = await builder.build({d, e});
This code is valid. It is equivalent to await builder.build({'d': d, 'e': e});
These are two styles of object property definition; the former is the "shorthand property names" syntax from ES2015.
I said outputs = graph.compute(inputs, {d}); is invalid because the object d is of type Operand in that example. If d were of type MLOutput, it would be valid.
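(For illustration, a minimal sketch of the distinction; the variable names follow the example above, and the empty-dictionary output binding is the pre-PR MLOutput shape.)
const d = builder.matmul(a, b);     // d is an MLOperand
graph.compute(inputs, {d});         // shorthand for {'d': d}; invalid here since the value is an MLOperand
graph.compute(inputs, {'d': {}});   // valid for record<DOMString, MLOutput>: an empty MLOutput dictionary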
index.bs
Outdated
MLResource resource();

// Asynchronously download the result of a computation onto a typed array.
Promise<ArrayBufferView> data();
The MLTensor that downloads the result to an ArrayBufferView probably needs to be separated from the ones for GPU resources (say MLGPUTensor), because the GPU resources have their own data download methods. e.g.,
interface MLTensor {
  constructor(ArrayBufferView buffer, optional sequence<long> dimensions);
  sequence<long> dimensions();
  Promise<ArrayBufferView> data();
};
typedef (MLResourceBufferView or WebGLTexture or GPUTexture) MLGPUResource;
interface MLGPUTensor {
  constructor(MLGPUResource resource, optional sequence<long> dimensions);
  sequence<long> dimensions();
  MLGPUResource resource();
};
According to the table of device types and resource types:
- an MLContext created with any device type can accept and return MLTensor,
- an MLContext created with a GPU device can accept and return MLTensor, and also MLGPUTensor for GPU interop.
probably needs to be separated from ones for GPU resources (say MLGPUTensor). It is because that the GPU resources have their own data download methods.
If the tensor has a backing GPU resource, the execution output will be on that backing resource. The await data() call is only for when the user wants to download the data from the GPU resource to a CPU buffer.
If the user wants to make a copy from one GPU resource to another GPU resource, they should do it themselves outside of the WebNN call.
I would love not to fragment MLTensor with a GPU-specific version. It's much simpler to think of an MLTensor as an envelope for a device-specific resource needed at execution time.
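(For illustration, a minimal sketch of that single-envelope model, assuming the MLTensor constructor shape proposed in this PR; gpuBuffer is a hypothetical GPUBuffer created from the same device as the MLContext, and graph/inputs are assumed.)
// CPU-backed tensor: data() produces the result as a CPU ArrayBufferView.
const cpuOut = {'y': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, cpuOut);
const cpuData = await cpuOut.y.data();

// GPU-backed tensor: the result stays on the backing GPU resource; call data()
// only when a CPU copy is actually needed.
const gpuOut = {'y': new MLTensor(gpuBuffer)};
graph.compute(inputs, gpuOut);
const downloaded = await gpuOut.y.data();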
If the tensor has a backing GPU resource, the execution output will be on that backing resource. The await data() call is only when the user wants to download the data from the GPU resource to a CPU buffer.
We probably need to coordinate with other Web APIs, e.g. if the resource is a GPUBuffer, WebGPU defines mapAsync and getMappedRange for data downloading.
My other question is: if the tensor is created with an ArrayBufferView, what would MLResource resource() return? If it returns the ArrayBufferView, what's the semantic difference from Promise<ArrayBufferView> data()?
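(For context, a minimal sketch of the WebGPU readback path being referred to; readbackBuffer is a hypothetical GPUBuffer created with MAP_READ | COPY_DST usage into which the result has already been copied.)
await readbackBuffer.mapAsync(GPUMapMode.READ);
const resultCopy = new Float32Array(readbackBuffer.getMappedRange().slice(0));
readbackBuffer.unmap();
// resultCopy now holds a CPU copy of the GPU data.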
My other question is: if the tensor is created with an ArrayBufferView, what would MLResource resource() return? If it returns the ArrayBufferView, what's the semantic difference from Promise<ArrayBufferView> data()?
resource() returns the backing resource of the MLTensor, while await data() asynchronously downloads and layout-converts the content to a standard layout into a separate buffer. I believe you proposed this semantic for the WASM backend when an operation is executed standalone.
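(A small sketch of that distinction, assuming the MLTensor constructor shape from this PR.)
const backing = new Float32Array(sizeOfShape([3, 3]));
const tensor = new MLTensor(backing);
const res = tensor.resource();    // the backing resource itself, possibly holding an internal layout
const std = await tensor.data();  // a separate buffer, produced asynchronously in the standard layout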
We probably need to coordinate with other Web APIs, e.g. if the resource is a GPUBuffer, WebGPU defines mapAsync and getMappedRange for data downloading.
We define await data() as a contract to asynchronously download the content to a CPU buffer. How exactly that download is implemented from a GPU resource to a CPU buffer is up to the implementer of this contract.
If the caller of WebNN wants to make a copy from one GPU resource to another, that is entirely outside the scope of WebNN.
Let's step back a bit and really think about this requirement for a sec:
The framework may compute multiple ops without accessing the intermediate results. So if we allow passing the output of a compute to the next compute without making the data available, it would help the underlying implementation hide the additional copies and conversions. The idea is illustrated by the following pseudo framework integration code:
// assume conv2d_op, add_op, relu_op are compiled single-op graphs
// input = tf.tensor(values);
// conv2d = tf.conv2d(input, filter);
conv2d.mlOutput = conv2d_op.compute({'input': {resource: input.arrayBuffer}}).output; // webnn backend
// add = tf.add(conv2d, bias);
add.mlOutput = add_op.compute({'a': conv2d.mlOutput}).output; // webnn backend
// relu = tf.relu(add);
relu.mlOutput = relu_op.compute({'input': add.mlOutput}).output; // webnn backend
// await relu.data();
await relu.mlOutput.readDataAsync({resource: relu.arrayBuffer}); // webnn backend
i.e. if a framework graph [input]->conv1->conv2->[output] has each of the two conv operations compiled separately on the backend, we want the result out of conv1 to be an opaque handle with an implementation-specific layout format that can later be "fixed up" in a final async call (e.g. await relu.data() in the pseudo code above).
On second thought, this requirement can't be allowed. No matter where a convolution operation is defined, whether at the framework level or at webnn backend, an output of a public operation must be in a known layout format.
While I understand the motive of potentially reducing copies at the backend, from the API standpoint it simply cannot be allowed, or we'll end up with a spec that says the layout of an operation's output is in an unknown hardware-specific format.
This means that for a framework graph [input]->conv1->conv2->[output], if a user (of the framework) executes the entire graph in one atomic action (load-and-run), the framework backend must compile the two convolution ops together in order to achieve the optimization that allows the intermediate result between the two ops to stay in a native format, so that only the final output after conv2 is converted back to a known layout format. This is precisely the value of graph compilation -- to optimize by hiding the intermediate results within the graph that aren't exposed to the user.
For the case where the framework graph is actually exposed to the user and that the user chooses to execute each op in the graph individually (eager execution), the execution result from each op must be produced and converted into a known layout format every step of the way. Yes, it won't be as efficient but that's exactly the trade-off the user is making when using eager execution -- to favor inspection of the output of each operator in the graph over end-to-end graph execution performance.
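(To make the graph-compilation alternative concrete, a hedged sketch of compiling the two convolutions into a single graph so the intermediate never needs a standard-layout conversion. The descriptor values, the filter constants filter1/filter2, outputSize, inputs, and context are assumptions; the data() readback follows the shape discussed in this PR.)
const builder = new MLGraphBuilder(context);
const input = builder.input('input', {type: 'float32', dimensions: [1, 3, 224, 224]});
const conv1 = builder.conv2d(input, filter1);   // intermediate result stays internal to the graph
const conv2 = builder.conv2d(conv1, filter2);
const graph = await builder.build({'output': conv2});

const outputs = {'output': new MLTensor(new Float32Array(outputSize))};
graph.compute(inputs, outputs);
const result = await outputs.output.data();     // the standard layout is produced only here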
whether at the framework level or at webnn backend, an output of a public operation must be in a known layout format.
Agree. So the result of MLTensor.data() is always in the standard layout format.
we'll end up with a spec that says that the layout of an output of an operation is in an unknown hardware-specific format.
The internal format is up to the implementation; the spec won't define that.
For the case where the framework graph is actually exposed to the user and that the user chooses to execute each op in the graph individually (eager execution), the execution result from each op must be produced and converted into a known layout format every step of the way.
The conversion would only be needed when user code calls MLTensor.data(). Otherwise, the result can be kept in the internal format and consumed by the next op's compute as an input.
In summary, if user code wants to access the output data, it calls MLTensor.data(); the result is always in the standard layout format.
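(A compact sketch of that eager-chaining idea, using the compute-returns-outputs shape discussed earlier in this thread; conv2d_op and relu_op are assumed compiled single-op graphs, and the output name 'output' and inputBuffer are hypothetical.)
let step = conv2d_op.compute({'input': new MLTensor(inputBuffer)});
step = relu_op.compute({'input': step.output});   // the internal layout flows through untouched
const result = await step.output.data();          // the standard-layout conversion happens only here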
Agree. So the result of MLTensor.data() is always in the standard layout format.
Not only that. The output of MLGraphBuilder.conv2d must also be in the standard format. See the conv2d API spec for details: both the size and the shape of the output operand are clearly defined, because this operation might be the last in the graph, in which case its output is the graph's output, which must be defined.
The output of MLGraphBuilder.conv2d must also be in the standard format. Both the size and the shape of the output operand are clearly defined in the conv2d API spec, because this operation might be the last in the graph, in which case its output is the graph's output, which must be defined.
Now with MLTensor as the graph output, user code can only access the results by calling MLTensor.data(). So if MLTensor.data() always returns the standard layout format, it would not break the spec definition.
Did I miss anything?
The output of the conv2d will be in the output buffer that is bound to the graph output in the compute call. Before this PR, there was the MLOutput dictionary whose data field held the output buffer; in this PR, that would be the resource() call. This is true whether the output buffer is a GPU resource or an ArrayBufferView. I now think it's probably best to leave the async download step up to the caller, rather than tying it to the WebNN API, since the caller would know how to deal with data downloads from a resource type they specify. As for the operation itself, the output buffer needs to always be in standard formats to keep the spec well defined.
index.bs
Outdated
[SecureContext, Exposed=Window]
interface MLGraph {
  Promise<MLNamedOutputs> compute(MLNamedInputs inputs,
                                  optional MLNamedOutputs outputs = {});
  MLNamedTensors compute(MLNamedTensors inputs, optional MLNamedTensors outputs = {});
The compute method could probably be overloaded for different device types and different output allocations (newly allocated or pre-allocated), e.g.,
typedef record<DOMString, MLTensor> MLNamedTensors;
typedef record<DOMString, MLGPUTensor> MLNamedGPUTensors;
interface MLGraph {
  // MLContext with any device type: allocate output MLTensors for the listed output names.
  MLNamedTensors compute(MLNamedTensors inputs, optional sequence<DOMString> outputNames);
  // MLContext with any device type: use pre-allocated output MLTensors.
  void compute(MLNamedTensors inputs, MLNamedTensors outputs);
  // MLContext with GPUDevice or WebGLRenderingContext: allocate output MLGPUTensors for the listed output names.
  MLNamedGPUTensors compute(MLNamedGPUTensors inputs, optional sequence<DOMString> outputNames);
  // MLContext with GPUDevice or WebGLRenderingContext: use pre-allocated output MLGPUTensors.
  void compute(MLNamedGPUTensors inputs, MLNamedGPUTensors outputs);
};
For the CPU - WebAssembly scenario of the operation-specific API, the framework could create MLTensors with pre-allocated Wasm memory, e.g.,
const inputs = {'x': new MLTensor(input_tensor_info.f32())};
const outputs = {'y': new MLTensor(output_tensor_info.f32_write())};
conv2d.compute(inputs, outputs);
await outputs.y.data();
// The results have been downloaded into the memory pointed to by output_tensor_info.f32_write().
For the GPU scenario, the framework could create MLGPUTensors with WebGLTextures, e.g.,
const inputs = {'x': new MLGPUTensor(inputTextureData.texture)};
const outputs = {'y': new MLGPUTensor(outputTextureData.texture)};
conv2d.compute(inputs, outputs);
// Post-processing with outputTextureData.texture
@wchao1115 @pyu10055, does it look good?
@huningxin In WebGL there is no way to share textures across contexts. Are the input textures limited to a GPU context created during compilation, or can users specify a context during graph creation?
@pyu10055 The assumption is that the app must create an MLContext with the WebGL context that creates the texture, and the texture is later passed back as an input to a graph created by that MLContext. It is generally known that a resource should not be shared across different GPU contexts. In DirectX you can achieve cross-adapter sharing of resources, but you need to create a shared resource handle and manage the lifetime of the shared resource explicitly. At the Web API level, though, I agree that it is best to keep this simpler and simply not allow cross-adapter sharing.
users can specify a context during graph creation?
Yes, the user can create an MLContext by specifying a WebGLRenderingContext, as in ML.createContext:
MLContext createContext(WebGLRenderingContext glContext);
@huningxin I suggest having only one method, which requires the caller to pre-allocate the output buffer. It's much simpler that way, i.e.
typedef record<DOMString, MLTensor> MLNamedTensors;
interface MLGraph {
  // MLContext with any device type: use pre-allocated output MLTensors.
  void compute(MLNamedTensors inputs, MLNamedTensors outputs);
};
I wonder how practical the optional output case (Example 5) really is. For a multi-output topology, not requiring all of the outputs to be produced at execution is uncommon. The optimization gained from not having to specify an optional output is going to be minimal relative to the cost of executing the entire topology anyway. We should probably remove this use case and simplify the API.
Also, there is no need to special-case GPU resource tensors, as the MLTensor already wraps those types nicely. See the 'Device Selection' section for more detail as to when different types of resources are allowed.
Sure. And this would still work, right?
const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = await builder.build({d, e});
const output = {'d': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, output);
let data = await output.d.data();
right.
index.bs
Outdated
// Prefer a compute-specific processor such as a neural engine or NPU.
"compute",
// Prefer a traditional software-based device such as a CPU.
"software"
+1 on cpu and gpu
index.bs
Outdated
constructor(MLResource resource, optional sequence<long> dimensions);

// The tensor dimensions
sequence<long> dimensions();
Should the tensor also have a dtype method?
+1 to adding a data type; probably reuse MLOperandType?
Actually, that wouldn't be needed. Remember that MLTensor is just a resource holder type. The data type of the resource is already specified when an operand is created, either explicitly through the MLOperandDescriptor or implicitly through the execution of the relevant operation in the graph.
The fact that there is an optional dimensions param in its constructor may have confused the matter, but it's only there to support graphs with free-dimension inputs, where the input shape isn't known until graph execution time. In the normal use case it won't be needed, hence optional.
This conversation makes me realize that calling the type MLTensor is itself misleading, because when you call something a tensor, you naturally think it has both a shape and a data type associated with it. A more appropriate name is probably MLResource or MLGraphResource.
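(A small sketch of that division of responsibilities; the descriptor shape follows the existing spec text, while marking a free dimension with a negative value is only an assumed convention for illustration.)
// Data type and (possibly free) shape are declared on the operand descriptor.
const x = builder.input('x', {type: 'float32', dimensions: [-1, 4]});  // -1: assumed free-dimension marker

// MLTensor only wraps a resource; dimensions are passed here solely to pin down
// the free dimension at execution time.
const inputs = {'x': new MLTensor(new Float32Array(2 * 4), [2, 4])};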
explainer.md
Outdated
const outputs = await graph.compute(inputs);
const inputs = {'A': new MLTensor(bufferA), 'B': new MLTensor(bufferB)};
const outputs = graph.compute(inputs);
const data = await outputs.C.data();
The data() API is for downloading to the CPU, correct?
I understand it is; the returned promise resolves with an ArrayBufferView.
index.bs
Outdated
<!-- Validate inputs and outputs -->
1. If any of the following requirements are unmet, then [=reject=] |promise| with a {{TypeError}} and stop.
1. If any of the following requirements are unmet, then stop.
Do we want to signal to the web developer that the execution has stopped?
There's a leftover promise reference on L1801 that should be either removed or repurposed for signaling this, this one:
reject promise with an OperationError and stop.
index.bs
Outdated
1. Remove |outputName| from |remainingOutputNames|.
1. If |remainingOutputNames| is empty, then resolve |promise| with |results| and stop.
1. If |remainingOutputNames| is empty, then stop.
I think L1828 should be updated to either:
- Return |results|.
Or remove this step if the preferred design is to use data()
for data download per #162 (comment)
index.bs
Outdated
1. Let |i| be 0.
1. While true:
1. Let |dimension| be |value|.{{MLInput/dimensions}}[|i|].
1. Let |dimension| be |value|.{{MLTensor/dimensions}}[|i|].
1. |dimension| must be greater than 0.
1. If |inputOperand|.{{MLOperandDescriptor/dimensions}}[|i|] is greater than 0, then |dimension| must be equal to |inputOperand|.{{MLOperandDescriptor/dimensions}}[|i|].
1. Set |i| to |i| + 1.
As a drive-by comment, many web specs tend to use a more prose-like form:
Increment |i| by 1.
And in many cases, if it makes the algorithm more readable, they may say "foobar counter" instead of "i". Not sure if that applies here; just noting that, in general, web spec algorithms are recommended to be self-documenting rather than concise. That said, "i" is pretty widely understood :-)
Good suggestion, thanks! Acknowledged; I will update the compute algorithm once this PR is merged.
From today's call:
Based on the resolution in the CG call, I've pulled back on the asynchronous data download call and only kept the change adding the device selection note. Also, based on our discussion, we will leave the accelerator device preference option out of the spec for now, since the resource models around different kinds of accelerators are still not well defined. We'll deal with these issues separately from this PR. Please have a look.
Thanks @wchao1115!
index.bs
Outdated
An {{MLContext}} interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. An {{MLContext}} could be created from a specific GPU device such as {{GPUDevice}} or {{WebGLRenderingContext}} that is already in use by the application, in which case the corresponding {{GPUBuffer}} or {{WebGLBuffer}} resources used as graph constants, as well as the {{GPUTexture}} and {{WebGLTexture}} used as graph inputs, must also be created from the same device. In a multi-adapter configuration, the device used for {{MLContext}} must be created from the same adapter as the device used to allocate the resources referenced in the graph.

In a situation when a GPU context executes a graph with constants or inputs given as {{ArrayBufferView}}, the content of such constant or input is automatically uploaded from the system memory to the GPU memory. Likewise, the content of the execution result is downloaded to the system memory as {{ArrayBufferView}} upon request.
Should we note the two aspects of a CPU context when executing a graph with ArrayBufferView:
- no data movement across devices
- there may still be layout format conversions
Updated. Please have a look.
Looks good. Thanks much @wchao1115!