Question about expectation on cross-hardware #132

Closed
fcharras opened this issue Dec 6, 2023 · 4 comments · Fixed by #136

fcharras commented Dec 6, 2023

From the DLPack standard reference, should this snippet work?

import torch
import numpy as np

x = torch.asarray([1,2,3], device="cuda")
np.from_dlpack(x)
np.asarray(x)

It currently fails with:

RuntimeError: Unsupported device in DLTensor.

The purpose excerpt says:

Designed for cross hardware: CPU, CUDA, OpenCL, Vulkan, Metal, VPI, ROCm, WebGPU, Hexagon

but also

While still considering the need for cross hardware support (e.g. the data field is opaque for platforms that do not support normal addressing).

So I'd like to clarify whether the error in this snippet should be reported as a bug to the maintainers, or whether it is expected that DLPack data exchange can only work on the same device.

rgommers (Collaborator) commented:

That snippet is not expected to work. DLPack is a zero-copy interop protocol describing a memory layout, so a CUDA -> CPU exchange won't work. The "cross hardware" description means that it supports a wide range of hardware, but the producer and consumer must be able to share memory.
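
For illustration, a minimal sketch of the explicit transfer that is needed (assuming a CUDA-enabled PyTorch build and NumPy >= 1.22; the variable names are just for the example):

import torch
import numpy as np

x = torch.asarray([1, 2, 3], device="cuda")

# The device-to-host transfer has to be explicit; DLPack never does it for you.
x_cpu = x.cpu()
a = np.from_dlpack(x_cpu)  # zero-copy: NumPy and the CPU tensor share memory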

fcharras (Author) commented Dec 13, 2023

Thanks for the clarification. There is some ongoing discussion about that at data-apis/array-api#626.

Some of the confusion comes from the Array API spec for from_dlpack, which has a note that says:

The returned array may be either a copy or a view. See Data interchange mechanisms for details.

which is misleading at the moment if the returned array can never be a copy.

rgommers (Collaborator) commented:

Ah okay, I see the confusion. The rationale for that is given as "Zero-copy semantics where possible, making a copy only if needed (e.g. when data is not contiguous in memory)". So you are right, it's not always zero-copy.

We need to distinguish a couple of things here:

  1. DLPack itself (C-level): this only concerns sharing a contiguous in-memory representation, and it is always zero-copy.
  2. Python-level APIs from array/tensor libraries:
    • These may make a copy if and only if the data isn't already in a layout that DLPack can handle.
    • Making a copy in memory is not the same as a data transfer between devices. The latter is much more expensive, so a transfer is never done implicitly in the array API standard, nor in most libraries with multi-device support such as PyTorch. IIRC TensorFlow does do it, but it tends to be a performance footgun. (A short sketch of the distinction follows this list.)
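
For concreteness, a rough sketch of that distinction (assuming NumPy >= 1.22 and a CUDA-enabled PyTorch build; the (2, 0) tuple in the comment is an example value corresponding to a CUDA device with id 0):

import torch
import numpy as np

t = torch.asarray([1.0, 2.0, 3.0])   # CPU tensor
a = np.from_dlpack(t)                # Python-level API, zero-copy here
t[0] = 42.0
print(a[0])                          # 42.0 -- the NumPy array views the same memory

g = torch.asarray([1.0, 2.0, 3.0], device="cuda")
print(g.__dlpack_device__())         # e.g. (2, 0): CUDA, device 0
# np.from_dlpack(g) raises, since NumPy cannot address CUDA memory;
# the device-to-host transfer must be explicit: np.from_dlpack(g.cpu())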

fcharras (Author) commented:

Many thanks!
