Question about expectation on cross-hardware #132

Closed
fcharras opened this issue Dec 6, 2023 · 4 comments · Fixed by #136

fcharras commented Dec 6, 2023

From the DLPack standard reference, should this snippet work?

import torch
import numpy as np

x = torch.asarray([1,2,3], device="cuda")
np.from_dlpack(x)
np.asarray(x)

It currently fails with:

RuntimeError: Unsupported device in DLTensor.

The purpose excerpt says:

Designed for cross hardware: CPU, CUDA, OpenCL, Vulkan, Metal, VPI, ROCm, WebGPU, Hexagon

but also

While still considering the need for cross hardware support (e.g. the data field is opaque for platforms that do not support normal addressing).

So I'd like to clarify whether the error in this snippet should be reported as a bug to the maintainers, or whether it is expected that DLPack data exchange can only work on the same device.

rgommers (Collaborator) commented:

That snippet is not expected to work. DLPack is a zero-copy interop protocol describing a memory layout, so a CUDA -> CPU exchange won't work. The "cross hardware" description means that it supports a wide range of hardware, but the producer and consumer must be able to share memory.
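
For illustration, a minimal sketch of the explicit transfer that is needed (assuming a CUDA-enabled PyTorch build and NumPy >= 1.22; the variable names are just for the example):

import torch
import numpy as np

x = torch.asarray([1, 2, 3], device="cuda")

# The device-to-host transfer has to be explicit; DLPack never does it for you.
x_cpu = x.cpu()
a = np.from_dlpack(x_cpu)  # zero-copy: NumPy and the CPU tensor share memory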

fcharras (Author) commented Dec 13, 2023

Thanks for the clarification. There is some ongoing discussion about that at data-apis/array-api#626.

Some of the confusion comes from the Array API spec for from_dlpack, which has a note that says:

The returned array may be either a copy or a view. See Data interchange mechanisms for details.

which is misleading at the moment if the returned array can never be a copy.

rgommers (Collaborator) commented:

Ah okay, I see the confusion. The rationale for that is given as "Zero-copy semantics where possible, making a copy only if needed (e.g. when data is not contiguous in memory)". So you are right, it's not always zero-copy.

We need to distinguish a couple of things here:

  1. DLPack itself (C-level): this only concerns sharing a contiguous in-memory representation, and it is always zero-copy.
  2. Python-level APIs from array/tensor libraries:
    • These may make a copy if and only if the data isn't already in a layout that DLPack can handle.
    • Making a copy in memory is not the same as a data transfer between devices. The latter is much more expensive, so a transfer is never done implicitly in the array API standard, nor in most libraries with multi-device support such as PyTorch. IIRC TensorFlow does do it, but it tends to be a performance footgun. (A short sketch of the distinction follows this list.)
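
For concreteness, a rough sketch of that distinction (assuming NumPy >= 1.22 and a CUDA-enabled PyTorch build; the (2, 0) tuple in the comment is an example value corresponding to a CUDA device with id 0):

import torch
import numpy as np

t = torch.asarray([1.0, 2.0, 3.0])   # CPU tensor
a = np.from_dlpack(t)                # Python-level API, zero-copy here
t[0] = 42.0
print(a[0])                          # 42.0 -- the NumPy array views the same memory

g = torch.asarray([1.0, 2.0, 3.0], device="cuda")
print(g.__dlpack_device__())         # e.g. (2, 0): CUDA, device 0
# np.from_dlpack(g) raises, since NumPy cannot address CUDA memory;
# the device-to-host transfer must be explicit: np.from_dlpack(g.cpu())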

fcharras (Author) commented:

Many thanks!
