Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AsRef<[A]> instead of Vec<A> #1474

Open
not-fiore opened this issue Jan 31, 2025 · 5 comments
Open

AsRef<[A]> instead of Vec<A> #1474

not-fiore opened this issue Jan 31, 2025 · 5 comments

Comments

@not-fiore
Copy link

ArrayBase::from_shape_vec() is very much not ergonomic. would it be possible to have a function that takes a generic AsRef<[A]> instead? being able to pass in a Box ed array or an array owned by an Rc would be much better for both performance and ease of use. also, Vec would still work the same way.

@akern40
Copy link
Collaborator

akern40 commented Feb 1, 2025

I believe that the reason for this limitation is that from_shape_vec specifically makes use of the fact that the Vec is passed by-value, allowing the array to be created without copy by stealing its allocation. An AsRef<[A]> does not provide the same sort of allocated by-value to steal from. Would ::from_iter be sufficient for your purposes?

@not-fiore
Copy link
Author

do you mean calling ::from_iter() myself from outside the function? sorry, i dont understand your proposition here

also, i am not specifically looking for something that would fit my purposes, im just trying to get an understanding for why a more flexible approach wasn't taken

generics as input in functions are a great idea in api design in general, allowing for more flexible use, and not requiring the user to construct a specific data structure when all that is needed is a specific behaviour of that data structure: for example, using a Vec<T> in a context where there is no need for modifying the structure in place is inefficient, and it might be better to pass something that implements that behaviour generically.

in this case, i tried to go through the code and i dont really understand why a Vec is specifically needed. could you be a bit more specific about it? (im sorry if this seems just annoying)

for example, if all that is needed is something that can iterate, the function can accept a impl IntoIterator<T>. i guess i just couldnt find the specific spot in the code where a Vec would be absolutely needed for this, thats all

@akern40
Copy link
Collaborator

akern40 commented Feb 2, 2025

Ya sorry, I should've been more specific, I was suggesting you call Array::from_iter(_) instead of calling ::from_shape_vec, not inside it.

Happy to elaborate a bit: the goal for this function is to go from some kind of owned storage without copy to a new ndarray. The relevant call sites are as follows from_shape_vec --> from_shape_vec_impl --> from_vec_dim_stride_unchecked --> DataOwned::new --> OwnedRepr::from. In that function (data_repr.rs, line 35) you will find that we ManuallyDrop the Vec, then grab a pointer to its data and form the OwnedRepr. So in order to create a more generic version, we'd need a trait that guarantees the following:

  1. The implementing type provides heap-allocated data...
  2. ... that is contiguous in memory...
  3. ... with unique, unshared ownership...
  4. ... that can tell us the length and capacity of the allocation.

I just don't think there's any trait that tells us all of this information1. I suppose there are other corner cases where Rust has created heap-allocated contiguous memory - a Boxed array, for example (but not an Rc array) - but Vec is far and away the main method of doing so, especially for large amounts of data.

(Another piece of historical information: ndarray used to actually keep a Vec under the hood. We got rid of that, since it turns out that Vecs have some compiler magic to have unique pointers to their data, and we didn't want to muck with that. But it's fair to say that Vec is still a sort of zero-th-class type in the codebase of ndarray, and I think it probably has to remain that way.)

Footnotes

  1. If you find one, tell us!

@not-fiore
Copy link
Author

this is very interesting, i thought all types that implement AsRef<[T]> stored a contiguous array on the heap, including Rc<[T]>, but i must be wrong i suppose!

ill look more into this as soon as i can, i suppose youre right tho, this might be the right tradeoff

@robertknight
Copy link

robertknight commented Feb 23, 2025

this is very interesting, i thought all types that implement AsRef<[T]> stored a contiguous array on the heap, including Rc<[T]>, but i must be wrong i suppose!

Static arrays ([T; N]) also implement AsRef<[T]> and these can be stored on the stack or baked into the binary as read-only data using include_bytes!. The AsRef<[T]> means "this type can be borrowed as a slice of T" (ie. a pointer to T and the length). It doesn't say anything about where the T is stored.

A slightly more general version of from_shape_vec could perhaps be achieved by using v: Into<Vec<T>>, since eg. both Vec<T> and Box<[T]> implement this. It does make the type signature more complex however and that would make it less ergonomic to use APIs like collect to build v. I would probably choose not to make this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants