Type-preserving alternative to read_block #40
Comments
In principle, alternative array representations should support the same operations as base R matrices. There may be corner cases, but I think this is a reasonable proposal.
@LTLA Hi Aaron, I don't think it's realistic to imagine that The purpose of Handling remote matrices with operations executed server-side is IMO a different story. The way I see it, it would bypass block processing, so it has little to do with H. Proposal: What I had in mind for a long time, and what you can already use for the current DelayedMatrix multiplication code, is to support 2 modes of block processing: the dense mode (via
However, everything you need for a
Well, I guess I don't have a non-sparse example of needing a block in the native structure, so your proposal does cover the use case for now. It seems that it should just be a simple case of splitting the current
Something like that. For There is also the question of maybe using a grid with bigger blocks when
Following up on the painful conversation of PR #64, I'm open to adding an argument to
instead of:
So a light convenience only. Maybe also with support for In any case, I don't want to touch Time to close this one too.
Thanks for thinking more about this, Herve. The second case would be preferable if there were a function that just returned the appropriate reader function,
Maybe I should clarify the meaning of (Maybe it's not too crazy to imagine a backend that uses a mix of sparse/dense chunks depending on the content of each chunk, but (1) I've not seen any so far, so I wouldn't consider this a solid use case, and (2) blocks are typically made of many chunks, so even if you have an easy/cheap way to know what the chunks are like, there is no clear notion of native representation at the block level.) So let me reformulate the proposal in a very concrete way:
Would we still need a function that just returned the appropriate reader function? Or a variant of
A first easy simplification is to get rid of the The new
Done (commit c68116a).
This request is mainly motivated by the current DelayedMatrix matrix multiplication code, which does a read_block call to obtain a dense matrix for the actual multiplication. This prevents us from taking advantage of features of the underlying matrix representation that might enable faster multiplication with native methods, e.g., a dgCMatrix with Matrix::%*%. I guess this touches on some of what SparseArraySeed was designed for, but in a more general fashion (e.g., to handle remote matrices with operations executed server-side).

This would be resolved by having a read_block alternative that tries to preserve the underlying nature of the matrix. The easiest way to do this is to allow extract_array to return something other than a dense ordinary matrix, e.g., by passing something like dense=FALSE to extract_array. I know that an arbitrary matrix representation may not support various delayed operations, whereas a dense matrix always will. In such cases, one could use selectMethod() to check whether the delayed operation can be dispatched on the matrix representation, execute it if so, and convert the block into a dense matrix if not.

Does this sound reasonable, or am I missing something?
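The fallback described above (dispatch natively when a method exists for the block's class, otherwise densify) can be sketched in base R with the methods package. This is a minimal illustration under stated assumptions, not DelayedArray code: the ToySparse class, the negate_block and exp_block generics, and apply_preserving are all hypothetical names standing in for a sparse block, delayed operations, and a type-preserving read_block. Only setGeneric, setMethod, setClass, setAs and selectMethod are real methods-package functions; selectMethod(..., optional = TRUE) returns NULL instead of signalling an error when no method matches.

```r
library(methods)

## Hypothetical stand-ins for delayed operations (not DelayedArray API).
setGeneric("negate_block", function(x) standardGeneric("negate_block"))
setGeneric("exp_block", function(x) standardGeneric("exp_block"))

## Dense ordinary matrices support every operation: this is the fallback.
setMethod("negate_block", "matrix", function(x) -x)
setMethod("exp_block", "matrix", function(x) exp(x))

## Hypothetical toy sparse block: (i, j, v) triplets plus dimensions.
## The slot is called "Dim" (as in the Matrix package) because "dim" would
## clash with R's special dim attribute.
setClass("ToySparse",
         slots = c(i = "integer", j = "integer", v = "numeric", Dim = "integer"))

## Densify a ToySparse block into an ordinary matrix.
setAs("ToySparse", "matrix", function(from) {
    m <- matrix(0, nrow = from@Dim[1L], ncol = from@Dim[2L])
    m[cbind(from@i, from@j)] <- from@v
    m
})

## Negation maps 0 to 0, so it can be applied natively without densifying.
setMethod("negate_block", "ToySparse", function(x) {
    x@v <- -x@v
    x
})

## Hypothetical helper: apply generic `op` to `block`, keeping the block's
## representation when a method exists for its class (inherited methods
## included) and converting it to a dense matrix otherwise.
apply_preserving <- function(op, block) {
    f <- get(op, mode = "function")
    native <- selectMethod(op, class(block)[1L], optional = TRUE)
    if (is.null(native)) {
        block <- as(block, "matrix")  # no native method: densify first
    }
    f(block)
}

blk <- new("ToySparse", i = c(1L, 2L), j = c(2L, 1L), v = c(3, 4),
           Dim = c(2L, 2L))
negated <- apply_preserving("negate_block", blk)  # stays a ToySparse
expd <- apply_preserving("exp_block", blk)        # falls back to a dense matrix
```

Negation keeps zeros as zeros, so it gets a sparsity-preserving method; exp() turns every zero into 1, so it only has a dense method and the block is converted first. Because selectMethod() also considers inherited methods, a catch-all method defined for "ANY" would make every representation look natively supported, so such a check would need to exclude it.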