DataFusion TableProvider for memory arrays #384
vortex-datafusion/src/lib.rs
Outdated
        vortex_bail!(InvalidArgument: "only DType::Struct arrays can produce a table provider");
    }

    let arrow_schema = Schema::try_from(array.dtype())?;
We actually removed this conversion since it is kind of lossy. You can't tell just from a Vortex DType the exact Arrow type, e.g. is it a String or a LargeString?
What we wanted to add was array.arrow_dtype()
possibly related to the AsArrow compute function, or add an AsArrowDType compute function or something....
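To make the lossiness concrete, here is a minimal sketch using toy enums. `LogicalType` and `PhysicalType` are hypothetical stand-ins for a Vortex DType and an Arrow DataType; they are not the real vortex-dtype or arrow-rs types. The sketch only illustrates that the physical-to-logical direction is many-to-one, so the reverse direction has to guess.

```rust
// Toy model only: hypothetical stand-ins, not the real vortex-dtype or arrow-rs types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Utf8,
    Binary,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Utf8,        // 32-bit offsets
    LargeUtf8,   // 64-bit offsets
    Binary,      // 32-bit offsets
    LargeBinary, // 64-bit offsets
}

/// Physical -> logical is unambiguous, but many-to-one.
fn to_logical(p: PhysicalType) -> LogicalType {
    match p {
        PhysicalType::Utf8 | PhysicalType::LargeUtf8 => LogicalType::Utf8,
        PhysicalType::Binary | PhysicalType::LargeBinary => LogicalType::Binary,
    }
}

/// Logical -> physical has to pick one candidate; here it always picks
/// the 32-bit offset variant.
fn to_physical_default(l: LogicalType) -> PhysicalType {
    match l {
        LogicalType::Utf8 => PhysicalType::Utf8,
        LogicalType::Binary => PhysicalType::Binary,
    }
}

fn main() {
    let original = PhysicalType::LargeUtf8;
    let round_tripped = to_physical_default(to_logical(original));
    // The offset width is lost on the way through the logical type.
    println!("{:?} -> {:?}", original, round_tripped);
}
```

Under these assumptions, a LargeUtf8 array round-trips to Utf8, which is the kind of mismatch the comment above describes.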
I think VarBin and VarBinView are the only places where this is ambiguous, and for those you can introspect the encoding details. I think it makes most sense to just implement this as a method on Flattened
...does that sound right?
This will happen more often - timestamp/date/time will be likely different. Similarly for list/fixedsizedlist/largelist/listview
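One way to picture the "introspect the encoding" idea from this thread is a method on the flattened array that consults encoding details such as offset width. The sketch below is purely illustrative: `ToyFlattened`, `OffsetWidth`, `ToyArrowType`, and the specific mapping are assumptions made for the example, not the Vortex or arrow-rs API, and the method name `arrow_dtype` is borrowed from the suggestion above.

```rust
// Hypothetical stand-ins: neither the real Vortex encodings nor arrow-rs types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OffsetWidth {
    Bits32,
    Bits64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyArrowType {
    Utf8,
    LargeUtf8,
    Utf8View,
    List,
    LargeList,
}

/// A toy "flattened" array that keeps the encoding details needed to
/// choose an exact physical type.
#[derive(Debug, Clone, Copy)]
enum ToyFlattened {
    VarBin { offsets: OffsetWidth },
    VarBinView,
    List { offsets: OffsetWidth },
}

impl ToyFlattened {
    /// Resolve the physical type from the encoding, not from the logical type alone.
    fn arrow_dtype(&self) -> ToyArrowType {
        match *self {
            ToyFlattened::VarBin { offsets: OffsetWidth::Bits32 } => ToyArrowType::Utf8,
            ToyFlattened::VarBin { offsets: OffsetWidth::Bits64 } => ToyArrowType::LargeUtf8,
            ToyFlattened::VarBinView => ToyArrowType::Utf8View,
            ToyFlattened::List { offsets: OffsetWidth::Bits32 } => ToyArrowType::List,
            ToyFlattened::List { offsets: OffsetWidth::Bits64 } => ToyArrowType::LargeList,
        }
    }
}

fn main() {
    let arrays = [
        ToyFlattened::VarBin { offsets: OffsetWidth::Bits64 },
        ToyFlattened::VarBinView,
        ToyFlattened::List { offsets: OffsetWidth::Bits32 },
    ];
    for a in arrays.iter() {
        println!("{:?} -> {:?}", a, a.arrow_dtype());
    }
}
```

The same pattern would need extra cases for timestamp/date/time and the other list variants mentioned above, which is why the ambiguity is wider than strings alone.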
Per our huddle earlier, I've gone ahead and done the following (apologies it's split across a few commits now, if it's too hard to review I can try and stack the PRs more):
I would like to rename Flatten to something else, but this PR has gotten big enough already :P
This is so cool -- great to see this 🚀 Let us know if there is anything we can do to help you along upstream in DataFusion
Thanks Andrew, and thanks for the amazing work you're doing with the DataFusion project! I think the two big things we're tracking are:
❤️
Indeed -- it seems to be coming along very nicely. We are tracking in apache/datafusion#10918
When you say "projection pushdown", is this what you mean: apache/datafusion#2581 ? I just want to make sure the request is tracked
Implements a read-only DataFusion TableProvider around an array of DType::Struct rows. See the tests in vortex-datafusion/lib.rs for an example of how to read from a SQL context.
vortex-datafusion crate. Types are still private until we're more comfortable making them publicly exposed
vortex-dtype to convert from DType to arrow DataType and arrow Schema
Support for pushing down filters against Vortex arrays will come in a follow-up (FLUP) tomorrow
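For readers unfamiliar with the pushdown terms used in this thread, here is a minimal sketch of what projection and filter pushdown mean for a columnar table. `ToyTable` and its `scan` signature are hypothetical and much simpler than DataFusion's TableProvider, whose scan receives the projection and filter expressions from the query planner; nothing here reflects how the vortex-datafusion crate implements it.

```rust
/// Hypothetical in-memory columnar table; not DataFusion's TableProvider.
struct ToyTable {
    /// One Vec per column; all columns have the same length.
    columns: Vec<Vec<i64>>,
}

impl ToyTable {
    /// Returns only the columns listed in `projection` (projection pushdown)
    /// and only the rows where `filter` holds on column `filter_col`
    /// (filter pushdown), so discarded data is never materialized.
    fn scan(
        &self,
        projection: &[usize],
        filter_col: usize,
        filter: impl Fn(i64) -> bool,
    ) -> Vec<Vec<i64>> {
        // Evaluate the predicate once to find the matching row indices.
        let keep: Vec<usize> = self.columns[filter_col]
            .iter()
            .enumerate()
            .filter(|(_, v)| filter(**v))
            .map(|(i, _)| i)
            .collect();
        // Gather the kept rows from the projected columns only.
        projection
            .iter()
            .map(|&c| keep.iter().map(|&r| self.columns[c][r]).collect())
            .collect()
    }
}

fn main() {
    // Column 0 is an id, column 1 is a score.
    let table = ToyTable {
        columns: vec![vec![1, 2, 3, 4], vec![10, 25, 30, 5]],
    };
    // Roughly: SELECT id FROM table WHERE score > 20
    let result = table.scan(&[0], 1, |score| score > 20);
    println!("{:?}", result); // prints [[2, 3]]
}
```

Without pushdown, the provider would hand every column and every row to the query engine and let it discard what the query does not need.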