Support User Defined Table Functions / Table Value Functions #7926
Comments
I'm interested in this topic! It seems like a very powerful feature! : ) I want to study the related code and figure it out to learn more. It looks a little difficult for me right now, but maybe I can help in the coming weeks if I can handle it.
Yes, I think so
Thank you! It is very much appreciated.
Thank you. My suggestion is to first make an RFC / proposal PR (for example like #8046) that sketches out the main APIs and identifies any potential problems / areas that need additional work prior to trying to implement a fully mergeable PR. This would allow us to solidify the design without wasting effort on polishing code / tests
Before looking at User Defined Table Functions, I could not find any internal table functions either. So I suppose we could start by implementing internal table functions first, just like SELECT * FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']); what do you think @alamb : ) Then we can let users define their own Table Functions.
I think implementing something like
Agreed, that sounds like a feasible approach.
FYI. Should table-valued function inputs allow variables or only constants (like 'literal value' or 'now()')?
I prefer this approach because table-valued functions resemble a table more and the processing logic is also more similar.
It depends on what you mean by 'variables'. I think it would be very reasonable to accept However, I think it would be much harder to accept things like
Sorry for not being clear earlier, what I meant is exactly how you understood it.
That is interesting -- I am not sure how we would make this work - maybe sketching out what the plan would look like would help. It seems like it would effectively have to make a
I'm going to make a draft PR this week : )
This is a very useful feature. Really looking forward to it landing in the next release.
A valuable specialization of table-valued functions is being able to use them as de-aggregating expressions when the output is single-column. An example of this is the
The second form takes multiple arrays as arguments and returns records. Therefore, this form can only be used in table-factor position
Follow-on work is tracked in #8383
Is your feature request related to a problem or challenge?
It is sometimes helpful to have custom table functions to extend DataFusion's functionality. As we continue to get more feature requests, such as #7859, it is important to support such use cases without having to add everything to the DataFusion core.
For example, in the following query `my_custom_fun` is a table function
A specific example might be a function that fetches the contents of a remote CSV file and parses it into a table.
You can do something similar to this with a `TableProvider`, but the main differences are:
- A `TableProvider` has no way to pass parameters
- A `TableProvider`'s schema is fixed (it can't be a function of the parameters)
Prior Art
Other examples include the `read_parquet` etc. functions in DuckDB
Describe the solution you'd like
I would like to be able to have a table function that supported everything a `TableProvider` does, including filter and projection pushdown. One way to do so would be to actually return a `TableProvider`:
Option 1: Add to `FunctionRegistry`:
We could add Table Functions to the datafusion::execution::FunctionRegistry along with the UDFs, UDAs, etc., which arguably would make them easier to discover
Something like
We would probably also need a
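To make the shape of this proposal concrete, here is a rough, std-only sketch of "a function that takes parameters and returns a table, registered by name". Everything in it is a hypothetical stand-in: `ToyTable`, the `TableFunction` trait, `ToyRegistry`, and `register_table_function` are illustration names, not DataFusion APIs, and the real `TableProvider` is far richer (async scanning, filter and projection pushdown). The point it shows is that the schema can depend on the arguments, which a fixed `TableProvider` cannot do.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy stand-in for a table provider: column names plus rows.
/// Hypothetical; not DataFusion's `TableProvider`.
#[derive(Debug)]
struct ToyTable {
    columns: Vec<String>,
    rows: Vec<Vec<i64>>,
}

/// Hypothetical table function trait: arguments in, table out.
trait TableFunction: Send + Sync {
    fn call(&self, args: &[i64]) -> Result<Arc<ToyTable>, String>;
}

/// Example function `multiples(n_cols, n_rows)`.
/// The number of output columns is decided by the first argument.
struct Multiples;

impl TableFunction for Multiples {
    fn call(&self, args: &[i64]) -> Result<Arc<ToyTable>, String> {
        let (n_cols, n_rows) = match args {
            [c, r] if *c > 0 && *r >= 0 => (*c, *r),
            _ => return Err("expected (n_cols > 0, n_rows >= 0)".to_string()),
        };
        // Schema is computed from the parameters.
        let columns: Vec<String> = (1..=n_cols).map(|c| format!("x{c}")).collect();
        // Row r holds r * 1, r * 2, ..., r * n_cols.
        let rows: Vec<Vec<i64>> = (1..=n_rows)
            .map(|r| (1..=n_cols).map(|c| r * c).collect::<Vec<i64>>())
            .collect();
        Ok(Arc::new(ToyTable { columns, rows }))
    }
}

/// Hypothetical registry that stores table functions by name,
/// next to where UDFs and UDAs would live.
#[derive(Default)]
struct ToyRegistry {
    table_functions: HashMap<String, Arc<dyn TableFunction>>,
}

impl ToyRegistry {
    fn register_table_function(&mut self, name: &str, f: Arc<dyn TableFunction>) {
        self.table_functions.insert(name.to_string(), f);
    }

    /// Look the function up by name and invoke it with the given arguments.
    fn call(&self, name: &str, args: &[i64]) -> Result<Arc<ToyTable>, String> {
        let f = self
            .table_functions
            .get(name)
            .ok_or_else(|| format!("unknown table function: {name}"))?;
        f.call(args)
    }
}

fn main() {
    let mut registry = ToyRegistry::default();
    registry.register_table_function("multiples", Arc::new(Multiples));
    let table = registry.call("multiples", &[3, 2]).unwrap();
    println!("columns: {:?}", table.columns);
    println!("rows: {:?}", table.rows);
}
```

In a real design the arguments would presumably be expressions rather than plain integers, and the returned object would be an actual `TableProvider` so that pushdown keeps working; the sketch only fixes the overall call shape.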
Describe alternatives you've considered
This API is very powerful and would allow Table Functions to do anything a table provider does. We could also offer a stripped down version of the API potentially
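One possible reading of a "stripped down" API, sketched with std-only toy types: a helper that wraps a plain function producing rows, so simple cases do not need a full provider. `SimpleTableFunction`, `make_simple_table_function`, `parse_pairs`, and the `Rows` alias are all hypothetical names for illustration; in DataFusion the output would be record batches rather than `Vec<Vec<String>>`.

```rust
use std::sync::Arc;

/// Toy stand-in for the produced data (hypothetical; not a record batch stream).
type Rows = Vec<Vec<String>>;

/// Hypothetical minimal table function: a name, a fixed schema,
/// and a function that turns arguments into rows.
struct SimpleTableFunction {
    name: String,
    columns: Vec<String>,
    produce: Arc<dyn Fn(&[String]) -> Result<Rows, String> + Send + Sync>,
}

/// Hypothetical convenience constructor for basic table functions.
fn make_simple_table_function<F>(name: &str, columns: &[&str], produce: F) -> SimpleTableFunction
where
    F: Fn(&[String]) -> Result<Rows, String> + Send + Sync + 'static,
{
    SimpleTableFunction {
        name: name.to_string(),
        columns: columns.iter().map(|c| c.to_string()).collect(),
        produce: Arc::new(produce),
    }
}

impl SimpleTableFunction {
    /// Run the wrapped function and check every row matches the schema width.
    fn call(&self, args: &[String]) -> Result<Rows, String> {
        let rows = (self.produce)(args)?;
        if let Some(bad) = rows.iter().find(|r| r.len() != self.columns.len()) {
            return Err(format!(
                "{}: row has {} values, schema has {} columns",
                self.name,
                bad.len(),
                self.columns.len()
            ));
        }
        Ok(rows)
    }
}

/// Example body: parse "k1=v1,k2=v2" into (key, value) rows.
fn parse_pairs(args: &[String]) -> Result<Rows, String> {
    let text = args
        .first()
        .ok_or_else(|| "expected one string argument".to_string())?;
    let mut rows = Vec::new();
    for pair in text.split(',') {
        let (k, v) = pair
            .split_once('=')
            .ok_or_else(|| format!("bad pair: {pair}"))?;
        rows.push(vec![k.to_string(), v.to_string()]);
    }
    Ok(rows)
}

fn main() {
    let f = make_simple_table_function("parse_pairs", &["key", "value"], parse_pairs);
    let rows = f.call(&["a=1,b=2".to_string()]).unwrap();
    println!("{:?} -> {:?}", f.columns, rows);
}
```

The trade-off is the one described above: this form is easy to write, but with a fixed schema and no pushdown hooks it gives up most of what returning a full `TableProvider` would allow.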
We can probably add something like datafusion::logical_expr::create_udf to make it easier to construct basic table functions (e.g. that produce a single `SendableRecordBatchStream`)
Add to `SchemaProvider`:
We could also add Table Functions to datafusion::catalog::schema::SchemaProvider
This might make sense given how similar `TableFunction`s are to `TableProvider`s
Additional context
I thought there was an existing ticket for this, but I can not find one
This came up several times, including: