-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(substrait): add support for insert roundtrip in append mode #13118
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of this PR, I assumed that OutputMode.Unspecified replicates this behaviour during an insert.
It looks like the spec indicates that if the Output Mode is not specified it behaves as NO_OUTPUT so this assumption doesn't match the spec. See https://substrait.io/relations/logical_relations/#write-operator
I'll open an issue on the substrait side for this. We might need a new OutputMode named OperationStats or something similar.
I think this is something we would be open to in the upstream if you wanted to propose it.
@@ -255,6 +257,27 @@ async fn except_rels( | |||
Ok(rel) | |||
} | |||
|
|||
fn from_substrait_names(names: &[String]) -> Result<TableReference> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I think it would be good to give this more specific name like table_reference_from_substrait_names
_ => { | ||
return not_impl_err!("Unsupported WriteOp: {:?}", write.op()); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor/nit: if we move this up above table_schema
we can avoid doing redundant work in the event that his validation fails.
@vbarua thanks for the tip. What sentence are you referring to? What would be the point of having both UNSPECIFIED and NO_OUTPUT in that case? I sort of thought Unspecified meant "we don't care, plan doesn't depend on it".
Let me see how common this behavior is in other DBs first and I'll come up with something. |
My understanding is that the use of UNSPECIFIED as the first value in the Substrait enums comes from this Protobuf developer recommendation https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum It's not intended to mean
though I can understand why it would seem like that. In Substrait, UNSPECIFIED effectively means unset by the producer and the spec tries to define default behaviours in those cases. I'm interpreting the line
from substrait.io/relations/logical_relations#write-operator as indicated that the default behaviour is to not return any output, but tbh that could/should be made clearer if that's the case. |
I think we should add this to the upstream first, otherwise we'll end up with non-compliant behaviour in DataFusion which folks might try and use. |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Rationale for this change
Adds support for append-mode inserts for substrait producer and consumer.
What changes are included in this PR?
The trickiest part is to get output schema right as substrait and datafusion don't fit very well there. Datafusion returns a single field named
count
after an insert. substrait has no explicit OutputMode that does the same. For the sake of this PR, I assumed thatOutputMode.Unspecified
replicates this behavior during an insert. I'll open an issue on the substrait side for this. We might need a new OutputMode named OperationStats or something similar.Are these changes tested?
yes
Are there any user-facing changes?
yes