Window Linear Mode use smaller buffers #9597
…ning # Conflicts: # datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs
@@ -566,6 +567,50 @@ fn get_random_window_frame(rng: &mut StdRng, is_linear: bool) -> WindowFrame {
// should work only with WindowAggExec
window_frame
}
};
window_frame.start_bound = |
These changes were introduced so that BoundedWindowAggExec can be tested with a current row bound.
I reviewed this carefully and LGTM
There seems to be an issue with how we print plans that is making CI fail. I will let @mustafasrepo fix it and then merge.
Which issue does this PR close?
Closes #.
Rationale for this change
Background: In Linear mode of the BoundedWindowAggExec, none of the partition by expressions are ordered. In this case, we can generate a result for a partition only once a new row with the same partition value is received; until then, the result cannot be generated for that partition. As an example, consider the table,
Assume the following query is executed on this table
DataFusion will generate the following plan
where BoundedWindowAggExec is in Linear mode. The partition with hash=2 receives the following section from the input table
The query above can generate the following result for the section above
Since the query uses the frame range between unbounded preceding and 1 following, for sn=2 we cannot generate a result until sn=4 is received (at which point the end of the range, sn=3, is guaranteed to be complete). The same applies to other partitions with different hash values. However, the input data shows that for the partition hash=2, possible future rows cannot have sn=3, sn=4, etc., because the most recent row in the input has sn=9. If we use this information, we can generate results earlier for each partition and also use less memory. With this information we can generate the following result
instead of the current behaviour, which produces the result
This reduces memory usage when the partition by expressions have high cardinality and the window frame is either a RANGE or a GROUPS frame (for ROWS frames we need a new row that belongs to the same partition anyway).
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?