You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
I have a table sorted by (date, ticker, time) and would like to query data ordered by (date, time) assuming ticker is known to be constant.
It seems that the existing EquivalenceProperties code is built to allow this, but the constant properties are getting lost inside of FilterExec. Furthermore, it seems that FilterExec would propagate the necessary constant property if not for the following check failing:
I noticed that if we just add DataType::Utf8 to the list, it seems to work for the example below (in the next section), so my uneducated mind wonders if we can just add more datatypes to it? 🤔
Describe the solution you'd like
Given the following table:
CREATE EXTERNAL TABLE data (
"date"VARCHAR,
"ticker"VARCHAR,
"time"VARCHAR,
) STORED AS PARQUET
WITH ORDER ("date", "ticker", "time")
LOCATION 'example/';
INSERT INTO data VALUES
('2023', 'A', '0400'),
('2023', 'A', '0401'),
('2023', 'B', '0400');
I would like for the following query to be executed without a sort:
SELECT*FROM data
WHERE ticker ='A'ORDER BY"date", "time";
-- +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-- | plan_type | plan -- |-- +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-- | logical_plan | Sort: data.date ASC NULLS LAST, data.time ASC NULLS LAST -- |-- | | Filter: data.ticker = Utf8("A") -- |-- | | TableScan: data projection=[date, ticker, time], partial_filters=[data.ticker = Utf8("A")] -- |-- | physical_plan | SortPreservingMergeExec: [date@0 ASC NULLS LAST,time@2 ASC NULLS LAST] -- |-- | | SortExec: expr=[date@0 ASC NULLS LAST,time@2 ASC NULLS LAST] -- |-- | | CoalesceBatchesExec: target_batch_size=8192 -- |-- | | FilterExec: ticker@1 = A -- |-- | | RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1 -- |-- | | ParquetExec: file_groups={1 group: [[Users/matthewcramerus/arrow-datafusion/datafusion-cli/example/bGDu3tzuQaHhgpGZ_0.parquet]]}, projection=[date, ticker, time], output_ordering=[date@0 ASC NULLS LAST, ticker@1 ASC NULLS LAST, time@2 ASC NULLS LAST], predicate=ticker@1 = A, pruning_predicate=ticker_min@0 <= A AND A <= ticker_max@1, required_guarantees=[ticker in (A)] |-- | | -- |-- +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Describe alternatives you've considered
I initially made a patch to FilterExec that uses the LiteralGuarantee API to extract constant columns, but then I realized I was doing what the AnalyzsisContext does, just worse and not in a way that respects statistics propagated up from child plans.
Additional context
No response
The text was updated successfully, but these errors were encountered:
Seems like combined with your PR, the results children[0].data would always be false, I'll try do figure it out then I think the issue would be worked out
Current plan is to substitute equivalence in FilterExec and when Calculate equivalence and the filer expression has a EqExpr and left parts are Sorted from its child also the right part is a constant, we add that expression to the constant vec.
Is your feature request related to a problem or challenge?
I have a table sorted by
(date, ticker, time)
and would like to query data ordered by(date, time)
assumingticker
is known to be constant.It seems that the existing
EquivalenceProperties
code is built to allow this, but the constant properties are getting lost inside ofFilterExec
. Furthermore, it seems thatFilterExec
would propagate the necessary constant property if not for the following check failing:https://github.com/apache/arrow-datafusion/blob/3c26e597aeacde0a5e6a51f30394d3d31c6acd96/datafusion/physical-plan/src/filter.rs#L127-L135
which ultimately boils down to this piece of code only allowing numeric types:
https://github.com/apache/arrow-datafusion/blob/3c26e597aeacde0a5e6a51f30394d3d31c6acd96/datafusion/physical-expr/src/intervals/utils.rs#L96-L111
I noticed that if we just add
DataType::Utf8
to the list, it seems to work for the example below (in the next section), so my uneducated mind wonders if we can just add more datatypes to it? 🤔Describe the solution you'd like
Given the following table:
I would like for the following query to be executed without a sort:
Describe alternatives you've considered
I initially made a patch to
FilterExec
that uses theLiteralGuarantee
API to extract constant columns, but then I realized I was doing what theAnalyzsisContext
does, just worse and not in a way that respects statistics propagated up from child plans.Additional context
No response
The text was updated successfully, but these errors were encountered: