Add LogicalPlanStats to logical plan nodes #13618

Draft
wants to merge 3 commits into main

Conversation

@peter-toth (Contributor) commented Dec 1, 2024

Which issue does this PR close?

This is a proof-of-concept PR to improve the performance of TreeNode traversals. Its main purpose is to demonstrate a promising but API-breaking optimization.

Rationale for this change

TreeNode traversal APIs are a crucial part of query plan analysis and optimization. This PR explores the idea of storing pre-calculated statistics/properties of nodes (or subtrees) inside the nodes at creation time and automatically updating the values during transformations.

This PR focuses on logical plan building blocks (LogicalPlan, Expr) and only one particular optimization that stores a bitset pattern in each node to describe the subtree content. Additional attributes/properties can be added in follow-up PRs, and the idea can be extended to physical trees as well.

What changes are included in this PR?

To store the pre-calculated statistics/properties, a LogicalPlanStats struct field is added to each LogicalPlan and Expr node:

pub enum LogicalPlan {
    Projection(Projection, LogicalPlanStats),
    Filter(Filter, LogicalPlanStats),
    ...
}

pub enum Expr {
    BinaryExpr(BinaryExpr, LogicalPlanStats),
    Like(Like, LogicalPlanStats),
    ...
}

This might look redundant, but it is most likely the least intrusive change to the existing code. Pattern matching against LogicalPlan and Expr just needs an extra parameter, while new enum constructor methods can be defined as lowercase versions of the enum variants (where there is no conflict with existing methods). The constructor methods calculate the LogicalPlanStats field based on the main content of the node, as sketched below.
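
For illustration, here is a minimal sketch (not taken from the PR diff) of what the mechanical change looks like at a call site; BinaryExpr::new and Operator are existing DataFusion types, and left/right are assumed boxed expressions already in scope:

// Before: matching and constructing the enum variant directly
match expr {
    Expr::BinaryExpr(BinaryExpr { left, op, right }) => { /* ... */ }
    _ => { /* ... */ }
}
let e = Expr::BinaryExpr(BinaryExpr::new(left, Operator::Eq, right));

// After: the variant carries an extra `LogicalPlanStats` field (ignored here with
// `_stats`) and is built through the lowercase constructor, which computes the stats
match expr {
    Expr::BinaryExpr(BinaryExpr { left, op, right }, _stats) => { /* ... */ }
    _ => { /* ... */ }
}
let e = Expr::binary_expr(BinaryExpr::new(left, Operator::Eq, right));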

Please note that even this approach requires quite a lot of API-breaking changes in the codebase, but they are all mechanical. The following table summarizes the before/after code to construct the enum variants. (I just used the _ prefix to avoid conflicts, but we can come up with better names.)

Before After
Expr::Alias Expr::alias_qualified
Expr::Column Expr::column
Expr::ScalarVariable Expr::scalar_variable
Expr::Literal Expr::literal
Expr::BinaryExpr Expr::binary_expr
Expr::Like Expr::_like
Expr::SimilarTo Expr::similar_to
Expr::Not Expr::_not
Expr::IsNotNull Expr::_is_not_null
Expr::IsNull Expr::_is_null
Expr::IsTrue Expr::_is_true
Expr::IsFalse Expr::_is_false
Expr::IsUnknown Expr::_is_unknown
Expr::IsNotTrue Expr::_is_not_true
Expr::IsNotFalse Expr::_is_not_false
Expr::IsNotUnknown Expr::_is_not_unknown
Expr::Negative Expr::negative
Expr::Between Expr::_between
Expr::Case Expr::case
Expr::Cast Expr::cast
Expr::TryCast Expr::try_cast
Expr::ScalarFunction Expr::scalar_function
Expr::AggregateFunction Expr::aggregate_function
Expr::WindowFunction Expr::window_function
Expr::InList Expr::_in_list
Expr::Exists Expr::exists
Expr::InSubquery Expr::in_subquery
Expr::ScalarSubquery Expr::scalar_subquery
Expr::Wildcard Expr::wildcard
Expr::GroupingSet Expr::grouping_set
Expr::Placeholder Expr::placeholder
Expr::OuterReferenceColumn Expr::outer_reference_column
Expr::Unnest Expr::unnest
LogicalPlan::Projection LogicalPlan::projection
LogicalPlan::Filter LogicalPlan::filter
LogicalPlan::Window LogicalPlan::window
LogicalPlan::Aggregate LogicalPlan::aggregate
LogicalPlan::Sort LogicalPlan::sort
LogicalPlan::Join LogicalPlan::join
LogicalPlan::Repartition LogicalPlan::repartition
LogicalPlan::Union LogicalPlan::union
LogicalPlan::TableScan LogicalPlan::table_scan
LogicalPlan::EmptyRelation LogicalPlan::empty_relation
LogicalPlan::Subquery LogicalPlan::subquery
LogicalPlan::SubqueryAlias LogicalPlan::subquery_alias
LogicalPlan::Limit LogicalPlan::limit
LogicalPlan::Statement LogicalPlan::statement
LogicalPlan::Values LogicalPlan::values
LogicalPlan::Explain LogicalPlan::explain
LogicalPlan::Analyze LogicalPlan::analyze
LogicalPlan::Extension LogicalPlan::extension
LogicalPlan::Distinct LogicalPlan::distinct
LogicalPlan::Dml LogicalPlan::dml
LogicalPlan::Ddl LogicalPlan::ddl
LogicalPlan::Copy LogicalPlan::copy
LogicalPlan::DescribeTable LogicalPlan::describe_table
LogicalPlan::Unnest LogicalPlan::unnest
LogicalPlan::RecursiveQuery LogicalPlan::recursive_query

For the above-mentioned pattern-based optimization, the PR defines a LogicalPlanPattern enum that contains all possible node kinds of a logical plan:

pub enum LogicalPlanPattern {
    // [`Expr`] nodes
    ExprBinaryExpr,
    ExprLike,
    ...

    // [`LogicalPlan`] nodes
    LogicalPlanProjection,
    LogicalPlanFilter,
    ...
}

A bitset of the LogicalPlanPattern enum is added to the LogicalPlanStats struct to reflect the content of the node's subtree. The implementation could use any kind of bitset, but https://docs.rs/enumset/latest/enumset/ looks like a good candidate.

pub struct LogicalPlanStats {
    patterns: EnumSet<LogicalPlanPattern>,
}
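
As a rough sketch of the plumbing (my assumption of how it could be implemented on top of the enumset crate; empty is an extra helper introduced here), the new, merge and contains_pattern methods used below could look like this:

use enumset::{EnumSet, EnumSetType};

#[derive(EnumSetType, Debug)]
pub enum LogicalPlanPattern {
    ExprBinaryExpr,
    ExprLike,
    // ...
    LogicalPlanProjection,
    LogicalPlanFilter,
    // ...
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct LogicalPlanStats {
    patterns: EnumSet<LogicalPlanPattern>,
}

impl LogicalPlanStats {
    pub fn new(patterns: EnumSet<LogicalPlanPattern>) -> Self {
        Self { patterns }
    }

    pub fn empty() -> Self {
        Self { patterns: EnumSet::empty() }
    }

    // Union of this node's patterns with the patterns of one of its children
    pub fn merge(mut self, other: Self) -> Self {
        self.patterns |= other.patterns;
        self
    }

    // Does the subtree rooted at this node contain all of the given patterns?
    pub fn contains_pattern(&self, patterns: EnumSet<LogicalPlanPattern>) -> bool {
        self.patterns.is_superset(patterns)
    }
}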

For example, here are a few Expr variant constructors:

impl Expr {
    pub fn binary_expr(binary_expr: BinaryExpr) -> Self {
        // A `BinaryExpr` node contains the `ExprBinaryExpr` pattern and the patterns of its children
        let stats = LogicalPlanStats::new(enum_set!(LogicalPlanPattern::ExprBinaryExpr)).merge(binary_expr.stats());
        Expr::BinaryExpr(binary_expr, stats)
    }

    pub fn _like(like: Like) -> Self {
        // A `Like` node contains the `ExprLike` pattern and the patterns of its children
        let stats = LogicalPlanStats::new(enum_set!(LogicalPlanPattern::ExprLike)).merge(like.stats());
        Expr::Like(like, stats)
    }

    ...
}
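
The LogicalPlan constructors would work the same way, merging the node's own pattern with the patterns of its expressions and inputs. A hedged sketch for LogicalPlan::projection (assuming stats() accessors on Expr and LogicalPlan as used above, plus the empty() helper from the earlier sketch):

impl LogicalPlan {
    pub fn projection(projection: Projection) -> Self {
        // The node's own pattern, merged with the patterns of its input plan
        // and of all of its expressions, so rebuilding the node during a
        // transformation recomputes the bitset automatically
        let stats = projection.expr.iter().fold(
            LogicalPlanStats::new(enum_set!(LogicalPlanPattern::LogicalPlanProjection))
                .merge(projection.input.stats()),
            |acc, e| acc.merge(e.stats()),
        );
        LogicalPlan::Projection(projection, stats)
    }
}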

While maintaining the bitset during tree transformations comes with some cost, it lets us speed up LogicalPlan and Expr traversals significantly. For example, if we have a traversal that does something with Expr::BinaryExpr nodes only:

expr.apply(|e| {
    match e {
        Expr::BinaryExpr(..) => { /* do something */ }
        _ => { /* do nothing */ }
    }
    Ok(TreeNodeRecursion::Continue)
})

then we can check for the presence of Expr::BinaryExpr in a subtree and simply skip traversing subtrees that don't contain the LogicalPlanPattern::ExprBinaryExpr pattern:

expr.apply(|e| {
    if !e.stats().contains_pattern(enum_set!(LogicalPlanPattern::ExprBinaryExpr)) {
        // The subtree rooted at `e` cannot contain any `Expr::BinaryExpr`, skip it
        return Ok(TreeNodeRecursion::Jump);
    }

    match e {
        Expr::BinaryExpr(..) => { /* do something */ }
        _ => { /* do nothing */ }
    }
    Ok(TreeNodeRecursion::Continue)
})

I modified some of the traversal functions in this PR to demonstrate that the optimization brings significant performance improvement to sql_planner:

% critcmp main stats
group                                         main                                    stats
-----                                         ----                                    -----
logical_aggregate_with_join                   1.15  649.7±183.81µs        ? ?/sec     1.00    564.2±6.04µs        ? ?/sec
logical_select_all_from_1000                  1.00      2.8±0.02ms        ? ?/sec     1.00      2.8±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00   408.8±43.00µs        ? ?/sec     1.00   408.9±45.43µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    396.2±6.83µs        ? ?/sec     1.00    396.0±5.13µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   381.7±19.18µs        ? ?/sec     1.00   383.5±30.79µs        ? ?/sec
physical_intersection                         1.53  1034.7±13.33µs        ? ?/sec     1.00   677.8±53.69µs        ? ?/sec
physical_join_consider_sort                   1.49  1459.6±13.75µs        ? ?/sec     1.00   979.1±75.49µs        ? ?/sec
physical_join_distinct                        1.07   378.0±26.75µs        ? ?/sec     1.00   353.8±12.57µs        ? ?/sec
physical_many_self_joins                      1.38     10.4±0.61ms        ? ?/sec     1.00      7.5±0.48ms        ? ?/sec
physical_plan_clickbench_all                  1.15     89.8±0.71ms        ? ?/sec     1.00     78.0±0.69ms        ? ?/sec
physical_plan_clickbench_q1                   1.15  1301.7±210.59µs        ? ?/sec    1.00  1136.1±14.32µs        ? ?/sec
physical_plan_clickbench_q10                  1.15  1680.5±20.50µs        ? ?/sec     1.00  1464.7±121.45µs        ? ?/sec
physical_plan_clickbench_q11                  1.22  1785.3±195.51µs        ? ?/sec    1.00  1458.5±19.58µs        ? ?/sec
physical_plan_clickbench_q12                  1.12  1823.1±151.47µs        ? ?/sec    1.00  1634.2±226.69µs        ? ?/sec
physical_plan_clickbench_q13                  1.16  1655.8±203.04µs        ? ?/sec    1.00  1423.1±159.93µs        ? ?/sec
physical_plan_clickbench_q14                  1.12  1722.6±99.00µs        ? ?/sec     1.00  1532.4±253.66µs        ? ?/sec
physical_plan_clickbench_q15                  1.15  1673.2±16.20µs        ? ?/sec     1.00  1455.6±120.38µs        ? ?/sec
physical_plan_clickbench_q16                  1.13  1442.7±11.38µs        ? ?/sec     1.00  1280.9±31.93µs        ? ?/sec
physical_plan_clickbench_q17                  1.17  1515.6±123.83µs        ? ?/sec    1.00  1296.1±24.30µs        ? ?/sec
physical_plan_clickbench_q18                  1.12  1381.2±148.46µs        ? ?/sec    1.00  1233.7±18.41µs        ? ?/sec
physical_plan_clickbench_q19                  1.10  1689.3±51.93µs        ? ?/sec     1.00  1532.3±140.70µs        ? ?/sec
physical_plan_clickbench_q2                   1.12  1368.6±100.33µs        ? ?/sec    1.00  1222.7±24.16µs        ? ?/sec
physical_plan_clickbench_q20                  1.05  1237.2±14.74µs        ? ?/sec     1.00  1182.7±219.27µs        ? ?/sec
physical_plan_clickbench_q21                  1.12  1353.3±106.07µs        ? ?/sec    1.00  1209.1±13.96µs        ? ?/sec
physical_plan_clickbench_q22                  1.13  1817.7±299.10µs        ? ?/sec    1.00  1610.8±27.71µs        ? ?/sec
physical_plan_clickbench_q23                  1.12      2.0±0.21ms        ? ?/sec     1.00  1797.8±176.74µs        ? ?/sec
physical_plan_clickbench_q24                  1.28      2.5±0.37ms        ? ?/sec     1.00  1960.8±138.84µs        ? ?/sec
physical_plan_clickbench_q25                  1.12  1485.4±24.37µs        ? ?/sec     1.00  1324.8±124.35µs        ? ?/sec
physical_plan_clickbench_q26                  1.14  1373.9±21.27µs        ? ?/sec     1.00  1203.5±14.13µs        ? ?/sec
physical_plan_clickbench_q27                  1.19  1556.4±179.85µs        ? ?/sec    1.00  1312.8±16.84µs        ? ?/sec
physical_plan_clickbench_q28                  1.11  1787.1±108.99µs        ? ?/sec    1.00  1602.9±148.44µs        ? ?/sec
physical_plan_clickbench_q29                  1.13      2.3±0.02ms        ? ?/sec     1.00  1989.6±135.85µs        ? ?/sec
physical_plan_clickbench_q3                   1.08  1346.0±69.08µs        ? ?/sec     1.00  1246.4±140.79µs        ? ?/sec
physical_plan_clickbench_q30                  1.31      8.6±0.08ms        ? ?/sec     1.00      6.5±0.08ms        ? ?/sec
physical_plan_clickbench_q31                  1.12  1868.6±251.29µs        ? ?/sec    1.00  1662.7±137.91µs        ? ?/sec
physical_plan_clickbench_q32                  1.07  1846.9±118.97µs        ? ?/sec    1.00  1725.8±222.02µs        ? ?/sec
physical_plan_clickbench_q33                  1.09  1630.4±72.61µs        ? ?/sec     1.00  1490.4±147.75µs        ? ?/sec
physical_plan_clickbench_q34                  1.06  1434.8±17.97µs        ? ?/sec     1.00  1348.2±111.82µs        ? ?/sec
physical_plan_clickbench_q35                  1.09  1536.3±169.30µs        ? ?/sec    1.00  1412.6±227.81µs        ? ?/sec
physical_plan_clickbench_q36                  1.21      2.2±0.26ms        ? ?/sec     1.00  1814.2±50.46µs        ? ?/sec
physical_plan_clickbench_q37                  1.19      2.2±0.02ms        ? ?/sec     1.00  1813.6±17.51µs        ? ?/sec
physical_plan_clickbench_q38                  1.20      2.2±0.02ms        ? ?/sec     1.00  1806.1±19.72µs        ? ?/sec
physical_plan_clickbench_q39                  1.00  1920.2±123.71µs        ? ?/sec    1.02  1952.7±496.68µs        ? ?/sec
physical_plan_clickbench_q4                   1.06  1249.8±13.81µs        ? ?/sec     1.00  1174.0±129.47µs        ? ?/sec
physical_plan_clickbench_q40                  1.07      2.1±0.02ms        ? ?/sec     1.00  1964.5±179.40µs        ? ?/sec
physical_plan_clickbench_q41                  1.08      2.0±0.03ms        ? ?/sec     1.00  1897.2±360.88µs        ? ?/sec
physical_plan_clickbench_q42                  1.12  1983.9±163.89µs        ? ?/sec    1.00  1765.5±124.55µs        ? ?/sec
physical_plan_clickbench_q43                  1.13      2.0±0.25ms        ? ?/sec     1.00  1789.7±137.35µs        ? ?/sec
physical_plan_clickbench_q44                  1.09  1320.8±117.18µs        ? ?/sec    1.00  1206.4±12.62µs        ? ?/sec
physical_plan_clickbench_q45                  1.09  1312.0±97.44µs        ? ?/sec     1.00  1201.4±32.21µs        ? ?/sec
physical_plan_clickbench_q46                  1.04  1536.8±16.35µs        ? ?/sec     1.00  1471.9±319.91µs        ? ?/sec
physical_plan_clickbench_q47                  1.11  1785.4±14.40µs        ? ?/sec     1.00  1612.5±269.34µs        ? ?/sec
physical_plan_clickbench_q48                  1.17      2.1±0.16ms        ? ?/sec     1.00  1763.0±114.77µs        ? ?/sec
physical_plan_clickbench_q49                  1.16      2.1±0.10ms        ? ?/sec     1.00  1841.9±53.85µs        ? ?/sec
physical_plan_clickbench_q5                   1.10  1325.1±13.65µs        ? ?/sec     1.00  1199.8±13.83µs        ? ?/sec
physical_plan_clickbench_q6                   1.14  1371.1±87.24µs        ? ?/sec     1.00  1201.4±12.84µs        ? ?/sec
physical_plan_clickbench_q7                   1.11  1668.0±202.24µs        ? ?/sec    1.00  1505.9±215.37µs        ? ?/sec
physical_plan_clickbench_q8                   1.07  1458.5±106.34µs        ? ?/sec    1.00  1360.3±268.51µs        ? ?/sec
physical_plan_clickbench_q9                   1.11  1566.6±11.70µs        ? ?/sec     1.00  1412.1±108.23µs        ? ?/sec
physical_plan_tpcds_all                       1.35    655.1±2.04ms        ? ?/sec     1.00   484.8±11.59ms        ? ?/sec
physical_plan_tpch_all                        1.34     38.6±0.30ms        ? ?/sec     1.00     28.7±0.36ms        ? ?/sec
physical_plan_tpch_q1                         1.36  1217.8±49.60µs        ? ?/sec     1.00    897.0±5.98µs        ? ?/sec
physical_plan_tpch_q10                        1.28  1695.3±11.60µs        ? ?/sec     1.00  1324.0±12.04µs        ? ?/sec
physical_plan_tpch_q11                        1.21  1520.2±13.43µs        ? ?/sec     1.00  1255.0±277.75µs        ? ?/sec
physical_plan_tpch_q12                        1.48  1343.2±245.97µs        ? ?/sec    1.00   910.5±59.66µs        ? ?/sec
physical_plan_tpch_q13                        1.24   836.6±44.52µs        ? ?/sec     1.00   674.1±47.79µs        ? ?/sec
physical_plan_tpch_q14                        1.36  1025.8±11.05µs        ? ?/sec     1.00   754.6±12.43µs        ? ?/sec
physical_plan_tpch_q16                        1.35  1520.9±13.61µs        ? ?/sec     1.00  1129.8±24.25µs        ? ?/sec
physical_plan_tpch_q17                        1.37  1490.4±169.95µs        ? ?/sec    1.00  1090.4±101.67µs        ? ?/sec
physical_plan_tpch_q18                        1.34  1604.3±109.68µs        ? ?/sec    1.00  1198.2±47.00µs        ? ?/sec
physical_plan_tpch_q19                        1.65      3.0±0.03ms        ? ?/sec     1.00  1826.9±11.52µs        ? ?/sec
physical_plan_tpch_q2                         1.38      3.2±0.04ms        ? ?/sec     1.00      2.4±0.02ms        ? ?/sec
physical_plan_tpch_q20                        1.29  1970.1±19.51µs        ? ?/sec     1.00  1523.3±164.47µs        ? ?/sec
physical_plan_tpch_q21                        1.45      2.7±0.24ms        ? ?/sec     1.00  1869.4±112.54µs        ? ?/sec
physical_plan_tpch_q22                        1.31  1384.3±28.58µs        ? ?/sec     1.00  1056.8±20.60µs        ? ?/sec
physical_plan_tpch_q3                         1.35  1277.6±198.82µs        ? ?/sec    1.00   945.0±66.16µs        ? ?/sec
physical_plan_tpch_q4                         1.24   946.5±49.43µs        ? ?/sec     1.00  765.5±153.06µs        ? ?/sec
physical_plan_tpch_q5                         1.37  1805.6±138.24µs        ? ?/sec    1.00  1321.7±11.72µs        ? ?/sec
physical_plan_tpch_q6                         1.36    655.3±8.70µs        ? ?/sec     1.00    482.4±6.81µs        ? ?/sec
physical_plan_tpch_q7                         1.36      2.4±0.04ms        ? ?/sec     1.00  1778.4±126.30µs        ? ?/sec
physical_plan_tpch_q8                         1.41      2.9±0.10ms        ? ?/sec     1.00      2.1±0.02ms        ? ?/sec
physical_plan_tpch_q9                         1.33      2.2±0.17ms        ? ?/sec     1.00  1654.2±444.38µs        ? ?/sec
physical_select_aggregates_from_200           1.17     15.2±0.59ms        ? ?/sec     1.00     13.0±0.68ms        ? ?/sec
physical_select_all_from_1000                 1.09     30.1±0.47ms        ? ?/sec     1.00     27.7±0.38ms        ? ?/sec
physical_select_one_from_700                  1.61  1597.4±109.43µs        ? ?/sec    1.00   991.4±63.84µs        ? ?/sec
physical_theta_join_consider_sort             1.60  1776.6±333.62µs        ? ?/sec    1.00  1111.7±10.90µs        ? ?/sec
physical_unnest_to_join                       1.59  1503.9±34.19µs        ? ?/sec     1.00   946.3±66.67µs        ? ?/sec
with_param_values_many_columns                20.70    84.5±0.64µs        ? ?/sec     1.00      4.1±0.05µs        ? ?/sec

Most likely further improvements are achievable by using the new patterns in more traversals, but let's leave that to follow-up PRs.

To sum up, this PR contains 3 initial commits:

  • The 1st commit is not closely related to the main purpose of the PR. It just refactors Expr::Wildcard to incorporate its fields into a named Wildcard struct, which makes Expr::Wildcard similar to other Expr variants.
  • The 2nd commit adds the LogicalPlanStats container to all LogicalPlan and Expr enum variants. LogicalPlanStats is just an empty struct in this commit and all the changes are mechanical.
  • The 3rd commit implements the plan pattern optimization in LogicalPlanStats and adjusts some of the logical plan traversals to use it.

Are these changes tested?

Yes, with existing tests.

Are there any user-facing changes?

No.

@github-actions bot added the documentation, sql, logical-expr, physical-expr, optimizer, core, sqllogictest, substrait, catalog, common, proto and functions labels on Dec 1, 2024
@peter-toth (Contributor, Author)

cc @alamb , @berkaysynnada , @findepi

I haven't created an issue for this change yet, but I will if we think it is worth making such a huge breaking change.
@alamb, I wonder if you could confirm the benchmark results, as the above values are from my local run only.

@findepi (Member) commented Dec 2, 2024

TreeNode traversal APIs are a crucial part of query plan analysis and optimization. This PR explores the idea of storing pre-calculated statistics/properties of nodes (or subtrees) inside the nodes during creation and automatically updating the stats during transformations.

I love the idea in principle (vide #12604)

There are, however, some important questions:

  • does the party creating a given plan component know all the values we could want to eventually infer from the plan?
    • statistics is a great example. statistics retrieval for a table scan can be expensive. statistics inference for a filter or join is actually complicated. the algorithm for doing that is not part of the plan, and the call site creating the plan shouldn't need to know what the algorithm is
    • lazy vs eager. statistics is a great example again. In data lake systems like Iceberg or Delta, statistics retrieval can be an expensive process (e.g. in Iceberg's case it may involve traversing manifest files). for trivial SELECT .. FROM t [LIMIT n] queries it is not needed (no logic is stats dependent). for complex queries, it should be run after predicate pushdown, thus allowing for cheaper (and more accurate) stats retrieval.
    • decoupled. plan creation and attribute derivation should not be coupled, since they are two different parts of the system.
  • backwards compatibility. the LogicalPlan nodes are datafusion's frontend (SQL frontend, dataframes, or plans created separately by downstream projects). breaking this layer can have dire consequences. providing a way to easily construct LogicalPlan nodes without the attributes needed later on could render the effort void.

Those observations lead to conclusions about what should and shouldn't be part of a plan (logical or physical):

  • there are types of additional information that could (or maybe even should) be added to the plan (Proposal: introduced typed expressions, separate AST and IR #12604).
  • there are types of additional information which feel more like attributes of the optimizer itself -- from a design perspective they are better kept off the plan
  • more advanced optimizers may want to derive some additional information about a plan node, or a group of plan nodes, separately from plan creation (for example, if an optimizer creates alternative LPs that compute the same relation, the stats for that relation don't need to be computed more than once, even if we don't know the stats for intermediate nodes)
  • value-based immutable data structures are a really, really good thing for engineering sanity. in the short term this prevents adding information to the plan nodes during the optimization process
    • but a more advanced optimizer could want to break plan immutability anyway, to support "plan groups" / alternative plans for a given sub-relation, or simply to be able to move a sub-part of the plan into an optimizer rule, without needing to destructure the whole plan

To me, the best way to address these various needs would be

@peter-toth (Contributor, Author) commented Dec 4, 2024

Yeah, I see your point. I think this all comes down to whether we want to / have the resources to implement a new IR plan structure and refactor the existing analyzer / optimizer rules. (And it can also be a breaking change for projects that have their own rules...) Or, whether we want to / can adjust the existing logical plan to incorporate the above ideas without many API-breaking changes and so keep the existing optimizer and the existing logical-to-physical plan conversion.

@findepi (Member) commented Dec 4, 2024

The LogicalPlan's loose typing, overly rich Expr syntax and close ties to the syntax tree are an ongoing maintenance tax we continue to pay with every new change and every new feature, so I do hope we accumulate enough resolve to commence work on a new IR. I personally would be very interested in working on this, but it also requires buy-in from other project maintainers.

@alamb (Contributor) commented Dec 5, 2024

I modified some of the traversal functions in this PR to demonstrate that the optimization brings significant performance improvement to sql_planner:

This is quite cool 🤯

Yeah, I see your point. I think this all comes down to whether we want to / have the resources to implement a new IR plan structure and refactor the existing analyzer / optimizer rules. (And it can also be a breaking change for projects that have their own rules...) Or, whether we want to / can adjust the existing logical plan to incorporate the above ideas without many API-breaking changes and so keep the existing optimizer and the existing logical-to-physical plan conversion.

I don't know if we have the resources to do this. I know I don't have the bandwidth to help drive it forward but there are now quite a few other high bandwidth maintainers who might be able to do so.

My personal focus for the next few months is likely on making DataFusion more stable for existing systems, which is likely not exactly aligned with making major changes. However, I think internal refactoring is possible (we did it with the function representation --> all to UDFs); it just needs sustained coding and organizational effort.

@alamb (Contributor) commented Dec 5, 2024

The pre-computed bitset to know what is in each subtree is a very neat idea

@alamb marked this pull request as draft on January 16, 2025 22:19
@alamb (Contributor) commented Jan 16, 2025

I am cleaning up the review queue and this PR has a pile of conflicts. I think it is a neat POC-type approach, but I don't think we are planning to proceed as is -- though please correct me if I got that wrong.

@peter-toth (Contributor, Author)

Hi @alamb, sure we can close this PR.

If I get the above comments right, there will be a new IR plan structure and the current optimizer rules will be rewritten to work on that. Once DataFusion reaches that point, we can revisit this PR to improve the traversal speed of optimizer rules.

However, the idea in this PR is not specific to optimizer rules. It can improve analyzer rules (I think we want to keep them working on LogicalPlans, don't we?) and physical plan rules, and actually any kind of TreeNode tree as well. So I feel it is orthogonal to having a new IR plan structure...

The problem with LogicalPlan/Expr is that they are defined as enums, so it is not easy to add derived stats about the tree structure to the nodes without breaking compatibility... Maybe I should open a PR for ExecutionPlan/PhysicalExpr, where this is not an issue, first?

@alamb (Contributor) commented Jan 18, 2025

The problem with LogicalPlan/Expr is that they are defined as enums, so it is not easy to add derived stats about the tree structure to the nodes without breaking compatibility... Maybe I should open a PR for ExecutionPlan/PhysicalExpr, where this is not an issue, first?

Maybe that would make more sense. I just feel like our iteration code has gotten pretty intense lately (and it is already getting pretty hard to understand). I understand the need for optimization, but I think we also need to balance it with maintainability / readability

@peter-toth (Contributor, Author)

Actually, I just realized after reading @jayzhan211's comment that PlanProperties already exists for physical plan nodes. It's very similar to the LogicalPlanStats that I wanted to add to logical plan nodes in this PR. It caches additional, expensive-to-compute properties of nodes. It would be nice to have such a properties container in both logical and physical plan trees...

@jayzhan211 (Contributor)

I didn't notice this change; it looks quite similar to what I proposed. What is the current challenge? It seems like a HUGE breaking change?

@peter-toth (Contributor, Author) commented Jan 19, 2025

Yeah, adding a new field to LogicalPlans and Exprs and creating new node constructors (e.g. LogicalPlan::Projection -> LogicalPlan::projection) requires a lot of API-breaking changes (the 2nd commit of this PR), although the changes are mechanical...

@jayzhan211 (Contributor)

I quickly scanned the change. I think we don't need to modify Expr itself; the related statistics can be computed and stored in LogicalPlanProperties:

pub struct Projection {
    /// The list of expressions
    pub expr: Vec<Expr>,
    /// The incoming logical plan
    pub input: Arc<LogicalPlan>,
    /// The schema description of the output
    pub schema: DFSchemaRef,

    // Any kind of computed information place here
    properties: LogicalPlanProperties
} 

I guess this reduces the change by about half.

And given the approach above, we can also change the plans one by one (I guess), so we can iterate on the change in smaller, easier-to-review pieces.

Some plans, such as DmlStatement or Explain, probably don't need any such properties or statistics; only plans that have Expr might need them.
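
For illustration, a minimal sketch (my own, not from the PR) of how such cached properties could then be exposed without changing the enum's shape, assuming a hypothetical properties() accessor and a properties field on the variant structs that carry expressions:

impl LogicalPlan {
    pub fn properties(&self) -> Option<&LogicalPlanProperties> {
        match self {
            // The enum variants keep their current single-field shape;
            // the cached data lives inside the variant structs
            LogicalPlan::Projection(p) => Some(&p.properties),
            LogicalPlan::Filter(f) => Some(&f.properties),
            // Plans without expressions may simply not cache anything
            _ => None,
        }
    }
}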

@peter-toth (Contributor, Author)

Yeah, most likely we don't need LogicalPlanProperties in all nodes for this particular optimization, but in the future we might, so I added it to all nodes to keep the whole PR consistent.

The property (patterns: EnumSet<LogicalPlanPattern>) that I added in the 3rd commit is an enumset describing the whole logical tree structure, so LogicalPlanPattern contains both LogicalPlan and Expr elements.
Expr needs this property to propagate information from nested LogicalPlans, and it is also needed for a few traversal optimizations on expressions.
DmlStatement and Explain have it because they have inputs. But yeah, EmptyRelation has no input and no expression, so it is actually not needed there...
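
For illustration, a hedged sketch (assumed pattern name and constructor shape) of why Expr needs the property: an expression that embeds a nested LogicalPlan, such as a scalar subquery, pulls the plan's patterns up into the expression tree:

impl Expr {
    pub fn scalar_subquery(subquery: Subquery) -> Self {
        // Merge the nested plan's patterns so that checks like `contains_pattern`
        // keep working across the Expr / LogicalPlan boundary
        let stats = LogicalPlanStats::new(enum_set!(LogicalPlanPattern::ExprScalarSubquery))
            .merge(subquery.subquery.stats());
        Expr::ScalarSubquery(subquery, stats)
    }
}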

@xudong963 self-requested a review on January 22, 2025 10:24