
Cherry pick "ORCA support ext stats, Fix EPQ..." #855

Merged
merged 35 commits into from
Jan 16, 2025

Conversation

jiaqizho
Contributor

@jiaqizho jiaqizho commented Jan 9, 2025

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@jiaqizho jiaqizho changed the title Cherry pick some fixes and clean in ORCA [DNM]Cherry pick some fixes and clean in ORCA Jan 9, 2025
@my-ship-it my-ship-it added the cherry-pick cherry-pick upstream commts label Jan 9, 2025
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch from 99c9b08 to 9c09f36 Compare January 10, 2025 09:56
@jiaqizho jiaqizho changed the title [DNM]Cherry pick some fixes and clean in ORCA Cherry pick "ORCA support ext stats, Fix EPQ..." Jan 10, 2025
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch 6 times, most recently from 1fcfb03 to 11aad68 Compare January 15, 2025 06:39
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch 3 times, most recently from 25bfcb2 to 1f35efa Compare January 16, 2025 02:41
avamingli
avamingli previously approved these changes Jan 16, 2025
@avamingli
Contributor

Squash fix commits to one?

@jiaqizho
Contributor Author

jiaqizho commented Jan 16, 2025

Squash fix commits to one?

My idea is to split relatively independent commits as much as possible.

  • Commit 1: Fix the coredump caused by partdesc
  • Commit 2: Handle the parts that ext stats does not support
  • Commit 3: Fix the ICW test changes

I think splitting the commits helps with problem localization (backtracking) and makes each single commit more readable. :)

my-ship-it
my-ship-it previously approved these changes Jan 16, 2025
@my-ship-it my-ship-it removed the request for review from yjhjstz January 16, 2025 08:59
@jiaqizho jiaqizho changed the base branch from main to 1X_STABLE January 16, 2025 11:11
@jiaqizho jiaqizho changed the base branch from 1X_STABLE to main January 16, 2025 11:11
@jiaqizho jiaqizho dismissed stale reviews from my-ship-it and avamingli January 16, 2025 11:11

The base branch was changed.

gpopt and others added 4 commits January 16, 2025 19:31
Add a GUC `optimizer_discard_redistribute_hashjoin` to control hashjoin plans
that redistribute data.

If any one of a HashJoin's child operators is a RedistributeMotion, that plan
alternative is given MAX cost, so it will not be picked by the optimizer.
The default value is `OFF`.
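A minimal usage sketch (the GUC name is the one added by this commit; the table names are illustrative):

```sql
-- Discard hashjoin alternatives whose children redistribute data.
SET optimizer_discard_redistribute_hashjoin = on;

-- With the GUC on, ORCA assigns MAX cost to any HashJoin whose child is a
-- RedistributeMotion, so the chosen plan avoids redistributing either side.
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.a = t2.b;
```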
Note that commit d0f8614 enabled ORCA to produce plans for foreign table
queries, but for a locking clause on a foreign table, ORCA can't produce
the correct remote SQL: the locking clause is lost.

For example, for a query with locking clauses, ORCA produces the following
plan:

    postgres=# explain (verbose) SELECT * FROM ft1 where c1 in (select c1
    from ft2 for share) for update;

                                        QUERY PLAN
    ------------------------------------------------------------------------------------------
    Hash Semi Join (cost=0.00..862.33 rows=1 width=47)
       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8
       Hash Cond: (ft1.c1 = ft2.c1)
       ->  Foreign Scan on public.ft1  (cost=0.00..431.04 rows=1000 width=47)
             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8
             Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
       ->  Hash  (cost=431.00..431.00 rows=1 width=4)
             Output: ft2.c1
             ->  Foreign Scan on public.ft2  (cost=0.00..431.00 rows=1 width=4)
                   Output: ft2.c1
                   Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
     Optimizer: Pivotal Optimizer (GPORCA)
     (12 rows)

while the Postgres optimizer will output the plan as follows:
                                                QUERY PLAN
    -----------------------------------------------------------------------------------------------------------
    Hash Join  (cost=679.55..726.30 rows=1000 width=146)
      Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, "ANY_subquery".*
      Inner Unique: true
      Hash Cond: (ft1.c1 = "ANY_subquery".c1)
       ->  Foreign Scan on public.ft1  (cost=100.00..133.00 rows=1000 width=118)
             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
             Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
      ->  Hash  (cost=567.05..567.05 rows=1000 width=32)
            Output: "ANY_subquery".*, "ANY_subquery".c1
            ->  HashAggregate  (cost=563.72..567.05 rows=1000 width=32)
                  Output: "ANY_subquery".*, "ANY_subquery".c1
                  Group Key: "ANY_subquery".c1
                  ->  Subquery Scan on "ANY_subquery" (cost=100.00..561.22 rows=1000 width=32)
                        Output: "ANY_subquery".*, "ANY_subquery".c1
                        ->  Foreign Scan on public.ft2 (cost=100.00..551.22 rows=1000 width=47)
                              Output: ft2.c1, ft2.*
                              Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
     Optimizer: Postgres query optimizer
     (18 rows)

Comparing the remote SQL of the two plans, ORCA's is obviously wrong,
since it loses the locking clause.

After going through the related ORCA code, I found that supporting a
locking clause on a foreign table is not that easy. We would have to
populate the RowMarkClause into the PlannedStmt. Even for local tables,
ORCA doesn't handle this; it just adds an exclusive lock on the
corresponding table.

So in this commit, we simply fall back to the Postgres query optimizer
when there is a locking clause on a foreign table.
GPDB 7 will only support Python 3, so we also update
gpMgmt/requirements-dev.txt to remove the pinning of modules to Python 2
compatible versions introduced by commit 68a8237.

[GPR-1153]

Co-authored-by: Anusha Shakarad <[email protected]>
Co-authored-by: Ning Wu <[email protected]>
because:

1) ORCA (gpopt, gporca) - All bitcodes are used by the LLVM JIT inlining
optimization step, which occurs in the executor stage. ORCA is only accessed in
the plan stage, so there is no need to build bitcodes.

2) gpcloud - C++ is more complex and the related LLVM API seems not so stable.
The upstream code also does NOT have a C++ module, making C++ bitcode new to us.
Considering the rare usage of gpcloud and the risk, it is better to disable it
until we have full verification and more C++ modules used in the executor
stage.

3) And a side effect - no workaround is needed for emitting bitcode of C++
modules, even for CLANG/LLVM versions < 9. config/llvm.m4 stays synced to the
supported LLVM versions. We also avoid updating the build system to solve the
C++ standard collision between ORCA's requirement and the one CLANG provides by
default.

To disable bitcodes for the above parts, `with_llvm = no` is set before
including src/backend/common.mk (for gporca/gpopt) and pgxs.mk (for gpcloud).
In addition, gporca/gpopt are filtered out in install-postgres-bitcode.
Jingyu Wang and others added 27 commits January 16, 2025 19:31
Fixme:
src/backend/gpopt/translate/CPartPruneStepsBuilder.cpp:
// GPDB_12_MERGE_FIXME: This should be StrategyNumber, but IndexOpProperties
// takes an INT

Action:
Both `StrategyNumber opstrategy` and the IndexOpProperties argument `INT
strategy` are upstream definitions. It's better to hide the StrategyNumber
(16-bit) to INT (32-bit) conversion in a gpdb wrapper.
…778)

This commit is an extension of commit 8467aa6b which added ORCA support
to use functional dependency extended stats for estimating correlated
cardinality. Here we add support for STATS_EXT_NDISTINCT.

After this commit ORCA is now able to produce better cardinality estimates for
GROUP BY or DISTINCT on correlated columns. Following is a basic example:

    ```
    CREATE TABLE t (a INT, b INT);
    INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
    CREATE STATISTICS stts (ndistinct) ON a, b FROM t;
    ANALYZE t;
    EXPLAIN SELECT COUNT(*) FROM t GROUP BY a, b;
    ```
This adds a GUC optimizer_enable_foreign_table. If enabled, Orca will
generate plans involving foreign tables; otherwise, we fall back to the
planner. By default this GUC is enabled.
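A sketch of the effect (the foreign table name is illustrative; the GUC name is the one added here):

```sql
-- Default: ORCA may plan queries over foreign tables.
SHOW optimizer_enable_foreign_table;

-- Disable to force a fallback to the Postgres planner for any query
-- that references a foreign table.
SET optimizer_enable_foreign_table = off;
EXPLAIN SELECT * FROM my_foreign_table;
```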
Previously, Orca would crash when encountering this. For now, fall back
similarly to what we do for RTE_TABLEFUNCTION.

Example query: explain SELECT * FROM xmltable('/root' passing '' COLUMNS element text);

This actually fell back correctly in a retail build, but we were hitting a
`__builtin_unreachable` call, which is undefined behavior, so we were
likely just getting lucky.

Fixes issue reported in https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/4F8GUSs04L8
This is unused code from the old partitioning architecture in Orca. Since
we now do static elimination within Orca, it is no longer needed. Also
resolves a FIXME.
This is already handled within property enforcement. We don't need to
recheck here.
This was used in the previous partitioning architecture with partial
indexes, but is now unused. Also resolves a FIXME.
When creating a table descriptor in Orca for a CTAS query, the created
table should have a RowExclusiveLock as planner does in createas.c.
Mark this as GPDB_12_MERGE_FEATURE_NOT_SUPPORTED, as this is more an
improvement than a bug (maintains existing behavior as 6X). It needs a
fair amount of thinking as this can suffer from the same issue of
correlated stats in join predicates.
This mdp test apparently had a join order change, but the mdp no longer
runs and there's no DDL/query to reproduce it, so there's nothing to work
with here. Remove the FIXME.
Previously, we had commented out the test for "TPCH-Q5.mdp" since it was
hitting an assertion during constraint derivation. The main culprit here
is that `IsLessThan` returned "false" when we didn't use the internal
evaluator nor the external evaluator, which is the case for mdps.

Usually we don't create mdps of this nature. However, we hit this case
because we're comparing a timestamp and a date type. By themselves
they're both comparable and can even use the internal evaluator.
However, date types use LINT, whereas timestamps use DOUBLE. This led to
invalid comparisons, but it is only an issue for mdps.

This is now more obvious with an assertion if can_use_external_evaluator
is false. Additionally, remove a couple of mdps that now hit this
assertion and replace them with an ICW test.
…ome cases (#14835)

If the inner child is a Result or Filter node, Orca may generate a plan that
does not materialize the nestloop join's inner child, which can cause bad
performance. This patch fixes the issue by forcing materialization of the
inner child.
…105)

On GPDB 6, the GUC gp_enable_sort_distinct is used to optimize sort by
indicating "noduplicate" to remove duplicated rows and reduce data handling. It
affects replacement selection sort only. However, on GPDB 7, replacement
selection sort has been removed completely, so we remove gp_enable_sort_distinct
and the other code related to "noduplicate" sort.
…rence (#14905)

This patch reinstates the logic from before commit 30e79d2, so that when there
is an ORDER BY clause in the subquery, we can safely remove it. The previous
logic was not wrong, but it kept the outer reference, leaving a correlated
subquery: a worse, though still correct, correlated plan. However, our 26th
preprocessing step goes wrong when decorrelating the subquery in this case,
where we actually cannot decorrelate.

Commit 30e79d2 added a fix that allows adding a project in the translator
that echoes the outer ref from within the subquery.

`select foo.c from foo where b in 
  (select bar.q from bar where bar.r =10 order by foo.c);`

ORCA is currently unable to decorrelate subqueries that contain project
nodes. For the above query, the parser adds foo.c to the target list of the
subquery; that would get an echo projection and thus prevent decorrelation
by ORCA, per the mentioned commit. This patch does not project an ORDER BY
column, so we can safely remove the ORDER BY clause.

Author: @DevChattopadhyay
This commit enables additional opportunities for predicate push-down
based on a synergetic relationship between join conditions and select
filters. For example, consider the query:

    SELECT count(*) FROM t2 INNER JOIN t1 ON t2.c BETWEEN t1.a AND t1.b WHERE t2.c = 2;

Input:
    +--CLogicalSelect
       |--CLogicalNAryJoin
       |  |--CLogicalGet "t2" ("t2"), Columns: ["c" (0), "d" (1),...
       |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (9), "b" (10),...
       |  +--CScalarBoolOp (EboolopAnd)
       |     |--CScalarCmp (>=)
       |     |  |--CScalarIdent "c" (0)
       |     |  +--CScalarIdent "a" (9)
       |     +--CScalarCmp (<=)
       |        |--CScalarIdent "c" (0)
       |        +--CScalarIdent "b" (10)
       +--CScalarCmp (=)
          |--CScalarIdent "c" (0)
          +--CScalarConst (2)

Notice that the join predicate references columns (a, b, and c) coming
from both sides of the join. Typically this means the predicate cannot
be pushed down. However, the select filter above the join predicate
filters on c=2. We can use that in conjunction with the join condition
to filter out more rows, outputting:

Output:
    +--CLogicalSelect
       +--CLogicalSelect
       |  |--CLogicalNAryJoin
       |  |  |--CLogicalGet "t2" ("t2"), Columns: ["c" (0), "d" (1),...
       |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (9), "b" (10),...
       |  |  +--CScalarBoolOp (EboolopAnd)
       |  |     |--CScalarCmp (>=)
       |  |     |  |--CScalarIdent "c" (0)
       |  |     |  +--CScalarIdent "a" (9)
       |  |     +--CScalarCmp (<=)
       |  |        |--CScalarIdent "c" (0)
       |  |        +--CScalarIdent "b" (10)
       |  +--CScalarBoolOp (EboolopAnd)
       |     |--CScalarCmp (>=)
       |     |  |--CScalarConst (2)
       |     |  +--CScalarIdent "a" (9)
       |     +--CScalarCmp (<=)
       |        |--CScalarConst (2)
       |        +--CScalarIdent "b" (10)
       +--CScalarCmp (=)
          |--CScalarIdent "c" (0)
          +--CScalarConst (2)

Note: One might naively think it is possible to directly replace the
constant identifier in the join predicate. However, that would be a
mistake, as it could prevent hash join alternative plans. This is why we
must instead add a new select filter above the join.

The preprocessor step in this commit can add new select filters that
were constructed by substituting constants into join conditions.  It is
possible that the new filter is of the form:

    (const relop ident)

However, prior to this commit the constraint interval is constructed
using a scalar compare operator that is expected to be an expression of
the form:

    (ident relop const)

There doesn't seem to be a good reason not to support that form too. In
fact, there's a FIXME from 2012 that suggests maybe we should enable even
more forms. So this commit also enables the above form.
…ed on gp_segment_id

In ORCA, direct dispatch was supported for hash-distributed tables; with this
update we can perform direct dispatch for a randomly distributed table with a
condition on gp_segment_id.

CODE CHANGES
A patch was added in "CTranslatorExprToDXLUtils::SetDirectDispatchInfo(,,,,)"
to check whether the table is a randomly distributed table. If yes, a newly
added function "CTranslatorExprToDXLUtils::GetDXLDirectDispatchInfoRandDist(,,,)"
is called to check whether direct dispatch is feasible.

PR LINK
https://github.com/greenplum-db/gpdb/pull/14725
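A sketch of a query shape that can now be directly dispatched (the table name is illustrative):

```sql
-- Randomly distributed table: no hash distribution key to dispatch on.
CREATE TABLE rand_t (a int, b int) DISTRIBUTED RANDOMLY;

-- A filter on gp_segment_id pins the scan to one segment, so ORCA can now
-- dispatch the slice to that segment only instead of to all segments.
EXPLAIN SELECT * FROM rand_t WHERE gp_segment_id = 0;
```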
…on (#14958)

With the new partitioning architecture, we pass the constraint expressions on
the partitioning key through to the dynamic table scan. Therefore, we don't
need to generate the predicate on the partition key; instead we can retrieve
it from m_partition_cnstrs_disj->PexprScalar(mp). I didn't modify any of the
dynamic elimination logic; this is only for the static elimination case.

The goal of this concept is to properly estimate the cardinality of scans on
partitioned tables. In 6X, we didn't have access to individual constraints on
individual partitions, so we had to estimate and assume uniformity. We
extracted the partition key and assumed that was the "constraint" that we were
scanning on. We now have the actual constraint for the partitions, so we use
those instead.

When costing a table scan, we use the stats of the entire table. Scanning 1
million tuples costs more than 10 tuples, even if we're projecting 10 tuples in
each case. For non-partitioned tables, we simply get the stats of the base
table. For partitioned tables within orca, we want the stats of the partitioned
table after we've done static and dynamic elimination. In the previous
partitioning architecture, we didn't perform static elimination within Orca and
didn't have access to the constraints of individual partitions. If we had a
predicate where pkey=2, we had to estimate how many rows were in the partition
containing pkey=2. Now we have those constraints, and can use these constraints
for more accurate estimates.

There were a couple of cardinality changes with this, but they're more accurate
as far as I can tell. The change in EffectOfLocalPredOnJoin2.mdp is a better
estimate, since the partitions being scanned don't have any statistics
according to the mdp.
Issue:
The join of two columns, one varchar and one char, of matching values with
trailing spaces, generates empty output.

Example:
create table foo (a varchar(3)) distributed by (a);
create table bar (b char(3)) distributed by (b);
insert into foo values ('1 ');
insert into bar values ('1 ');
select a, b from foo join bar on a = b;

 a | b
---+---
(0 rows)

Root cause:
According to Postgres documentation, "Values of type character are physically
padded with spaces to the specified width n, and are stored and displayed that
way. However, trailing spaces are treated as semantically insignificant and
disregarded when comparing two values of type character."

This means, '1 ' of char(3) (1 trailing space) is stored as '1  ' (2 trailing
spaces). However, when it's used for comparison, such as in a join condition or
filter, all the trailing spaces are removed, so it's treated as '1'.

In our example, the join condition foo.a = bar.b doesn't specify casting.
Therefore, foo.a of varchar is cast to bpchar, and the join condition becomes
bar.b = (foo.a)::bpchar. The trailing spaces are removed for both foo.a and
bar.b. '1' == '1' matches, so the query should return 1 row.

However, ORCA generates an additional hash condition, where both foo.a and
bar.b are cast to text. The join condition becomes (bar.b = (foo.a)::bpchar)
AND ((bar.b)::text = (foo.a)::text). '1 ' of varchar stays '1 ' when cast to
text. However, '1  ' of char(3) becomes '1' when cast to text, because the
cast strips the trailing spaces. '1 ' doesn't match '1', so 0 rows are
returned.
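The cast mismatch can be seen directly in an illustrative session (behavior follows PostgreSQL's bpchar semantics):

```sql
-- varchar keeps its trailing space when cast to text, so this is true:
SELECT '1 '::varchar(3)::text = '1 '::varchar(3);

-- but char(3) loses ALL trailing spaces when cast to text, so the derived
-- hash condition compares '1' with '1 ' and this is false:
SELECT '1 '::char(3)::text = '1 '::varchar(3)::text;
```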

The derivation of additional hash conditions is designed to optimize
multi-column join operations and is not itself at fault. However, casting char
to text (ORCA's derived join condition) isn't equivalent to casting varchar to
char (the user's join condition). This is because the hash of char and the
hash of varchar/text are two operator classes that belong to two different
operator families. If two opclasses belong to different opfamilies, the same
operator (equality in this case) invokes different internal functions. Casting
foo.a to char enforces a motion on foo, whereas casting bar.b to text enforces
a motion on bar.

Solution:
We fix this bug by addressing the opfamily mismatch. When deriving the
equivalence classes, instead of checking if the opfamilies of post-cast
types match, we check the matching of opfamilies of pre-cast types.
Skip dropped columns when checking constraints
This FIXME was to shortcut the logic if there is a constraint that cannot be
simplified by the constraint derivation framework; for example, if the
partition key is on a float and we're comparing with an int. While this can
shortcut some of the logic, it isn't a hotspot, and it adds complexity since
we'd still need to populate the disjunction, or set it to null and handle the
null value later in the optimization. Overall, I don't think it's worth the
added complexity for a very negligible performance benefit in corner cases.
Commit 558d77f renamed ArrayRef to SubscriptingRef. We won't
rename this in Orca for now since there's a good amount to change.
Converting the bucket GPDB_12_MERGE_FIXME to a regular FIXME, since this
wasn't related to or introduced by the merge.
Instead, we check the relation's partdesc.

In the future I'd like to optimize getting the relation for use in
Orca's relcache. For example, in many cases we don't need each attribute
and statistics for each attribute, and this gets expensive.

Also removes some other unused functions
There are two parts in the current cherry-pick that need to be adapted:

 1. In CBDB, `rel->rd_partdesc` cannot be used as a basis for determining
whether the current table is a partitioned table.
 2. The `get_relation_statistics` implementation has changed.
STATS_EXT_DEPENDENCIES can be empty after ANALYZE. This is because when the
degree of all combinations is 0, PG does not record the dependencies data in
pg_statistic_ext_data.

The current changes ensure that ORCA will not fail to generate a plan due to a
failure to read dependencies data, and also ignore the unsupported ext stats
type (STATS_EXT_EXPRESSIONS).
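A sketch of how such empty dependencies data can arise (the table and statistics names are illustrative):

```sql
-- Two independent columns: a = i % 7 and b = i % 5 are uncorrelated, so
-- every functional-dependency degree comes out as 0.
CREATE TABLE u (a int, b int);
INSERT INTO u SELECT i % 7, i % 5 FROM generate_series(1, 10000) i;
CREATE STATISTICS u_deps (dependencies) ON a, b FROM u;
ANALYZE u;

-- With all degrees at 0, PG stores no dependencies data; the value below
-- may come back empty, which is exactly what ORCA must now tolerate.
SELECT d.stxddependencies
  FROM pg_statistic_ext e
  JOIN pg_statistic_ext_data d ON d.stxoid = e.oid
 WHERE e.stxname = 'u_deps';
```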
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch from 1f35efa to 2586ea8 Compare January 16, 2025 11:32
@jiaqizho jiaqizho merged commit ebd5e18 into apache:main Jan 16, 2025
22 checks passed