
Cherry pick "ORCA support ext stats, Fix EPQ..." #855

Merged
merged 35 commits into from
Jan 16, 2025

Conversation

jiaqizho
Contributor

@jiaqizho jiaqizho commented Jan 9, 2025

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@jiaqizho jiaqizho changed the title Cherry pick some fixes and clean in ORCA [DNM]Cherry pick some fixes and clean in ORCA Jan 9, 2025
@my-ship-it my-ship-it added the cherry-pick cherry-pick upstream commts label Jan 9, 2025
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch from 99c9b08 to 9c09f36 Compare January 10, 2025 09:56
@jiaqizho jiaqizho changed the title [DNM]Cherry pick some fixes and clean in ORCA Cherry pick "ORCA support ext stats, Fix EPQ..." Jan 10, 2025
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch 6 times, most recently from 1fcfb03 to 11aad68 Compare January 15, 2025 06:39
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch 3 times, most recently from 25bfcb2 to 1f35efa Compare January 16, 2025 02:41
avamingli
avamingli previously approved these changes Jan 16, 2025
@avamingli
Contributor

Squash fix commits to one?

@jiaqizho
Contributor Author

jiaqizho commented Jan 16, 2025

Squash fix commits to one?

My idea is to split relatively independent commits as much as possible.

  • Commit 1: Fix the coredump caused by partdesc
  • Commit 2: Handle the parts that ext stats does not support
  • Commit 3: Fix the ICW test changes

I think splitting the commits helps with problem localization (backtracking) and makes each single commit more readable. :)

my-ship-it
my-ship-it previously approved these changes Jan 16, 2025
@my-ship-it my-ship-it removed the request for review from yjhjstz January 16, 2025 08:59
@jiaqizho jiaqizho changed the base branch from main to 1X_STABLE January 16, 2025 11:11
@jiaqizho jiaqizho changed the base branch from 1X_STABLE to main January 16, 2025 11:11
@jiaqizho jiaqizho dismissed stale reviews from my-ship-it and avamingli January 16, 2025 11:11

The base branch was changed.

gpopt and others added 4 commits January 16, 2025 19:31
Add a GUC `optimizer_discard_redistribute_hashjoin` to control hashjoin plans
that redistribute data.

If any one of a HashJoin's child operators is a RedistributeMotion, that plan
alternative is given MAX cost, so it will not be picked by the optimizer.
The default value is `OFF`.
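A minimal usage sketch (the GUC name is the one added by this commit; the table names are illustrative):

```sql
-- Discard hashjoin alternatives whose children redistribute data.
SET optimizer_discard_redistribute_hashjoin = on;

-- With the GUC on, ORCA assigns MAX cost to any HashJoin whose child is a
-- RedistributeMotion, so the chosen plan avoids redistributing either side.
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.a = t2.b;
```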
Note that commit d0f8614 enabled ORCA to produce plans for foreign table
queries, but for a locking clause on a foreign table, ORCA can't produce
the correct remote SQL: the locking clause is lost.

For example, for a query with locking clauses, ORCA produces the following
plan:

    postgres=# explain (verbose) SELECT * FROM ft1 where c1 in (select c1
    from ft2 for share) for update;

                                        QUERY PLAN
    ------------------------------------------------------------------------------------------
    Hash Semi Join (cost=0.00..862.33 rows=1 width=47)
       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8
       Hash Cond: (ft1.c1 = ft2.c1)
       ->  Foreign Scan on public.ft1  (cost=0.00..431.04 rows=1000 width=47)
             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8
             Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
       ->  Hash  (cost=431.00..431.00 rows=1 width=4)
             Output: ft2.c1
             ->  Foreign Scan on public.ft2  (cost=0.00..431.00 rows=1 width=4)
                   Output: ft2.c1
                   Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
     Optimizer: Pivotal Optimizer (GPORCA)
     (12 rows)

while the Postgres optimizer will output the plan as follows:
                                                QUERY PLAN
    -----------------------------------------------------------------------------------------------------------
    Hash Join  (cost=679.55..726.30 rows=1000 width=146)
      Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, "ANY_subquery".*
      Inner Unique: true
      Hash Cond: (ft1.c1 = "ANY_subquery".c1)
       ->  Foreign Scan on public.ft1  (cost=100.00..133.00 rows=1000 width=118)
             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
             Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
      ->  Hash  (cost=567.05..567.05 rows=1000 width=32)
            Output: "ANY_subquery".*, "ANY_subquery".c1
            ->  HashAggregate  (cost=563.72..567.05 rows=1000 width=32)
                  Output: "ANY_subquery".*, "ANY_subquery".c1
                  Group Key: "ANY_subquery".c1
                  ->  Subquery Scan on "ANY_subquery" (cost=100.00..561.22 rows=1000 width=32)
                        Output: "ANY_subquery".*, "ANY_subquery".c1
                        ->  Foreign Scan on public.ft2 (cost=100.00..551.22 rows=1000 width=47)
                              Output: ft2.c1, ft2.*
                              Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
     Optimizer: Postgres query optimizer
     (18 rows)

Comparing the remote SQL of the two plans, ORCA's is obviously wrong,
since it loses the locking clause.

After going through the related ORCA code, I found that supporting a
locking clause on a foreign table is not that easy. We would have to
populate the RowMarkClause into the PlannedStmt. Even for local tables,
ORCA doesn't handle this; it just adds an exclusive lock on the
corresponding table.

So in this commit, we simply fall back to the Postgres query optimizer
when there is a locking clause on a foreign table.
GPDB 7 will only support Python 3, so we also update
gpMgmt/requirements-dev.txt to remove the pinning of modules to Python 2
compatible versions introduced by commit 68a8237.

[GPR-1153]

Co-authored-by: Anusha Shakarad <[email protected]>
Co-authored-by: Ning Wu <[email protected]>
because:

1) ORCA (gpopt, gporca) - All bitcodes are used by the LLVM JIT inlining
optimization step, which occurs in the executor stage. ORCA is only accessed in
the plan stage, so there is no need to build bitcodes.

2) gpcloud - C++ is more complex and the related LLVM API seems not so stable.
The upstream code also does NOT have a C++ module, making C++ bitcode new to us.
Considering the rare usage of gpcloud and the risk, it is better to disable it
until we have full verification and more C++ modules used in the executor
stage.

3) And a side effect - no workaround is needed for emitting bitcode of C++
modules, even for CLANG/LLVM versions < 9. config/llvm.m4 stays synced to the
supported LLVM versions. We also avoid updating the build system to solve the
C++ standard collision between ORCA's requirement and the one CLANG provides by
default.

To disable bitcodes for the above parts, `with_llvm = no` is set before
including src/backend/common.mk (for gporca/gpopt) and pgxs.mk (for gpcloud).
In addition, gporca/gpopt are filtered out in install-postgres-bitcode.
Jingyu Wang and others added 27 commits January 16, 2025 19:31
Fixme:
src/backend/gpopt/translate/CPartPruneStepsBuilder.cpp:
// GPDB_12_MERGE_FIXME: This should be StrategyNumber, but IndexOpProperties
// takes an INT

Action:
Both `StrategyNumber opstrategy` and the IndexOpProperties argument `INT
strategy` are upstream definitions. It's better to hide the StrategyNumber
(16-bit) to INT (32-bit) conversion in a gpdb wrapper.
…778)

This commit is an extension of commit 8467aa6b which added ORCA support
to use functional dependency extended stats for estimating correlated
cardinality. Here we add support for STATS_EXT_NDISTINCT.

After this commit ORCA is now able to produce better cardinality estimates for
GROUP BY or DISTINCT on correlated columns. Following is a basic example:

    ```
    CREATE TABLE t (a INT, b INT);
    INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
    CREATE STATISTICS stts (ndistinct) ON a, b FROM t;
    ANALYZE t;
    EXPLAIN SELECT COUNT(*) FROM t GROUP BY a, b;
    ```
This adds a GUC optimizer_enable_foreign_table. If enabled, Orca will
generate plans involving foreign tables; otherwise, we fall back to the
planner. By default this GUC is enabled.
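A sketch of the effect (the foreign table name is illustrative; the GUC name is the one added here):

```sql
-- Default: ORCA may plan queries over foreign tables.
SHOW optimizer_enable_foreign_table;

-- Disable to force a fallback to the Postgres planner for any query
-- that references a foreign table.
SET optimizer_enable_foreign_table = off;
EXPLAIN SELECT * FROM my_foreign_table;
```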
Previously, Orca would crash when encountering this. For now, fall back
similarly to what we do for RTE_TABLEFUNCTION.

Example query: explain SELECT * FROM xmltable('/root' passing '' COLUMNS element text);

This actually fell back correctly in a retail build, but we were hitting a
`__builtin_unreachable` call, which is undefined behavior, so we were
likely just getting lucky.

Fixes issue reported in https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/4F8GUSs04L8
This is unused code from the old partitioning architecture in Orca. Since
we now do static elimination within Orca, it is no longer needed. Also
resolves a FIXME.
This is already handled within property enforcement. We don't need to
recheck here.
This was used in the previous partitioning architecture with partial
indexes, but is now unused. Also resolves a FIXME.
When creating a table descriptor in Orca for a CTAS query, the created
table should have a RowExclusiveLock as planner does in createas.c.
Mark this as GPDB_12_MERGE_FEATURE_NOT_SUPPORTED, as this is more an
improvement than a bug (maintains existing behavior as 6X). It needs a
fair amount of thinking as this can suffer from the same issue of
correlated stats in join predicates.
This mdp test apparently had a join order change, but the mdp no longer
runs and there's no DDL/query to reproduce it, so there's nothing to work
with here. Remove the FIXME.
Previously, we had commented out the test for "TPCH-Q5.mdp" since it was
hitting an assertion during constraint derivation. The main culprit here
is that `IsLessThan` returned "false" when we didn't use the internal
evaluator nor the external evaluator, which is the case for mdps.

Usually we don't create mdps of this nature. However, we hit this case
because we're comparing a timestamp and a date type. By themselves
they're both comparable and can even use the internal evaluator.
However, date types use LINT, whereas timestamps use DOUBLE. This led to
invalid comparisons, but it is only an issue for mdps.

This is now more obvious with an assertion if can_use_external_evaluator
is false. Additionally, remove a couple of mdps that now hit this
assertion and replace them with an ICW test.
…ome cases (#14835)

If the inner child is a Result or Filter node, Orca may generate a plan that
does not materialize the nestloop join's inner child, which can cause bad
performance. This patch fixes the issue by forcing materialization of the
inner child.
…105)

On GPDB 6, the GUC gp_enable_sort_distinct is used to optimize sort by
indicating "noduplicate" to remove duplicated rows and reduce data handling. It
affects replacement selection sort only. However, on GPDB 7, replacement
selection sort has been removed completely, so we remove gp_enable_sort_distinct
and the other code related to "noduplicate" sort.
…rence (#14905)

This patch reinstates the logic from before commit 30e79d2, so that when there
is an ORDER BY clause in the subquery, we can safely remove it. The previous
logic was not wrong, but it kept the outer reference, leaving a correlated
subquery: a worse, though still correct, correlated plan. However, our 26th
preprocessing step goes wrong when decorrelating the subquery in this case,
where we actually cannot decorrelate.

Commit 30e79d2 added a fix that allows adding a project in the translator
that echoes the outer ref from within the subquery.

`select foo.c from foo where b in 
  (select bar.q from bar where bar.r =10 order by foo.c);`

ORCA is currently unable to decorrelate subqueries that contain project
nodes. For the above query, the parser adds foo.c to the target list of the
subquery; that would get an echo projection and thus prevent decorrelation
by ORCA, per the mentioned commit. This patch does not project an ORDER BY
column, so we can safely remove the ORDER BY clause.

Author: @DevChattopadhyay
This commit enables additional opportunities for predicate push-down
based on a synergetic relationship between join conditions and select
filters. For example, consider the query:

    SELECT count(*) FROM t2 INNER JOIN t1 ON t2.c BETWEEN t1.a AND t1.b WHERE t2.c = 2;

Input:
    +--CLogicalSelect
       |--CLogicalNAryJoin
       |  |--CLogicalGet "t2" ("t2"), Columns: ["c" (0), "d" (1),...
       |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (9), "b" (10),...
       |  +--CScalarBoolOp (EboolopAnd)
       |     |--CScalarCmp (>=)
       |     |  |--CScalarIdent "c" (0)
       |     |  +--CScalarIdent "a" (9)
       |     +--CScalarCmp (<=)
       |        |--CScalarIdent "c" (0)
       |        +--CScalarIdent "b" (10)
       +--CScalarCmp (=)
          |--CScalarIdent "c" (0)
          +--CScalarConst (2)

Notice that the join predicate references columns (a, b, and c) coming
from both sides of the join. Typically this means the predicate cannot
be pushed down. However, the select filter above the join predicate
filters on c=2. We can use that in conjunction with the join condition
to filter out more rows, outputting:

Output:
    +--CLogicalSelect
       +--CLogicalSelect
       |  |--CLogicalNAryJoin
       |  |  |--CLogicalGet "t2" ("t2"), Columns: ["c" (0), "d" (1),...
       |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (9), "b" (10),...
       |  |  +--CScalarBoolOp (EboolopAnd)
       |  |     |--CScalarCmp (>=)
       |  |     |  |--CScalarIdent "c" (0)
       |  |     |  +--CScalarIdent "a" (9)
       |  |     +--CScalarCmp (<=)
       |  |        |--CScalarIdent "c" (0)
       |  |        +--CScalarIdent "b" (10)
       |  +--CScalarBoolOp (EboolopAnd)
       |     |--CScalarCmp (>=)
       |     |  |--CScalarConst (2)
       |     |  +--CScalarIdent "a" (9)
       |     +--CScalarCmp (<=)
       |        |--CScalarConst (2)
       |        +--CScalarIdent "b" (10)
       +--CScalarCmp (=)
          |--CScalarIdent "c" (0)
          +--CScalarConst (2)

Note: One might naively think it is possible to directly replace the
constant identifier in the join predicate. However, that would be a
mistake, as it could prevent hash join alternative plans. This is why we
must instead add a new select filter above the join.

The preprocessor step in this commit can add new select filters that
were constructed by substituting constants into join conditions.  It is
possible that the new filter is of the form:

    (const relop ident)

However, prior to this commit the constraint interval is constructed
using a scalar compare operator that is expected to be an expression of
the form:

    (ident relop const)

There doesn't seem to be a good reason not to support that form too. In
fact, there's a FIXME from 2012 that suggests maybe we should enable even
more forms. So this commit also enables the above form.
…ed on gp_segment_id

In ORCA, direct dispatch was supported for hash-distributed tables; with this
update we can perform direct dispatch for a randomly distributed table with a
condition on gp_segment_id.

CODE CHANGES
A patch was added in "CTranslatorExprToDXLUtils::SetDirectDispatchInfo(,,,,)"
to check whether the table is a randomly distributed table. If yes, a newly
added function "CTranslatorExprToDXLUtils::GetDXLDirectDispatchInfoRandDist(,,,)"
is called to check whether direct dispatch is feasible.

PR LINK
https://github.com/greenplum-db/gpdb/pull/14725
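A sketch of a query shape that can now be directly dispatched (the table name is illustrative):

```sql
-- Randomly distributed table: no hash distribution key to dispatch on.
CREATE TABLE rand_t (a int, b int) DISTRIBUTED RANDOMLY;

-- A filter on gp_segment_id pins the scan to one segment, so ORCA can now
-- dispatch the slice to that segment only instead of to all segments.
EXPLAIN SELECT * FROM rand_t WHERE gp_segment_id = 0;
```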
…on (#14958)

With the new partitioning architecture, we pass the constraint expressions on
the partitioning key through to the dynamic table scan. Therefore, we don't
need to generate the predicate on the partition key; instead we can retrieve
it from m_partition_cnstrs_disj->PexprScalar(mp). I didn't modify any of the
dynamic elimination logic; this is only for the static elimination case.

The goal of this concept is to properly estimate the cardinality of scans on
partitioned tables. In 6X, we didn't have access to individual constraints on
individual partitions, so we had to estimate and assume uniformity. We
extracted the partition key and assumed that was the "constraint" that we were
scanning on. We now have the actual constraint for the partitions, so we use
those instead.

When costing a table scan, we use the stats of the entire table. Scanning 1
million tuples costs more than 10 tuples, even if we're projecting 10 tuples in
each case. For non-partitioned tables, we simply get the stats of the base
table. For partitioned tables within orca, we want the stats of the partitioned
table after we've done static and dynamic elimination. In the previous
partitioning architecture, we didn't perform static elimination within Orca and
didn't have access to the constraints of individual partitions. If we had a
predicate where pkey=2, we had to estimate how many rows were in the partition
containing pkey=2. Now we have those constraints, and can use these constraints
for more accurate estimates.

There were a couple of cardinality changes with this, but they're more accurate
as far as I can tell. The change in EffectOfLocalPredOnJoin2.mdp is a better
estimate, since the partitions being scanned don't have any statistics
according to the mdp.
Issue:
The join of two columns, one varchar and one char, of matching values with
trailing spaces, generates empty output.

Example:
create table foo (a varchar(3)) distributed by (a);
create table bar (b char(3)) distributed by (b);
insert into foo values ('1 ');
insert into bar values ('1 ');
select a, b from foo join bar on a = b;

 a | b
---+---
(0 rows)

Root cause:
According to Postgres documentation, "Values of type character are physically
padded with spaces to the specified width n, and are stored and displayed that
way. However, trailing spaces are treated as semantically insignificant and
disregarded when comparing two values of type character."

This means, '1 ' of char(3) (1 trailing space) is stored as '1  ' (2 trailing
spaces). However, when it's used for comparison, such as in a join condition or
filter, all the trailing spaces are removed, so it's treated as '1'.

In our example, the join condition foo.a = bar.b doesn't specify casting.
Therefore, foo.a of varchar is cast to bpchar, and the join condition becomes
bar.b = (foo.a)::bpchar. The trailing spaces are removed for both foo.a and
bar.b. '1' == '1' matches, so the query should return 1 row.

However, ORCA generates an additional hash condition, where both foo.a and
bar.b are cast to text. The join condition becomes (bar.b = (foo.a)::bpchar)
AND ((bar.b)::text = (foo.a)::text). '1 ' of varchar stays '1 ' when cast to
text. However, '1  ' of char(3) becomes '1' when cast to text, because the
cast strips the trailing spaces. '1 ' doesn't match '1', so 0 rows are
returned.
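The cast mismatch can be seen directly in an illustrative session (behavior follows PostgreSQL's bpchar semantics):

```sql
-- varchar keeps its trailing space when cast to text, so this is true:
SELECT '1 '::varchar(3)::text = '1 '::varchar(3);

-- but char(3) loses ALL trailing spaces when cast to text, so the derived
-- hash condition compares '1' with '1 ' and this is false:
SELECT '1 '::char(3)::text = '1 '::varchar(3)::text;
```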

The derivation of additional hash conditions is designed to optimize
multi-column join operations and is not itself at fault. However, casting char
to text (ORCA's derived join condition) isn't equivalent to casting varchar to
char (the user's join condition). This is because the hash of char and the
hash of varchar/text are two operator classes that belong to two different
operator families. If two opclasses belong to different opfamilies, the same
operator (equality in this case) invokes different internal functions. Casting
foo.a to char enforces a motion on foo, whereas casting bar.b to text enforces
a motion on bar.

Solution:
We fix this bug by addressing the opfamily mismatch. When deriving the
equivalence classes, instead of checking if the opfamilies of post-cast
types match, we check the matching of opfamilies of pre-cast types.
Skip dropped columns when checking constraints
This FIXME was to shortcut the logic if there is a constraint that cannot be
simplified by the constraint derivation framework; for example, if the
partition key is on a float and we're comparing with an int. While this can
shortcut some of the logic, it isn't a hotspot, and it adds complexity since
we'd still need to populate the disjunction, or set it to null and handle the
null value later in the optimization. Overall, I don't think it's worth the
added complexity for a very negligible performance benefit in corner cases.
Commit 558d77f renamed ArrayRef to SubscriptingRef. We won't
rename this in Orca for now since there's a good amount to change.
Converting the bucket GPDB_12_MERGE_FIXME to a regular FIXME, since this
wasn't related to or introduced by the merge.
Instead, we check the relation's partdesc.

In the future I'd like to optimize getting the relation for use in
Orca's relcache. For example, in many cases we don't need each attribute
and statistics for each attribute, and this gets expensive.

Also removes some other unused functions
There are two parts in the current cherry-pick that need to be adapted:

 1. In CBDB, `rel->rd_partdesc` cannot be used as a basis for determining
whether the current table is a partitioned table.
 2. The `get_relation_statistics` implementation has changed.
STATS_EXT_DEPENDENCIES can be empty after ANALYZE. This is because when the
degree of all combinations is 0, PG does not record the dependencies data in
pg_statistic_ext_data.

The current changes ensure that ORCA will not fail to generate a plan due to a
failure to read dependencies data, and also ignore the unsupported ext stats
type (STATS_EXT_EXPRESSIONS).
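A sketch of how such empty dependencies data can arise (the table and statistics names are illustrative):

```sql
-- Two independent columns: a = i % 7 and b = i % 5 are uncorrelated, so
-- every functional-dependency degree comes out as 0.
CREATE TABLE u (a int, b int);
INSERT INTO u SELECT i % 7, i % 5 FROM generate_series(1, 10000) i;
CREATE STATISTICS u_deps (dependencies) ON a, b FROM u;
ANALYZE u;

-- With all degrees at 0, PG stores no dependencies data; the value below
-- may come back empty, which is exactly what ORCA must now tolerate.
SELECT d.stxddependencies
  FROM pg_statistic_ext e
  JOIN pg_statistic_ext_data d ON d.stxoid = e.oid
 WHERE e.stxname = 'u_deps';
```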
@jiaqizho jiaqizho force-pushed the cherry-pick-orca-in-path-order-8 branch from 1f35efa to 2586ea8 Compare January 16, 2025 11:32
@jiaqizho jiaqizho merged commit ebd5e18 into apache:main Jan 16, 2025
22 checks passed