Tracking: parity with Postgres for TPC-H cardinality estimations #127

wangpatrick57 · 2024-03-22T13:11:13Z

Notes

Sometimes Postgres does really bad (even worse than our magic numbers!). However, the goal right now is simply to match Postgres, not to match the truecard. When I say fix, I mean match Postgres.
- This is because we know exactly what we need to do to match Postgres but we don't know what we need to do to match the truecard.
Experiments ran with scale factor 1.0, seed 15721
If no other PR is mentioned, then the query was run based on feat: caching optd stats, 12x speedup on TPC-H SF1 #132

Queries

The text was updated successfully, but these errors were encountered:

**Summary**: Using magic numbers from Postgres in various selectivity edge cases. **Demo**: Different (unfortunately worse) q-error on TPC-H SF1. See #127 for per-query details on how this PR affects q-error. ![Screenshot 2024-03-30 at 11 27 24](https://github.com/cmu-db/optd/assets/20631215/b0cce5d4-6ac8-4cd5-b0cf-48f86db14d26) **Details**: * Fixed the cardinality of Q10! * `INVALID_SEL` is **no longer used** at all during cardtest. It is still used during plannertest as some plannertests use the optd optimizer instead of the datafusion logical optimizer. This can be checked by replacing all instances of `INVALID_SEL` with a `panic!()` and seeing that cardtest still runs. * Using magic number from Postgres for `LIKE`. * Using magic number from Postgres for equality with various complex expressions. * Using magic number from Postgres for range comparison with various complex expressions. * Replaced `INVALID_SEL` with `panic!()` and `unreachable!()` statements in places where it makes sense.

**Summary**: Implemented join selectivity formulas for inner joins, left/right outer joins, and cross joins. Also properly accounts for filters in the join condition. **Demo**: We now match Postgres on our median Q-error. See #127 for more details on what queries this PR affected. ![Screenshot 2024-03-31 at 13 13 48](https://github.com/cmu-db/optd/assets/20631215/fae590a6-8c55-4016-b924-c697a1c25070) **Details**: * We only consider equality checks on columns of different tables to be "join on conditions". * Join selectivity formulas are from [Rogov 2022](https://postgrespro.com/blog/pgsql/5969618). * If there are multiple on conditions, we multiply their selectivities together.

wangpatrick57 changed the title ~~Tracking: parity with Postgres for cardinality estimation~~ Tracking: parity with Postgres for TPC-H cardinality estimations Mar 24, 2024

wangpatrick57 mentioned this issue Mar 30, 2024

feat: using proper magic numbers in various edge cases #143

Merged

This was referenced Mar 30, 2024

feat: add cost estimation for agg #144

Merged

feat: join selectivity #145

Merged

jurplel mentioned this issue Sep 3, 2024

Tracking: Which TPC-H queries can be run? #184

Open

22 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking: parity with Postgres for TPC-H cardinality estimations #127

Tracking: parity with Postgres for TPC-H cardinality estimations #127

wangpatrick57 commented Mar 22, 2024 •

edited by Gun9niR

Loading

Tracking: parity with Postgres for TPC-H cardinality estimations #127

Tracking: parity with Postgres for TPC-H cardinality estimations #127

Comments

wangpatrick57 commented Mar 22, 2024 • edited by Gun9niR Loading

Notes

Queries

wangpatrick57 commented Mar 22, 2024 •

edited by Gun9niR

Loading