This repository has been archived by the owner on Jan 7, 2025. It is now read-only.
Sometimes Postgres does really badly (even worse than our magic numbers!). However, the goal right now is simply to match Postgres, not to match the truecard. When I say "fix", I mean "match Postgres".
This is because we know exactly what we need to do to match Postgres but we don't know what we need to do to match the truecard.
Fixing single-dim group by and pulling expressions up to the group by should fix this. Postgres identifies that the group by is done on `EXTRACT(year FROM orders.o_orderdate)` and simply uses the n-distinct of `orders.o_orderdate` as the cardinality of the query.
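To make the behavior above concrete, here is a minimal sketch of group-count estimation where a key that is an expression over a single column simply inherits that column's n-distinct. `GroupKey` and `estimate_group_count` are hypothetical names, not optd's API, and this is a simplification: Postgres's real `estimate_num_groups` does more (for example, per-relation clamping). The fallback of 200 is Postgres's `DEFAULT_NUM_DISTINCT`.

```rust
use std::collections::HashMap;

/// Postgres's fallback distinct count for a column with no statistics
/// (`DEFAULT_NUM_DISTINCT` in `selfuncs.h`).
const DEFAULT_NUM_DISTINCT: f64 = 200.0;

/// Hypothetical group-by key type for this sketch.
enum GroupKey {
    /// A bare column reference, e.g. `n_name`.
    Column(String),
    /// An expression over exactly one column, e.g. `EXTRACT(year FROM o_orderdate)`.
    /// Only the underlying column name is recorded.
    ExprOverColumn(String),
}

impl GroupKey {
    /// The column whose n-distinct this key inherits.
    fn base_column(&self) -> &str {
        match self {
            GroupKey::Column(c) | GroupKey::ExprOverColumn(c) => c.as_str(),
        }
    }
}

/// Multiplies the n-distinct of each key's base column, then clamps the
/// result to the range [1, input_rows].
fn estimate_group_count(
    keys: &[GroupKey],
    n_distinct: &HashMap<String, f64>,
    input_rows: f64,
) -> f64 {
    let product: f64 = keys
        .iter()
        .map(|k| {
            n_distinct
                .get(k.base_column())
                .copied()
                .unwrap_or(DEFAULT_NUM_DISTINCT)
        })
        .product();
    product.min(input_rows).max(1.0)
}
```

With this rule, grouping on `EXTRACT(year FROM o_orderdate)` returns the n-distinct of `o_orderdate` rather than the small number of distinct years, which is exactly the overestimate we need to reproduce to match Postgres.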
Q9: truecard=175, pgcard=60150, dfcard=5000
Fixing single-dim group by and pulling expressions up to the group by should fix this. If you remove `p_name like '%forest'` and just use `o_orderdate as o_year`, you get exactly 60150 rows.
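The Q9 numbers are consistent with simple products, assuming standard TPC-H data (25 nations; `o_orderdate` drawn from 1992-01-01 through 1998-08-02, i.e. 2406 distinct days spanning 7 distinct years) and Postgres statistics that see all 2406 dates. Q9 groups by `nation, o_year`, so:

```math
\begin{aligned}
\text{truecard} &= |\text{nations}| \times |\text{years}| = 25 \times 7 = 175 \\
\text{pgcard} &= \mathrm{nd}(\mathit{n\_name}) \times \mathrm{nd}(\mathit{o\_orderdate}) = 25 \times 2406 = 60150
\end{aligned}
```

In other words, Postgres treats `o_year` as if it had the n-distinct of `o_orderdate`.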
wangpatrick57 changed the title from "Tracking: parity with Postgres for cardinality estimation" to "Tracking: parity with Postgres for TPC-H cardinality estimations" on Mar 24, 2024.
**Summary**: Using magic numbers from Postgres in various selectivity
edge cases.
**Demo**:
Different (unfortunately worse) q-error on TPC-H SF1. See #127 for
per-query details on how this PR affects q-error.
![Screenshot 2024-03-30 at 11 27 24](https://github.com/cmu-db/optd/assets/20631215/b0cce5d4-6ac8-4cd5-b0cf-48f86db14d26)
**Details**:
* Fixed the cardinality of Q10!
* `INVALID_SEL` is **no longer used** at all during cardtest. It is
still used during plannertest, as some plannertests use the optd
optimizer instead of the DataFusion logical optimizer. You can verify
this by replacing every instance of `INVALID_SEL` with a `panic!()`
and checking that cardtest still runs.
* Using the Postgres magic number for `LIKE`.
* Using the Postgres magic number for equality with various complex
expressions.
* Using the Postgres magic number for range comparison with various
complex expressions.
* Replaced `INVALID_SEL` with `panic!()` and `unreachable!()` statements
in places where it makes sense.
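Roughly, the fallback path described in the list above could look like the sketch below. It assumes the default constants from Postgres's `src/include/utils/selfuncs.h` (`DEFAULT_EQ_SEL`, `DEFAULT_INEQ_SEL`, `DEFAULT_MATCH_SEL`); which exact constants this PR uses is not stated here, and `FallbackPredicate`, `fallback_selectivity`, and `classify_comparison` are hypothetical names. It also shows the "fail loudly instead of returning `INVALID_SEL`" pattern with `unreachable!()`.

```rust
/// Defaults from Postgres's `src/include/utils/selfuncs.h`.
const DEFAULT_EQ_SEL: f64 = 0.005;
const DEFAULT_INEQ_SEL: f64 = 0.3333333333333333;
const DEFAULT_MATCH_SEL: f64 = 0.005;

/// Hypothetical predicate shapes we cannot estimate from column statistics,
/// e.g. because one side is a complex expression rather than a plain column.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FallbackPredicate {
    Eq,      // `expr = expr`
    NotEq,   // `expr <> expr`
    Range,   // `<`, `<=`, `>`, `>=`
    Like,    // `expr LIKE pattern`
    NotLike, // `expr NOT LIKE pattern`
}

/// Postgres-style default selectivity. Negated forms use `1 - sel`,
/// mirroring how Postgres derives `<>` and `NOT LIKE` from the positive case.
fn fallback_selectivity(pred: FallbackPredicate) -> f64 {
    match pred {
        FallbackPredicate::Eq => DEFAULT_EQ_SEL,
        FallbackPredicate::NotEq => 1.0 - DEFAULT_EQ_SEL,
        FallbackPredicate::Range => DEFAULT_INEQ_SEL,
        FallbackPredicate::Like => DEFAULT_MATCH_SEL,
        FallbackPredicate::NotLike => 1.0 - DEFAULT_MATCH_SEL,
    }
}

/// Callers only ever pass comparison operators, so anything else is a bug:
/// panic instead of returning a sentinel such as `INVALID_SEL`.
fn classify_comparison(op: &str) -> FallbackPredicate {
    match op {
        "=" => FallbackPredicate::Eq,
        "<>" | "!=" => FallbackPredicate::NotEq,
        "<" | "<=" | ">" | ">=" => FallbackPredicate::Range,
        other => unreachable!("not a comparison operator: {other}"),
    }
}
```

A panic here surfaces a missing case immediately in cardtest, whereas a sentinel selectivity would silently skew the estimates.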
**Summary**: Implemented join selectivity formulas for inner joins,
left/right outer joins, and cross joins. Also properly accounts for
filters in the join condition.
**Demo**:
We now match Postgres on our median Q-error. See #127 for more details
on what queries this PR affected.
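For reference, here is a small sketch of q-error using the usual definition, max(estimate / truth, truth / estimate), plus the median we compare against Postgres. Clamping both values to at least one row is an assumption of this sketch (to avoid dividing by zero), not necessarily what cardtest does.

```rust
/// Q-error of an estimate: max(est / true, true / est), so 1.0 is perfect.
/// Both inputs are clamped to at least 1 row (an assumption of this sketch).
fn q_error(estimate: f64, truth: f64) -> f64 {
    let (e, t) = (estimate.max(1.0), truth.max(1.0));
    (e / t).max(t / e)
}

/// Median of a list of q-errors; `None` for an empty list.
fn median(mut values: Vec<f64>) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // `total_cmp` gives a total order on f64, so sorting never panics.
    values.sort_by(|a, b| a.total_cmp(b));
    let mid = values.len() / 2;
    Some(if values.len() % 2 == 0 {
        (values[mid - 1] + values[mid]) / 2.0
    } else {
        values[mid]
    })
}
```

With the Q9 numbers above, `q_error(60150.0, 175.0)` is about 343.7 for pgcard, and `q_error(5000.0, 175.0)` is about 28.6 for dfcard.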
![Screenshot 2024-03-31 at 13 13 48](https://github.com/cmu-db/optd/assets/20631215/fae590a6-8c55-4016-b924-c697a1c25070)
**Details**:
* We only consider equality checks on columns of different tables to be
"join on conditions".
* Join selectivity formulas are from [Rogov
2022](https://postgrespro.com/blog/pgsql/5969618).
* If there are multiple on conditions, we multiply their selectivities
together.
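As an illustration of the formulas in the list above, here is a minimal sketch of the simplest case (no MCV lists): each equality on-condition has selectivity (1 - nullfrac_l)(1 - nullfrac_r) / max(nd_l, nd_r), multiple on-conditions multiply, and outer joins emit every outer-side row at least once. `JoinKind`, `EqJoinCond`, and `join_cardinality` are hypothetical names, and treating other ON-clause filters as a single `filter_sel` factor applied before the outer-join clamp is an assumption of this sketch.

```rust
/// Hypothetical join kinds for this sketch.
#[derive(Debug, Clone, Copy)]
enum JoinKind {
    Inner,
    LeftOuter,
    RightOuter,
    Cross,
}

/// Stats for one join-on condition `l.a = r.b` (columns of different tables).
struct EqJoinCond {
    left_ndistinct: f64,
    right_ndistinct: f64,
    left_null_frac: f64,
    right_null_frac: f64,
}

/// Selectivity of one equality condition, ignoring MCV lists.
fn eq_join_selectivity(c: &EqJoinCond) -> f64 {
    let max_nd = c.left_ndistinct.max(c.right_ndistinct).max(1.0);
    (1.0 - c.left_null_frac) * (1.0 - c.right_null_frac) / max_nd
}

/// Estimated rows of `left JOIN right ON <on_conds> AND <other filters>`,
/// where `filter_sel` is the combined selectivity of the other ON-clause filters.
fn join_cardinality(
    kind: JoinKind,
    left_rows: f64,
    right_rows: f64,
    on_conds: &[EqJoinCond],
    filter_sel: f64,
) -> f64 {
    if let JoinKind::Cross = kind {
        // No ON clause: every pair of rows survives.
        return left_rows * right_rows;
    }
    // Multiple on-conditions: multiply selectivities (independence assumption).
    let sel = on_conds.iter().map(eq_join_selectivity).product::<f64>() * filter_sel;
    let matched = left_rows * right_rows * sel;
    match kind {
        JoinKind::Inner => matched,
        // Unmatched outer-side rows are still emitted (NULL-padded).
        JoinKind::LeftOuter => matched.max(left_rows),
        JoinKind::RightOuter => matched.max(right_rows),
        JoinKind::Cross => unreachable!("handled above"),
    }
}
```

The `max` for outer joins is why a very selective ON clause can never push a left join below the row count of its left input.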