Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Optimizer - not choosing the better join order #137701

Open
sheaffej opened this issue Dec 18, 2024 · 1 comment
Open

SQL Optimizer - not choosing the better join order #137701

sheaffej opened this issue Dec 18, 2024 · 1 comment
Labels
A-sql-optimizer SQL logical planning and optimizations. branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@sheaffej
Copy link
Collaborator

sheaffej commented Dec 18, 2024

Describe the problem

The optimizer seems to be missing a join reordering opportunity that would avoid a full scan.

This issue was originally reported in Zendesk https://cockroachdb.zendesk.com/agent/tickets/24877
The Zendesk ticket includes statement bundles from the customer's cluster, and the un-redacted SQL.

This query below has a full scan on join_requests returning 1,087,393 rows, then a lookup join on projects which is an outer join. This lookup join is where all the time is being spent. Finally, the 820,399 rows from the lookup join are hash joined to teams. However, the scan on teams results in 1 row, and the hash join results in 0 rows.

sql>       SELECT
  ->         sj0.id AS id
  ->       FROM
  ->         join_requests AS sj0
  ->         LEFT JOIN projects AS sp1 ON sp1.id = sj0.project_id
  ->         INNER JOIN teams AS st2 ON st2.id = sj0.team_id
  ->       WHERE
  ->         (st2.deleted_at IS NULL)
  ->         AND ((sp1.id IS NULL) OR (sp1.deleted_at IS NULL))
  ->         AND st2.account_id = '<redacted>':::UUID
  -> ;
  id
------
(0 rows)

Time: 5.721s total (execution 5.641s / network 0.080s)

If I simply switch the order that projects and teams are listed in the SQL, the subquery returns in 6ms. This scans teams and results in 1 row, which is lookup joined to join_requests resulting in 0 rows.

sql>       SELECT
  ->         sj0.id AS id
  ->       FROM
  ->         join_requests AS sj0
  ->         INNER JOIN teams AS st2 ON st2.id = sj0.team_id
  ->         LEFT JOIN projects AS sp1 ON sp1.id = sj0.project_id
  ->       WHERE
  ->         (st2.deleted_at IS NULL)
  ->         AND ((sp1.id IS NULL) OR (sp1.deleted_at IS NULL))
  ->         AND st2.account_id = '<redacted>':::UUID;
  id
------
(0 rows)

Time: 90ms total (execution 6ms / network 83ms)

To Reproduce

Statement bundles of both versions of the original query are included in the Zendesk ticket https://cockroachdb.zendesk.com/agent/tickets/24877

The query described above is just one subquery within a larger SQL. The statement bundles in the Zendesk ticket include the full original SQL.

I checked the reorder_joins_limit which was set to the default. When I set reorder_joins_limit = 10, there was no improvement in the query or plan using the original join order.

Expected behavior
I expected the optimizer to consider the join reorder without a SQL modiication.

Environment:

  • CockroachDB version: v24.2.5
  • Cockroach Cloud AWS Advanced cluster
    • Single-region us-east-1 with 12 nodes

Jira issue: CRDB-45710

gz#24877

@sheaffej sheaffej added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. T-sql-queries SQL Queries Team labels Dec 18, 2024
Copy link

blathers-crl bot commented Dec 18, 2024

Hi @sheaffej, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Dec 18, 2024
@sheaffej sheaffej added the branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 label Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Status: Triage
Development

No branches or pull requests

1 participant