Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Severe performance degradation going from Trino 419 to 464 #24136

Open
vburenin opened this issue Nov 14, 2024 · 4 comments
Open

Severe performance degradation going from Trino 419 to 464 #24136

vburenin opened this issue Nov 14, 2024 · 4 comments

Comments

@vburenin
Copy link

Today we have attempted migration from Trino 419 to Trino 464 and we started seeing a few queries failing with memory limit issues:

The query in question is this:

SELECT "foo" FROM
   (WITH om_stores AS
      (SELECT s.some_id,
              COALESCE(BOOL_OR(ent.feature_name = 'test' AND ent.disabled_at IS NULL), False) AS om
       FROM "iceberg"."dbname"."tblname" s
       LEFT JOIN iceberg.dbname2.data_history ent ON s.some_id = ent.some_id
       WHERE s.deleted_at IS NULL GROUP BY s.some_id)
           SELECT CAST(o.day_partition AS DATE) AS analysis_date,
                                   o.server_region,
                                   o.foo_slot AS foo,
                                   o.foo_slot
    FROM om_stores om
    LEFT JOIN iceberg.dbname3.foo_orders o ON om.some_id = o.some_id
    WHERE DATE(o.day_partition) >= current_date - INTERVAL '60' DAY
    GROUP BY o.day_partition, o.server_region, o.foo_slot) AS "virtual_table"
 WHERE "analysis_date" >= DATE '2024-10-14' AND "analysis_date" < DATE '2024-11-14'
 GROUP BY "foo", date_trunc('week', CAST("analysis_date" AS TIMESTAMP))
 LIMIT 1000;

It fails like this:

Query 20241114_213039_05361_smnbd, FAILED, 7 nodes
Splits: 20,928 total, 12,657 done (60.48%)
15.95 [531M rows, 10.7GB] [33.3M rows/s, 686MB/s]

Query 20241114_213039_05361_smnbd failed: Query exceeded distributed user memory limit of 40GB

Changing WHERE DATE(o.day_partition) >= current_date - INTERVAL '60' DAY to WHERE o.day_partition >= '2024-09-15'
Immediately returns everything back to normal.

Query 20241114_213936_05525_smnbd, FINISHED, 7 nodes
Splits: 2,189 total, 2,189 done (100.00%)
5.34 [127M rows, 1.52GB] [23.7M rows/s, 292MB/s]

While this query itself is obviously not great, but this is what users came up with and we can't just break their stuff.

Explain shows a lack of missing filter for foo_orders table in Trino 464 while this example is copied from 419:
(CAST("day_partition_6" AS date) >= DATE '2024-09-15') AND (CAST("day_partition_6" AS DATE) >= DATE '2024-10-14') AND (CAST("day_partition_6" AS DATE) < DATE '2024-11-14')

All tables are Iceberg tables.

@martint
Copy link
Member

martint commented Nov 14, 2024

See #22987

@lordicecream
Copy link

@vburenin did adding the session property help with the queries? I am also facing severe degradation when upgrading my trino setup from 418 to 464

@vburenin
Copy link
Author

@lordicecream yes it did. Did you hit the same problem?

@vburenin
Copy link
Author

@lordicecream This config was introduced later, it is not applicable for 422

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants