Iceberg queries stuck in the planning phase for long time #23451

Closed
sriharshaj opened this issue Sep 16, 2024 · 21 comments · Fixed by #23586

@sriharshaj

sriharshaj commented Sep 16, 2024

Our Iceberg queries are getting stuck in the planning phase for 2 to 3 minutes, although they eventually run successfully. We are currently upgrading Trino from version 444 to 454. This issue has been occurring since Trino version 451 (we went back and tried different versions).

Our Iceberg stack:
Storage: S3
File format: Parquet
Catalog: Hive

Query explain:

 Trino version: 454
 Fragment 0 [SINGLE]
     Output layout: [expr]
     Output partitioning: SINGLE []
     Output[columnNames = [_col0]]
     │   Layout: [expr:integer]
     │   Estimates: {rows: 1 (5B), cpu: 0, memory: 0B, network: 0B}
     │   _col0 := expr
     └─ Project[]
        │   Layout: [expr:integer]
        │   Estimates: {rows: 1 (5B), cpu: 5, memory: 0B, network: 0B}
        │   expr := integer '1'
        └─ Limit[count = 1]
           │   Layout: []
           │   Estimates: {rows: 1 (0B), cpu: 0, memory: 0B, network: 0B}
           └─ LocalExchange[partitioning = SINGLE]
              │   Layout: []
              │   Estimates: {rows: 1 (0B), cpu: 0, memory: 0B, network: 0B}
              └─ RemoteSource[sourceFragmentIds = [1]]
                     Layout: []

 Fragment 1 [SOURCE]
     Output layout: []
     Output partitioning: SINGLE []
     LimitPartial[count = 1]
     │   Layout: []
     │   Estimates: {rows: 1 (0B), cpu: 0, memory: 0B, network: 0B}
     └─ TableScan[table = table_name$data@2183018889107447028 constraint on [dt] LIMIT 1]
            Layout: []
            Estimates: {rows: 8499944 (0B), cpu: 0, memory: 0B, network: 0B}
            83:dt:varchar
                :: [[2024-08-30T13:00:00Z, 2024-08-30T14:00:00Z)]

(1 row)

Query 20240830_150248_35329_wgfii, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
3:23 [0 rows, 0B] [0 rows/s, 0B/s]

Query info:
https://gist.github.com/sriharshaj/f26e655f233b84754da8216be2ae0172

@wendigo
Contributor

wendigo commented Sep 17, 2024

This could be related to #23384

@wendigo
Copy link
Contributor

wendigo commented Sep 17, 2024

We will release a new Trino version this week. I'll postpone further investigation so we can check whether the fix in this new version helps.

@sriharshaj
Author

@wendigo We tested Trino version 458, and while the planning time has improved, it remains unusually high, ranging from 30 to 50 seconds.

sriharshaj changed the title from "Iceberg queries are getting stuck in the planning phase for 2 to 3 minutes" to "Iceberg queries stuck in the planning phase for 2-3 minutes" on Sep 18, 2024
@wendigo
Contributor

wendigo commented Sep 18, 2024

@sriharshaj do you know what accounts for this number? Metadata retrieval from storage? Metastore calls?

@sriharshaj
Author

Here are the optimizer summaries.

   "optimizerRulesSummaries": [
      {
        "rule": "io.trino.sql.planner.optimizations.AddExchanges",
        "invocations": 1,
        "applied": 1,
        "totalTime": 5120641837,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.DetermineTableScanNodePartitioning",
        "invocations": 3,
        "applied": 1,
        "totalTime": 4029729665,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PushPredicateIntoTableScan",
        "invocations": 1,
        "applied": 1,
        "totalTime": 3586457350,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.FilterExpressionRewrite",
        "invocations": 58,
        "applied": 1,
        "totalTime": 2148574,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.ProjectExpressionRewrite",
        "invocations": 92,
        "applied": 0,
        "totalTime": 921214,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.optimizations.PredicatePushDown",
        "invocations": 7,
        "applied": 7,
        "totalTime": 442238,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PushLimitIntoTableScan",
        "invocations": 4,
        "applied": 1,
        "totalTime": 293083,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PruneTableScanColumns",
        "invocations": 7,
        "applied": 1,
        "totalTime": 273248,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PruneOutputSourceColumns",
        "invocations": 16,
        "applied": 1,
        "totalTime": 268922,
        "failures": 0
      },
      {
        "rule": "io.trino.sql.planner.iterative.rule.PruneProjectColumns",
        "invocations": 7,
        "applied": 4,
        "totalTime": 224337,
        "failures": 0
      }
    ],

Is there a way to analyze why the optimizers are taking so long? Additionally, where can I find details on metadata retrieval and Metastore calls?

@wendigo
Contributor

wendigo commented Sep 18, 2024

@sriharshaj You can enable tracing (OpenTelemetry) and capture what the cluster is doing.
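
For anyone following along, a minimal sketch of what that involves, assuming the OpenTelemetry tracing properties available in recent Trino releases (the endpoint below is a placeholder for whatever OTLP collector you run; check the docs for your version):

    # etc/config.properties on the coordinator and workers
    tracing.enabled=true
    # OTLP/gRPC endpoint of your collector (Jaeger, Tempo, OpenTelemetry Collector, ...)
    tracing.exporter.endpoint=http://otel-collector.example:4317

With tracing on you get per-query spans rather than JMX-style aggregates, so individual metadata and planning operations can be inspected.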

@sriharshaj
Author

@wendigo Can I capture those metrics with JMX? We don't have an OpenTelemetry setup.

@wendigo
Contributor

wendigo commented Sep 19, 2024

@sriharshaj JMX keeps aggregates, not individual events.

@sriharshaj
Author

@wendigo I installed Trino locally and ran the same query. The planning phase took approximately 15 seconds.

The ConnectorMetadata.getTableProperties method is taking around 1.5 to 2.0 seconds to retrieve the table metadata.

During query optimization in Trino, metadata is being fetched five times, and during the fragment generation phase, it’s being retrieved three additional times.
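
(For context, that arithmetic roughly accounts for the delay: eight getTableProperties calls at 1.5 to 2 seconds each is about 12 to 16 seconds, which lines up with the ~15 seconds of planning time observed locally.)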

@sriharshaj
Author

This issue occurs exclusively with Iceberg queries. For Hive, everything works as expected.

@wendigo
Contributor

wendigo commented Sep 19, 2024

@sriharshaj I believe that @raunaqmorarka recently added a cache for metadata files.

@wendigo
Contributor

wendigo commented Sep 19, 2024

Which version are you using, @sriharshaj?

@sriharshaj
Author

@wendigo We have been facing this issue since 451.

I traced down the issue to this change: https://github.com/trinodb/trino/pull/15712/files#diff-e1cb17efec6787989f9df9ee40c4f2809ff3fe946cd2ec721ff8932b131997b8R618.

Our schema contains a large number of nested fields, which results in all columns being mapped to IcebergColumnHandle. When I debugged a specific table, it was mapping approximately 2,260 nested columns to IcebergColumnHandle, which can significantly impact planning.
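
To illustrate why a deeply nested schema hurts here, a small self-contained sketch (plain Java, not Trino or Iceberg code; the types are made up) of how mapping every nested field, rather than only the projected columns, multiplies the per-table work:

    import java.util.List;

    // Stand-in for a schema field; real code would walk Iceberg's nested types.
    record Field(String name, List<Field> children) {
        static Field leaf(String name) {
            return new Field(name, List.of());
        }
    }

    public class NestedFieldCount {
        // Counts every field, nested ones included, the way a handle-per-field
        // mapping does. A schema with a handful of top-level columns can expand
        // to thousands of entries once structs, arrays, and maps are flattened.
        static int countAllFields(List<Field> fields) {
            int total = 0;
            for (Field field : fields) {
                total += 1 + countAllFields(field.children());
            }
            return total;
        }

        public static void main(String[] args) {
            List<Field> schema = List.of(
                    Field.leaf("dt"),
                    new Field("payload", List.of(Field.leaf("id"), Field.leaf("attrs"))));
            System.out.println(countAllFields(schema)); // 4 entries for 2 top-level columns
        }
    }

When a query only touches a column like dt, building handles for thousands of nested entries on every metadata call is pure overhead.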

@wendigo
Contributor

wendigo commented Sep 19, 2024

@krvikash @raunaqmorarka can you take a look?

sriharshaj changed the title from "Iceberg queries stuck in the planning phase for 2-3 minutes" to "Iceberg queries stuck in the planning phase for long time" on Sep 19, 2024
@sriharshaj
Author

Thank you, @wendigo, for your guidance in helping me identify the issue.

@sriharshaj
Author

@krvikash @raunaqmorarka Any updates?

@krvikash
Contributor

Hi @sriharshaj, I have not had a chance to look into it yet. I will take a look this week.

@krvikash
Contributor

Hi @sriharshaj, I opened #23586 to load only the required columns into the map. If you have a way to try this change before it is merged, please check whether it reduces the planning time for your query.
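
For readers skimming the thread, the general shape of that change, as a rough illustrative sketch rather than the actual #23586 diff (all names below are invented):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Function;

    // Illustrative only: the real connector builds IcebergColumnHandle objects;
    // here a plain factory function stands in for that work.
    public class RequiredColumnsOnly {
        static <T> Map<String, T> handlesForRequiredColumns(
                Set<String> requiredColumns,
                Function<String, T> createHandle) {
            Map<String, T> handles = new LinkedHashMap<>();
            for (String column : requiredColumns) {
                // Only the columns the query references pay the handle-construction
                // cost; the rest of the schema's nested fields are never touched.
                handles.put(column, createHandle.apply(column));
            }
            return handles;
        }

        public static void main(String[] args) {
            Map<String, String> handles =
                    handlesForRequiredColumns(Set.of("dt"), name -> "handle(" + name + ")");
            System.out.println(handles); // {dt=handle(dt)}
        }
    }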

@sriharshaj
Author

@krvikash Sure, I will try this fix today.

@sriharshaj
Author

@krvikash I tried the fix and it worked; planning time is now in milliseconds. Thank you for the fix.

@krvikash
Contributor

Great!! Thanks @sriharshaj for reporting this and verifying the fix.
