Improve Performance for Retrieving Tables Metadata from Iceberg Catalog #23909
+26
−17
Improve Performance for Retrieving Tables Metadata from Iceberg Catalog
Overview
This PR addresses a performance issue encountered when querying column metadata from the Iceberg catalog in Trino (Issue #23468). The primary concern is that requests to retrieve table metadata are executed sequentially for each table, which significantly impacts query performance when dealing with a large number of tables.
Description
The issue manifests when executing the following query on the Iceberg catalog:
In environments with a large number of tables, the query’s response time increases considerably due to the sequential execution of catalog requests.
Changes Made
To improve performance, this PR parallelizes the metadata retrieval process. The old code issued catalog requests sequentially, one table at a time, so total execution time grew with the number of tables. The updated code uses CompletableFuture to issue the requests asynchronously and process tables concurrently, which reduces the latency of retrieving column metadata.
Additional Context and Related Issues
Issue #23468
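For reviewers who want the shape of the change at a glance, here is a minimal sketch of the sequential-versus-concurrent pattern described under Changes Made. It is not the code from this PR. The class ParallelMetadataSketch, the loadColumns helper, the 50 ms simulated latency, the table names, and the fixed-size thread pool are all assumptions made for illustration, and loadColumns is a stand-in rather than a Trino or Iceberg API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelMetadataSketch {
    // Hypothetical stand-in for one catalog round trip; not a Trino or Iceberg API.
    static List<String> loadColumns(String tableName) {
        try {
            Thread.sleep(50); // simulated network latency (assumed value)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return List.of(tableName + ".id", tableName + ".name");
    }

    // Old shape: one blocking request per table, back to back.
    static Map<String, List<String>> loadSequentially(List<String> tables) {
        return tables.stream()
                .collect(Collectors.toMap(Function.identity(), ParallelMetadataSketch::loadColumns));
    }

    // New shape: submit every request first, then wait for all of them.
    // Assumes table names are distinct (toMap rejects duplicate keys).
    static Map<String, List<String>> loadConcurrently(List<String> tables, ExecutorService executor) {
        Map<String, CompletableFuture<List<String>>> futures = tables.stream()
                .collect(Collectors.toMap(
                        Function.identity(),
                        table -> CompletableFuture.supplyAsync(() -> loadColumns(table), executor)));
        // Block until every request has completed (or one has failed).
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        return futures.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().join()));
    }

    public static void main(String[] args) {
        List<String> tables = List.of("orders", "customers", "lineitem", "nation");
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            Map<String, List<String>> sequential = loadSequentially(tables);
            Map<String, List<String>> concurrent = loadConcurrently(tables, executor);
            // Both approaches should return identical metadata.
            System.out.println(sequential.equals(concurrent));
        } finally {
            executor.shutdown();
        }
    }
}
```

In this sketch the sequential version waits for each simulated request in turn, while the concurrent version's wall-clock time is bounded by the slowest request as long as the pool has enough threads. The real speedup depends on the catalog's latency and on the executor the connector uses.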
Test Setup & Observations:
I tested the code changes, and the query ran more than 4x faster than without the proposed change.
Performance Comparison
As this table shows, the time to retrieve the columns drops by around 70%.
Release Notes
(X) Release notes are required, with the following suggested text: