Performance Issue with Retrieving Columns from Iceberg Catalog #23468
Comments
@shohamyamin perhaps #22739 could help in your case. It requires Trino 454. cc @raunaqmorarka
Do you know the answer to this question?
Yes, it's created specifically for the coordinator.
Is the cache populated before the column filters are applied for the specific logged-in user? Because each user has access to different columns. Another question: is this cache separate from the regular cache?
I think it's worth understanding why the traces from the REST catalog do not show S3 operations. I am not very familiar with how Iceberg catalogs work (besides the very basic JDBC one), but from the traces I'm guessing Trino is delegating the fetching of metadata to the REST catalog - in that case there isn't much that can be done on the Trino side. Another observation is that in the Nessie trace the … In any case, it's probably a good idea to do the S3 reads in parallel. For your test, was Trino in a coordinator-only setup? Wondering if this would generate multiple splits, but I doubt that. IMHO parallel reads of metadata files in Trino would be a band-aid solution. If you want this to be fast (sub 500 ms) you likely want to delegate to a REST Iceberg catalog that reads from a Redis cache or a Postgres DB without ever going to S3.
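For illustration only (this is not the connector's actual code, and `loadMetadata` is a hypothetical stand-in for whatever call performs the S3 GET for a single table), a minimal sketch of what parallelizing the per-table metadata reads could look like:

```java
// A minimal sketch, not the actual Trino/Iceberg plugin code.
// "loadMetadata" is a hypothetical stand-in for the per-table S3 GET.
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelMetadataFetch
{
    static <T, R> Map<T, R> fetchAll(List<T> tables, Function<T, R> loadMetadata)
    {
        // Bounded pool so hundreds of tables do not mean hundreds of concurrent S3 connections
        ExecutorService executor = Executors.newFixedThreadPool(16);
        try {
            List<CompletableFuture<Map.Entry<T, R>>> futures = tables.stream()
                    .map(table -> CompletableFuture.supplyAsync(
                            () -> Map.entry(table, loadMetadata.apply(table)), executor))
                    .collect(Collectors.toList());
            // join() blocks until all reads finish and rethrows any single failure
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
        finally {
            executor.shutdown();
        }
    }
}
```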
Yes, the Trino setup for this test is indeed coordinator-only. Are the S3 GET requests not performed by Trino? About that self time: couldn't that also be affected by Trino needing to wait for the responses?
@mosiac1, it looks like this is the for loop performing the S3 GET requests synchronously: Edit: the relevant code is in the Iceberg plugin:
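For readers without the link handy, here is a rough sketch of the shape being described; it is not the actual plugin source, and `loadColumns` is a hypothetical placeholder for the call that ends up doing the blocking S3 GET:

```java
// A rough sketch of the sequential pattern described above, NOT the actual
// Iceberg plugin source. "loadColumns" is a hypothetical placeholder for the
// call that performs a blocking S3 GET of one table's metadata file.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SequentialMetadataFetch
{
    static <T, R> List<R> fetchAll(List<T> tables, Function<T, R> loadColumns)
    {
        List<R> results = new ArrayList<>();
        for (T table : tables) {
            // Each iteration blocks on a full S3 round trip before the next one
            // starts, so total latency grows linearly with the number of tables.
            results.add(loadColumns.apply(table));
        }
        return results;
    }
}
```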
Hi,
We are experiencing a performance issue when retrieving column metadata from the Iceberg catalog using the following query:
In our environment this query takes approximately 30 seconds to return results with a clean cache; we have around 500 tables with 3.5k columns in total.
From our analysis of the logs on both the Iceberg catalog and Trino side, it appears that requests to the catalog might be executed sequentially for each table.
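For reference, a minimal way to time this kind of metadata lookup from the JDBC driver would look something like the following (the catalog name, credentials, and the `information_schema.columns` query below are placeholders for illustration, not necessarily our exact query):

```java
// Minimal timing harness, assuming the Trino JDBC driver is on the classpath
// and an "iceberg" catalog exists; the query below is an assumption, not the
// exact query from this report.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ColumnMetadataBenchmark
{
    public static void main(String[] args) throws Exception
    {
        String url = "jdbc:trino://localhost:8080/iceberg"; // assumed coordinator address
        try (Connection connection = DriverManager.getConnection(url, "benchmark", null);
                Statement statement = connection.createStatement()) {
            long start = System.nanoTime();
            int rows = 0;
            try (ResultSet resultSet = statement.executeQuery(
                    "SELECT table_name, column_name, data_type FROM information_schema.columns")) {
                while (resultSet.next()) {
                    rows++;
                }
            }
            System.out.printf("Fetched %d column rows in %.1f s%n", rows, (System.nanoTime() - start) / 1e9);
        }
    }
}
```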
Setup Details:
We tried both catalogs:
link to slack discussion
To investigate the performance issue, I set up a Docker Compose environment with Trino 451 and the latest versions of the REST catalog and the Nessie catalog (using the in-memory configuration), along with MinIO for Iceberg data storage and Jaeger for tracing. This test was conducted on my laptop, which has 32 GB of RAM and an Intel Core 9 processor.
I executed the following SQL query for both the REST and Nessie catalogs:
After each query, I restarted Trino to clear the cache.
Here are the results for 500 tables, each containing 10 columns (a total of 5,000 columns):
I suspect the reason I observed better performance here than on my Kubernetes cluster is the local setup: running MinIO locally puts it much closer to Trino, reducing latency.
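As a rough back-of-envelope check (the per-request latencies here are assumptions, not measurements): if the ~500 metadata files are fetched strictly one after the other at roughly 60 ms per S3 round trip from the cluster, that alone is about 500 × 60 ms ≈ 30 s, whereas a local MinIO round trip of a couple of milliseconds gives 500 × 2 ms ≈ 1 s. Sequential per-table reads dominated by network latency would therefore be consistent with both measurements.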
Is there any optimization or configuration change we could apply to improve the performance of this query? Additionally, any insights into the sequential request behavior would be helpful.
Thanks in advance for your assistance!
Here are some of the traces (thanks @mosiac1 for the advice, it is very useful) that I got from OpenTelemetry:
Timelines of both Nessie and REST:
You can see in the Nessie timeline (for some reason it is more detailed than the REST one) that for each table it makes a GET from S3, and it does so one after the other.
Here is the tracing as a table:
Here are the JSONs of the traces if someone wants to load them into Jaeger:
rest_jaeger_columns_benchmark.json
nessie_jaeger_columns_benchmark.json.json