Remote IO should retry certain error codes #601

Open
TomAugspurger opened this issue Jan 28, 2025 · 1 comment · May be fixed by #603

Comments

@TomAugspurger

Storage services like S3 return an HTTP 503 response when the number of requests being served exceeds their capacity. We can hit this error when, e.g., quickly reading many parquet files through cudf:

  File "parquet.pyx", line 324, in pylibcudf.io.parquet.read_parquet
RuntimeError: curl_easy_perform() error near /opt/conda/conda-bld/work/cpp/src/remote_handle.cpp:255(The requested URL returned error: 503)

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html has the default retry policies of boto3, and this article has some general background.

I should be able to take a look at this today. We'll want to retry 429 and 503 codes, and possibly the other 50x codes too.
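To make the proposal concrete, here is a minimal Python sketch of retrying 429 and 503 responses with exponential backoff and jitter. It is not the actual implementation (the failing code path is C++ on top of libcurl); `fetch_with_retry`, `backoff_delay`, the `do_request` callable, and every default value are hypothetical names and numbers chosen for illustration.

```python
import random
import time

# Status codes named above as retry candidates. The other 50x codes are
# left out because the issue only says "possibly".
RETRYABLE_STATUS = frozenset({429, 503})


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def fetch_with_retry(do_request, max_attempts=5, sleep=time.sleep):
    """Call do_request() until the status is not retryable or attempts run out.

    do_request is a hypothetical callable returning (status_code, body).
    The last response is returned even if it is still a retryable error,
    so the caller decides how to surface the failure.
    """
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE_STATUS:
            return status, body
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

The jitter spreads out retries from many concurrent readers so they do not all hit the service again at the same instant, which matters when the 503 was caused by request volume in the first place.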

@madsbk
Member

madsbk commented Jan 29, 2025

> I should be able to take a look at this today. We'll want to retry 429 and 503 codes, and possibly the other 50x codes too.

Agreed, it would be good to have retries! Maybe also warn the user if it happens a lot.
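One way the "warn if it happens a lot" idea could look, sketched in Python: count retried requests and emit a warning each time the count reaches a multiple of a threshold. `RetryMonitor`, `record_retry`, and the `warn_after` default are hypothetical and only illustrate the idea; nothing here reflects an existing API.

```python
import warnings


class RetryMonitor:
    """Counts retried requests and warns when retries become frequent.

    Hypothetical helper: the name and the default threshold are assumptions.
    """

    def __init__(self, warn_after=10):
        self.warn_after = warn_after
        self.retries = 0

    def record_retry(self, status_code):
        """Register one retried request and warn at every multiple of warn_after."""
        self.retries += 1
        # Warning only at multiples of the threshold reminds a long-running
        # job without flooding its output.
        if self.retries % self.warn_after == 0:
            warnings.warn(
                f"{self.retries} remote reads retried so far "
                f"(latest status: {status_code}); "
                "the storage service may be throttling requests.",
                RuntimeWarning,
                stacklevel=2,
            )
```

A retry loop would call `record_retry(status)` each time it decides to retry, so occasional throttling stays silent and sustained throttling becomes visible to the user.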

@TomAugspurger TomAugspurger linked a pull request Jan 29, 2025 that will close this issue