The flag experimental_repository_downloader_retries doesn't retry except in the case of truncated downloads #24530

Closed
lpingas opened this issue Nov 29, 2024 · 3 comments
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: bug untriaged

Comments

lpingas commented Nov 29, 2024

Description of the bug:

We tried using experimental_repository_downloader_retries to mitigate the impact of intermittent connection resets. Although the flag is documented as "The maximum number of attempts to retry a download error", we observed that Bazel does not retry a repository download when the connection is reset in the middle of the download:

WARNING: Download from https://bazel-registry-mirror-proxy/repository/production-nonmodule-mirrors/cdn.azul.com/zulu/bin/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz failed: class java.net.SocketException Connection reset
INFO: Repository remotejdk21_linux instantiated at:
     /DEFAULT.WORKSPACE.SUFFIX:93:24: in <toplevel>     
     /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/rules_java_builtin/java/repositories.bzl:562:23: in rules_java_dependencies
     /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/rules_java_builtin/java/repositories.bzl:430:10: in remote_jdk21_repos
     /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/bazel_tools/tools/build_defs/repo/utils.bzl:268:18: in maybe
     /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/rules_java_builtin/toolchains/remote_java_repository.bzl:52:17: in remote_java_repository
Repository rule http_archive defined at:
     /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/bazel_tools/tools/build_defs/repo/http.bzl:382:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'remotejdk21_linux':
          Traceback (most recent call last):
     File "/root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/bazel_tools/tools/build_defs/repo/http.bzl", line 131, column 45, in _http_archive_impl
          download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://bazel-registry-mirror-proxy/repository/production-nonmodule-mirrors/cdn.azul.com/zulu/bin/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz] to /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/remotejdk21_linux/temp16350890825581379863/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz: Connection reset
ERROR: no such package '@@remotejdk21_linux//': java.io.IOException: Error downloading [https://bazel-registry-mirror-proxy/repository/production-nonmodule-mirrors/cdn.azul.com/zulu/bin/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz] to /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/remotejdk21_linux/temp16350890825581379863/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz: Connection reset
ERROR: /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/rules_java_builtin/toolchains/BUILD:314:27: @@rules_java_builtin//toolchains:remotejdk_21 depends on @@remotejdk21_linux//:jdk in repository @@remotejdk21_linux which failed to fetch. no such package '@@remotejdk21_linux//': java.io.IOException: Error downloading [https://bazel-registry-mirror-proxy/repository/production-nonmodule-mirrors/cdn.azul.com/zulu/bin/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz] to /root/.cache/bazel/_bazel_root/ad24e534da8f1355479f573e3936af71/external/remotejdk21_linux/temp16350890825581379863/zulu21.32.17-ca-jdk21.0.2-linux_x64.tar.gz: Connection reset

Looking at the code, we see that retries after the connection is established are limited to cases of ContentLengthMismatchException:

private boolean shouldRetryDownload(IOException e, int attempt) {
  if (attempt >= retries) {
    return false;
  }
  if (e instanceof ContentLengthMismatchException) {
    return true;
  }
  for (var suppressed : e.getSuppressed()) {
    if (suppressed instanceof ContentLengthMismatchException) {
      return true;
    }
  }
  return false;
}

The flag was created to handle truncated downloads (#13957), but its name and documentation led us to expect that retries would also mitigate intermittent download failures caused by connection resets and read timeouts.

It seems reasonable to also retry if the suppressed exception is a SocketException, but I could be missing something.
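
To make the suggestion concrete, here is a minimal, self-contained sketch of what the extended check could look like. It is not the actual Bazel change: the class name `RetryPredicateSketch`, the helper `isRetriable`, and the nested `ContentLengthMismatchException` stand-in are hypothetical and exist only so the snippet compiles on its own.

```java
import java.io.IOException;
import java.net.SocketException;

public class RetryPredicateSketch {
  /** Hypothetical stand-in for Bazel's ContentLengthMismatchException, for illustration only. */
  public static class ContentLengthMismatchException extends IOException {
    public ContentLengthMismatchException(String message) {
      super(message);
    }
  }

  // Maximum number of retries, i.e. the value the flag would supply.
  private final int retries;

  public RetryPredicateSketch(int retries) {
    this.retries = retries;
  }

  // Assumption: a truncated download and a socket-level failure are both worth retrying.
  private static boolean isRetriable(Throwable t) {
    return t instanceof ContentLengthMismatchException || t instanceof SocketException;
  }

  public boolean shouldRetryDownload(IOException e, int attempt) {
    if (attempt >= retries) {
      return false;
    }
    if (isRetriable(e)) {
      return true;
    }
    // The underlying failure may only be attached as a suppressed exception.
    for (Throwable suppressed : e.getSuppressed()) {
      if (isRetriable(suppressed)) {
        return true;
      }
    }
    return false;
  }
}
```

With this sketch, an `IOException` that carries a suppressed `SocketException("Connection reset")` would be retried until `attempt` reaches `retries`, while an unrelated `IOException` would still fail immediately.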

Which category does this issue belong to?

External Dependency

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Run Bazel without a repository cache and download from a source that resets the connection in the middle of the download. Perhaps iptables could be used to simulate that, but I haven't figured out a good setup for it.
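
As an alternative to a firewall-based setup, a small local server can abort the connection partway through the response. The sketch below is untested against Bazel and rests on several assumptions: the class name `ResetMidDownloadServer`, the default port 8089, and the payload sizes are all made up for illustration, and it relies on the common behavior that closing a socket with `SO_LINGER` set to 0 sends a TCP RST instead of a FIN. How the client reports the failure can vary by platform.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ResetMidDownloadServer {
  // Length announced in the response header; far more than is actually sent.
  public static final int DECLARED_LENGTH = 1_000_000;

  /** Accepts one connection, sends headers plus a partial body, then aborts the connection. */
  public static void serveOnce(ServerSocket server) throws IOException {
    try (Socket client = server.accept()) {
      // Read the request up to the blank line that ends the HTTP headers.
      InputStream in = client.getInputStream();
      byte[] end = {'\r', '\n', '\r', '\n'};
      int matched = 0;
      int b;
      while (matched < end.length && (b = in.read()) != -1) {
        matched = (b == end[matched]) ? matched + 1 : (b == '\r' ? 1 : 0);
      }

      OutputStream out = client.getOutputStream();
      String headers =
          "HTTP/1.1 200 OK\r\n"
              + "Content-Type: application/octet-stream\r\n"
              + "Content-Length: " + DECLARED_LENGTH + "\r\n"
              + "Connection: close\r\n\r\n";
      out.write(headers.getBytes(StandardCharsets.US_ASCII));
      out.write(new byte[1024]); // only a small part of the announced body
      out.flush();

      // With linger enabled and a timeout of 0, close() aborts the connection;
      // on most TCP stacks the peer then sees a reset rather than a clean end of stream.
      client.setSoLinger(true, 0);
    }
  }

  public static void main(String[] args) throws IOException {
    int port = args.length > 0 ? Integer.parseInt(args[0]) : 8089;
    try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
      while (true) {
        serveOnce(server);
      }
    }
  }
}
```

Pointing a repository rule's URL at this server (for example `http://localhost:8089/archive.tar.gz`, again a made-up path) should make every download fail partway through, which is enough to observe whether Bazel retries.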

Which operating system are you running Bazel on?

Rocky 9

What is the output of bazel info release?

release 7.4.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@github-actions github-actions bot added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Nov 29, 2024
copybara-service bot pushed a commit that referenced this issue Dec 10, 2024
…ception

Fix for #24530

--experimental_repository_downloader_retries will now retry on `SocketException` in addition to `ContentLengthMismatchException`

Closes #24608.

PiperOrigin-RevId: 704633572
Change-Id: Idd1fcbb768c9dabed596fe15d8ae9260ef3e895d
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Dec 17, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 18, 2024
…SocketException (#24722)

Fix for #24530

--experimental_repository_downloader_retries will now retry on
`SocketException` in addition to `ContentLengthMismatchException`

Closes #24608.

PiperOrigin-RevId: 704633572
Change-Id: Idd1fcbb768c9dabed596fe15d8ae9260ef3e895d

Commit
459bb57

Co-authored-by: Pareesh Madan
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 18, 2025
meteorcloudy (Member) commented

Should be fixed by #24608

github-merge-queue bot pushed a commit that referenced this issue Jan 20, 2025
…SocketException (#24969)

Fix for #24530

--experimental_repository_downloader_retries will now retry on
`SocketException` in addition to `ContentLengthMismatchException`

Closes #24608.

PiperOrigin-RevId: 704633572
Change-Id: Idd1fcbb768c9dabed596fe15d8ae9260ef3e895d

Commit
459bb57

Co-authored-by: Pareesh Madan
iancha1992 (Member) commented

A fix for this issue has been included in Bazel 7.5.0 RC2. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=7.5.0rc2. Thanks!

iancha1992 (Member) commented

A fix for this issue has been included in Bazel 8.1.0 RC1. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=8.1.0rc1. Thanks!
