
CI failing with mounting volume error #7659

Closed
oxarbitrage opened this issue Oct 2, 2023 · 15 comments · Fixed by #7662, #7665, #7686 or #7690

oxarbitrage (Contributor) commented Oct 2, 2023

https://github.com/ZcashFoundation/zebra/actions/runs/6382508836/job/17321768540?pr=7653#step:13:188

docker: Error response from daemon: error while mounting volume '/var/lib/docker/volumes/fully-synced-rpc-85b855b/_data': failed to mount local volume: mount /dev/sdb:/var/lib/docker/volumes/fully-synced-rpc-85b855b/_data: device or resource busy.
Error: Process completed with exit code 125.

This seems to be happening for every PR we have opened recently. We can post more links to failures and more information in this ticket.
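For anyone reproducing this, the backing device of the failing volume can be checked with the standard docker CLI (the volume name is taken from the error above; what appears in the Options output is an assumption about how the volume was created):

```bash
# Inspect the named volume from the error above to see which device
# (if any) backs it via the local driver; the Options field would show
# settings like "device": "/dev/sdb", "type": "ext4" (assumed values).
docker volume inspect fully-synced-rpc-85b855b
```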

oxarbitrage added the A-devops (Area: Pipelines, CI/CD and Dockerfiles) and P-Critical 🚑 labels Oct 2, 2023
oxarbitrage self-assigned this Oct 2, 2023
oxarbitrage (Contributor, Author) commented

This is not the first time we have seen this error in CI history. One option is to find out what we did that time and either repeat it or draw information from it.

oxarbitrage (Contributor, Author) commented Oct 3, 2023

On closer inspection, I've observed that the error is not present in every open pull request, but only in the one linked here.

The error is visible in the Zebra Tip JSON-RPC job, while none of the other open pull requests encounter this issue in the same job or with the specific error message.

All other pull requests pass this job and start failing in the lightwalletd tip update job with a 'The operation was canceled.' error.

So, we should pay attention only to what is happening in this specific pull request in this ticket:

  • It is using this image: zebrad-cache-7633-merge-7b222f7-v25-mainnet-tip-u-174603. You can find this information in the Find fully-synced-rpc cached state disk job.
  • The job is failing in Zebra Tip JSON-RPC, but it failed only once in this particular pull request and nowhere else.

I suggest deleting the possibly corrupted image from gcloud and restarting the CI for this pull request. Another image will be selected; if this was a one-time issue, the CI should then pass the Zebra Tip JSON-RPC job.

This will help us determine if the problem was related to the image, and it may also lower the priority of the ticket.

@gustavovalverde, please let me know your thoughts on this. Any additional input is welcome.
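For reference, the deletion step might look like this (a sketch only: whether the cached state is stored as a Compute Engine image or a disk, and the zone, are assumptions):

```bash
# If the cached state is a Compute Engine image (assumption):
gcloud compute images delete zebrad-cache-7633-merge-7b222f7-v25-mainnet-tip-u-174603

# If it is a disk instead (assumption), the equivalent would be:
# gcloud compute disks delete zebrad-cache-7633-merge-7b222f7-v25-mainnet-tip-u-174603 --zone=<zone>
```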

teor2345 (Contributor) commented Oct 3, 2023

Have we tried restarting CI without deleting any images?
It might have been a temporary issue on that Google Cloud machine, and we'll get a new machine when we restart.
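One way to do that from the command line, assuming the GitHub CLI is available (the run ID is taken from the link in the issue description):

```bash
# Re-run only the failed jobs of the run linked above:
gh run rerun 6382508836 --failed --repo ZcashFoundation/zebra
```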

teor2345 (Contributor) commented Oct 3, 2023

(It is unlikely that a "device or resource busy" error would be caused by a specific image, because they are usually about open files or devices.)
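If something is holding the device open, standard Linux tools on the affected runner could identify the culprit (a sketch; assumes `fuser` and `lsof` are installed on the VM):

```bash
# List processes holding the block device open:
sudo fuser -v /dev/sdb

# Alternative view of open file handles on the device:
sudo lsof /dev/sdb

# Was the device already mounted somewhere else?
grep sdb /proc/mounts
```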

oxarbitrage (Contributor, Author) commented

Makes sense. Restarting all jobs at https://github.com/ZcashFoundation/zebra/actions/runs/6382508836?pr=7653 for the PR with the issue.

teor2345 (Contributor) commented Oct 3, 2023

> On closer inspection, I've observed that the error is not present in every open pull request, but only in the one linked here.
>
> The error is visible in the Zebra Tip JSON-RPC job, while none of the other open pull requests encounter this issue in the same job or with the specific error message.

Sorry about that, I thought I had checked multiple PRs, but I might have accidentally checked the same PR multiple times.

gustavovalverde (Member) commented

This PR did not work:

teor2345 (Contributor) commented Oct 4, 2023

Could we re-run the entire docker command a limited number of times until it succeeds?
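A minimal sketch of that retry idea (the image variable, volume name, and retry count are placeholders, not the actual CI script):

```bash
#!/usr/bin/env bash
# Hypothetical retry wrapper around the failing docker step.
set -u
for attempt in 1 2 3; do
  # $IMAGE and the volume name stand in for whatever the CI job runs.
  if docker run --rm -v fully-synced-rpc-85b855b:/data "$IMAGE"; then
    exit 0
  fi
  echo "docker run failed (attempt $attempt); retrying in 30s..." >&2
  sleep 30
done
exit 125   # same exit code the failing step reported
```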

teor2345 (Contributor) commented Oct 6, 2023

This wasn't completely fixed by PR #7686, but it's a lot better now.

Maybe we can drop it down from critical to high priority?

gustavovalverde (Member) commented

I’m waiting for the latest commit to run, but I was able to find the issue while deploying the instances manually: dmesg was outputting the following message when I tried to mount /dev/sdb in Docker:

/dev/sdb: Can't open blockdev

And this only happened after creating the Docker volume.
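The sequence being described is roughly the following (a reconstruction, not the exact commands from CI; the device and volume names are assumed):

```bash
# Create a local-driver volume backed by the raw block device:
docker volume create --driver local \
  --opt type=ext4 --opt device=/dev/sdb fully-synced-rpc-test

# Using the volume then fails at mount time...
docker run --rm -v fully-synced-rpc-test:/data alpine true

# ...and the kernel log explains why:
sudo dmesg | tail   # "/dev/sdb: Can't open blockdev"
```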

I added a new commit (which I’ve tested at least 3 times), and it’s no longer failing to mount: 398c2f1

But I’ll keep testing to confirm.

teor2345 (Contributor) commented Oct 9, 2023

What if using block storage is part of our issue?
It's only recommended for experts in the docker docs:
https://docs.docker.com/storage/volumes/#block-storage-devices

Is there a way to let docker handle the devices automatically, without us having to initialise them?
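For comparison, the difference in sketch form (names assumed; the second form lets Docker manage the storage itself):

```bash
# Block-device-backed volume (current approach, per the docs link above;
# requires us to attach and format the device ourselves):
docker volume create --driver local \
  --opt type=ext4 --opt device=/dev/sdb fully-synced-rpc

# Docker-managed named volume: Docker handles the storage under
# /var/lib/docker/volumes itself, no device initialisation needed:
docker volume create fully-synced-rpc
```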

mergify bot closed this as completed in #7690 on Oct 9, 2023