Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc: resume GC after a pathinuse error #11922

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

picnoir
Copy link
Member

@picnoir picnoir commented Nov 20, 2024

Motivation

invalidatePathChecked is throwing a PathInUse exception. This exception is not catched and fails the whole GC run.

Instead, I think we should log the error for the specific store path we're trying to delete, explaining we can't delete this path because it still has referrers. Once we're done with logging that, the GC run should continue to delete the dead store paths it can delete.

Context

I recently faced a bug that I assume is coming from the topoSortPaths function where the GC was trying to delete a path having some alive referrers. The GC run service was constantly failing with the error message

error: cannot delete path '/nix/store/r1lp9kxlrc6h7vrba90gm6i94s31xvvx-gnugrep-3.11' because it is in use by '/nix/store/911x30h15lbfg5fkkabzjhars2svbnaa-stdenv-linux'

I resolved this by manually deleting the faulty path referrers using nix-store --query --referrers and nix store delete. I sadly can't reproduce this bug. It seems to happen extremely infrequently.

This bug alone is not a massive deal due to its infrequent nature. However, the way it cascades is a serious issue for Nix builders. Because we're throwing an un-catched PathInUse exception, the full GC run fails. This prevents any automatic garbage collection of the nix store, and the machine disk is slowly but surely filling up. Up until the point where a manual intervention is required to fix the situation.

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

src/libstore/gc.cc Outdated Show resolved Hide resolved
Copy link
Member

@Mic92 Mic92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an error that did happen but we haven't fixed yet, I agree that we gracefully degrade gc in this case.

@edolstra
Copy link
Member

edolstra commented Nov 20, 2024

If this is an error that did happen but we haven't fixed yet

In that case there should at least be an GitHub issue so we can try to diagnose this (and a comment in the code to link to that issue).

@picnoir
Copy link
Member Author

picnoir commented Nov 20, 2024

Created an issue: #11923

First the motivation: I recently faced a bug that I assume is coming
from the topoSortPaths function where the GC was trying to delete a
path having some alive referrers. I resolved this by manually deleting
the faulty path referrers using nix-store --query --referrers. I sadly
did not manage to reproduce this bug.

This bug alone is not a big deal. However, this bug is
triggering a cascading failure: invalidatePathChecked is throwing a
PathInUse exception. This exception is not catched and fails the whole GC
run. From there, the machine (a builder machine) was unable to GC its
Nix store, which led to an almost full disk with no way to
automatically delete the dead Nix paths.

Instead, I think we should log the error for the specific store path
we're trying to delete, specifying we can't delete this path because
it still has referrers. Once we're done with logging that, the GC run
should continue to delete the dead store paths it can delete.
@Mic92 Mic92 enabled auto-merge November 20, 2024 14:56
@Mic92 Mic92 merged commit c13c606 into NixOS:master Nov 20, 2024
10 checks passed
@picnoir picnoir deleted the pic/catch-gc-exception branch November 20, 2024 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants