Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build artifacts are erroneously skipped for unusual directory structures #38

Open
nhaef opened this issue Feb 27, 2025 · 4 comments
Open

Comments

@nhaef
Copy link

nhaef commented Feb 27, 2025

While using putzen-rs, I noticed two edge-cases where the tool does not delete all build artifacts right away, so I have to re-run the tool a couple of times to get all files deleted. The expected behavior would be that all build artifacts are deleted in the very first run of putzen -y.

Scenario 1: Nested Projects

└── foo
    ├── bar
    │   ├── node_modules
    │   ├── package-lock.json
    │   └── package.json
    ├── node_modules
    ├── package-lock.json
    └── package.json

Running putzen -y against this folder structure will only delete foo/node_modules/. IMO, it should also delete foo/bar/node_modules/. The tool may skip further processing for foo/node_modules/ as this folder is deleted anyway, but it seems like there is a bug which causes foo/ to be skipped instead.

Scenario 2: Bulky Projects

└── baz
    ├── Cargo.lock
    ├── Cargo.toml
    ├── node_modules
    ├── package-lock.json
    ├── package.json
    └── target

Running putzen -y against this folder structure will only delete baz/target/, but it will skip baz/node_modules/ for reasons similar to those mentioned above.

@nhaef
Copy link
Author

nhaef commented Feb 27, 2025

@sassman I already had the chance to look into the code and I suspect that the issue is caused by this code. Using e.read_children_path = None; seems to cause all children of e (DirEntry that points to the parent of the to-be-deleted folder) to be ignored. Furthermore, the break on L91 stops processing more rules.

I would be open to contribute a fix, but I think a good fix would require some refactoring upfront. Would you be open for that? 👀

@sassman
Copy link
Owner

sassman commented Feb 27, 2025

I think you're right. This is the root cause of the issue, because the folder should actually not be skipped because it is not the one that is going to be removed. It's a sub folder of it that is going to be removed. This is what the rule checks.
So I think the solution is as simple as not skipping this particular folder only the one that is going to be removed.

Also, I think this would not require a refactoring.

@nhaef
Copy link
Author

nhaef commented Feb 27, 2025

So I think the solution is as simple as not skipping this particular folder only the one that is going to be removed.

Yes, that would indeed solve scenario 1, but implementing this will definitely require some kind of refactoring. Ignoring the to-be-removed child dir of our current e: &mut jwalk::DirEntry is only possible if we know the child DirEntry. However, PathToRemoveResolver::resolve_path_to_remove, the function which is responsible for processing a rule and returning the to-be-removed child dir, returns a Folder instead. Now, we could of course work around that fact, for example by trying to find the corresponding DirEntry for that Folder in the entries vec (even that would require restructuring the process_by closure passed to process_read_dir, as we can't borrow entries as mutable more than once). However, I don't think that we would be happy with the resulting implementation.

Additionally, this solution still wouldn't fix scenario 2. As of now, the jwalk::ClientState is modeled to be an Option<Folder>. Since the client state is attached to the parent dir of the to-be-removed folder, we actually need to store multiple folders for the scenario where multiple rules match on the parent folder. A simple solution could be to go with Option<Vec<Folder>> or Option<[Folder; 3]>. Again, I would argue that this solution feels rather hacky.

A clean solution to both problems could be to refactor PathToRemoveResolver::resolve_path_to_remove. As of now, this function takes a folder and checks if file_to_check exists in that folder. If it does, it will return that folder. If we would rather go with something like:

fn should_remove(&self, folder: impl AsRef<Path>) -> bool

Then we could check if the current folder should be removed, instead of checking if the current folder is a parent of something that should be removed. I hope my explanation does make sense, and thanks again for looking into this. :-)

@sassman
Copy link
Owner

sassman commented Feb 27, 2025

Sure it makes sense.
I'm happy to look into your proposal, I guess that small of a refactoring is indeed necessary.

As an alternative approach the function that does check could return the (sub)folder that was the one going to be removed. In the folder walk we could keep track of any returned folder as a list of deletion and check in every dive down if we got a folder that is on this delete list. If so that's the one we gonna skip then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants