-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
globwalk looks at all files even when it doesn't need to #29
Comments
I'm willing to work on this. |
Same here. In my case I'm using Tera, which uses globwalk to handle globs and I noticed that my application is practically unusable when running in a Docker container where the working directory is the root directory of the container because globwalk spends hours walking directories like /sys, I've never even let it finish so who knows how much longer it would have spent scanning my entire file system when it really should just find two html files in Luckily, the fix appears to be rather simple. The iterator does not handle the case where Match::None if is_dir => {
skip_dir = true;
continue 'skipper;
} Then globwalk seems to skip irrelevant directories as expected. |
@robinfriedli thank you for working on it! |
After a brief check, it sadly doesn't work exactly like that, and filters out correct files.
It looks like Can you supply an example where this worked for you? |
Yeah I jumped the gun a bit here. It is not quite as simple unfortunately as the directory being part of the pattern also results in a Match::None. Maybe it would be enough to check whether the directory is a prefix of the glob? |
We can construct a second glob for dirs that contains the prefixes of the original positive patterns. |
I thought of that but do we actually want to yield those directories? I guess we could keep those globs separate and only use them to decide on whether to skip. I'm not 100% sure either but creating a glob for each child path (using the path components) might work. For example, for the glob |
That does seem to work to some extent, at least all tests pass (except for the readme test) and it does seem to skip irrelevant directories. I can open a PR if you like. |
Gladly. I'll try to attack it with tests over the weekend to see if we missed something. Thanks again! |
No problem 😄Pull request opened |
Wow, I just ran into this exact same issue. It probably would never finish due to recursive symlinks. |
https://docs.rs/globwalk/0.8.0/src/globwalk/lib.rs.html#355
If the top-level directory is not a match, no subdirectories or files in the directory will be a match either. However
globwalk
will still look at them. If there are many files in the ignored directory, this can cause enormous slowdowns. In https://github.com/rust-lang/docs.rs/pull/861/files#diff-ee6431d852ce8913514eece9e3982d32R96-R99, we have several hundred thousand files in subdirectories, but only ~10 in the matched directory, so this causes a slowdown of several orders of magnitude.The text was updated successfully, but these errors were encountered: