Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize recursive ls #2083

Merged
merged 11 commits into from
Apr 22, 2021
Merged

Conversation

siebenHeaven
Copy link
Contributor

Series of patches for optimizing recursive ls

  • Remove redundant clones / copies
  • Cache metadata for a path before hand so that subsequent functions that require it can reuse cached data
  • Some more caching

Mostly resolves #2069

This will require an update to sync with latest de-reference related changes so marking as draft.

Brief results with the patch locally:

BEFORE:

anup@LAPTOP-29TC204U:~/oss/coreutils$ hyperfine --warmup 2   --show-output 'ls -al -R /home/anup/ > /dev/null' '/tmp/ls -al -R /home/anup/ > /dev/null'
Benchmark #1: ls -al -R /home/anup/ > /dev/null
 Time (mean ± σ):      1.396 s ±  0.128 s    [User: 594.1 ms, System: 800.3 ms]
 Range (min … max):    1.212 s …  1.643 s    10 runs

Benchmark #2: /tmp/ls -al -R /home/anup/ > /dev/null
 Time (mean ± σ):      8.250 s ±  0.447 s    [User: 6.642 s, System: 1.545 s]
 Range (min … max):    7.835 s …  9.071 s    10 runs

Summary
 'ls -al -R /home/anup/ > /dev/null' ran
   5.91 ± 0.63 times faster than '/tmp/ls -al -R /home/anup/ > /dev/null'

AFTER:

anup@LAPTOP-29TC204U:~/oss/coreutils$ hyperfine --warmup 2   --show-output 'ls -al -R /home/anup/ > /dev/null' '/home/anup/oss/coreutils/target/release/ls -al -R /home/anup/ > /dev/null'
Benchmark #1: ls -al -R /home/anup/ > /dev/null
  Time (mean ± σ):      1.610 s ±  0.584 s    [User: 724.3 ms, System: 883.9 ms]
  Range (min … max):    1.244 s …  3.108 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/anup/oss/coreutils/target/release/ls -al -R /home/anup/ > /dev/null
  Time (mean ± σ):      1.995 s ±  0.067 s    [User: 1.618 s, System: 0.372 s]
  Range (min … max):    1.909 s …  2.119 s    10 runs

Summary
  'ls -al -R /home/anup/ > /dev/null' ran
    1.24 ± 0.45 times faster than '/home/anup/oss/coreutils/target/release/ls -al -R /home/anup/ > /dev/null'

Copy link
Contributor

@sylvestre sylvestre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start.
Some first comments (I know it is a draft)

Please also fix the conflicts

and please create a
src/uu/ls/BENCHMARKING.md

to document this like src/uu/sort/BENCHMARKING.md

src/uu/ls/src/ls.rs Show resolved Hide resolved
src/uu/ls/src/ls.rs Outdated Show resolved Hide resolved
@sylvestre
Copy link
Contributor

don't hesitate to add more tests too

- PathData will hold Path related metadata / strings that are required
frequently in subsequent functions
- All data is precomputed and cached and subsequent functions just
use cached data
- Cache filename and sort by filename instead of full path
- Cache uid->usr and gid->grp mappings
@siebenHeaven siebenHeaven force-pushed the optimize_recursive_ls branch from f059880 to b4af0d0 Compare April 18, 2021 11:34
@siebenHeaven
Copy link
Contributor Author

siebenHeaven commented Apr 18, 2021

@sylvestre thank you for the early review, I've addressed your comments:

don't hesitate to add more tests too

I've added a small test for error path related to width specifically (it was a low hanging fruit for error path that was not covered as per local code coverage report)
Other than that, this PR does not attempt to add any specific feature, and and I think most of the paths it introduces are covered (except few error paths, I'm thinking of revisiting those as a separate PR)

Please let me know if you think any specific cases I should try to cover

and please create a
src/uu/ls/BENCHMARKING.md

to document this like src/uu/sort/BENCHMARKING.md

Yes, that was a great idea - I have created one to start with.

Please also fix the conflicts

Conflicts are now fixed

Marking this as ready, please revisit this for review/merge.

Thanks!

@siebenHeaven siebenHeaven marked this pull request as ready for review April 18, 2021 11:59
@@ -16,6 +16,7 @@ extern crate uucore;
mod quoting_style;
mod version_cmp;

use cached::proc_macro::cached;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix the warning :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the warning - though I see failure in a CI/CD build for MinRustV: https://github.com/uutils/coreutils/pull/2083/checks?check_run_id=2373934810

Any suggestions on how to handle that would be welcome.
Maybe importing the cached just for caching uid2usr and gid2grp can be avoided and we could get away with something simpler - but I did not want to complicate things too much and this seemed like a textbook case for drop-in use of cached

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MinRustV fails because cached depends on async-mutex which seems to be using a feature added in Rust 1.41 (our current MSRV is 1.40). Here is the PR for that feature: rust-lang/rust#64325

Error:    --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/async-mutex-1.4.0/src/lib.rs:292:31
    |
292 |     pub fn try_lock_arc(self: &Arc<Self>) -> Option<MutexGuardArc<T>> {
    |                               ^^^^^^^^^^
    |
    = note: for more information, see https://github.com/rust-lang/rust/issues/44874
    = help: consider changing to `self`, `&self`, `&mut self`, `self: Box<Self>`, `self: Rc<Self>`, `self: Arc<Self>`, or `self: Pin<P>` (where P is one of the previous types except `Self`)

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking forward to this PR and it does not disappoint! Very nice work! I think this also enables more optimizations in the future. I put some questions/suggestions below. I don't think these necessarily need to be changed in this PR, but I'm interested to hear whether you think they might be worthwhile.

Sort::Name => entries.sort_by_key(|k| k.to_string_lossy().to_lowercase()),
Sort::Version => entries.sort_by(|a, b| version_cmp::version_cmp(a, b)),
Sort::Name => entries.sort_by_key(|k| k.file_name.to_lowercase()),
Sort::Version => entries.sort_by(|k, j| version_cmp::version_cmp(&k.p_buf, &j.p_buf)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version_cmp is actually one of the worst offenders of unnecessary conversions (which I know because I wrote it). It converts its inputs to String via to_string_lossy and disregards the Path. So another easy win is to pass the lossy_string to version_cmp instead of the Path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - I did not get to check the implementation for version_cmp - we should definitely do this as it is a low hanging fruit.
I'll do it as part of this PR if I get the chance.

/// Caching data here helps eliminate redundant syscalls to fetch same information
struct PathData {
// Result<MetaData> got from symlink_metadata() or metadata() based on config
md: std::io::Result<Metadata>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to return this Result on PathData::new? The fs::metadata call fails if the path does not exist or the user does not have permissions to read the metadata, in which case I think we want to log that and then not consider the file for any later calls. So the new method would look rougly like this:

impl PathData {
    fn new(p_buf: PathBuf, config: &Config, command_line: bool) -> std::io::Result<Self> {
        let md = get_metadata(&p_buf, config, command_line)?;
        ...
        Ok(Self{...})
    }
}

And then we don't have to check for errors from the metadata later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's another good idea.
Need to check where / when the error is expected to be logged, when we print the details of the file (after sorting) or if it is fine to log all errors upfront.

For metadata, I had another optimization in mind where if the config does not include any options that require metadata details, we could skip get_metadata call altogether (type for the md field could then become Option)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that's a neat idea! An easy way to implement this might be to make a method PathData::md() that retrieves the metadata on the first call and then stores it in the struct and reuses that value when it's called again.

match md {
Err(e) => {
let filename = get_file_name(i, strip);
let filename = get_file_name(&i.p_buf, strip);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this could just be

let filename = i.file_name;

or is it too hard to implement strip then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice suggestion 👍
Actually, I think all get_file_name could be updated to be something like (based what all current use-cases look like)

fn get_file_name(p: &PathData, strip: bool) -> String {
    if strip {
        p.file_name
    } else {
        p.lossy_string
    }
}

cached will be only used for unix currently as current use of
caching gid/uid mappings is only relevant on unix
@sylvestre
Copy link
Contributor

@tertsdiepraam I think you are the best reviewer for this. please let me know when you think we can merge it :)

Implement caching using HashMap and lazy_static
Rust 1.40 did not support map_or for result types
@siebenHeaven
Copy link
Contributor Author

@tertsdiepraam I've updated to remove the dependency on cached altogether and fixed other MSRV related issues.
I would prefer to merge this PR and defer other optimizations that we discussed to a follow-up PR if that sounds okay.

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! We can further optimize this in other PR's and it's definitely worth merging this and not do too much in this PR. All checks are green too, so this can be merged in my opinion. @sylvestre

Comment on lines +1392 to +1399
match uid_cache.get(&uid) {
Some(usr) => usr.clone(),
None => {
let usr = entries::uid2usr(uid).unwrap_or_else(|_| uid.to_string());
uid_cache.insert(uid, usr.clone());
usr
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this a bit with the HashMap::entry method:

Suggested change
match uid_cache.get(&uid) {
Some(usr) => usr.clone(),
None => {
let usr = entries::uid2usr(uid).unwrap_or_else(|_| uid.to_string());
uid_cache.insert(uid, usr.clone());
usr
}
}
uid_cache.entry(&uid).or_insert_with(||
entries::uid2usr(uid).unwrap_or_else(|_| uid.to_string())
).clone()

(haven't tested whether all the types and ownership work out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried something like this, but it was not caching for reasons I didnt get a chance to debug.

I will give it another shot in a follow-up, worth switching to this for the benifit of cleaner code.

Thanks!

@sylvestre sylvestre merged commit 8554cdf into uutils:master Apr 22, 2021
@sylvestre
Copy link
Contributor

well done :)

@sylvestre
Copy link
Contributor

Interesting, I have different results than you.

On firefox source code:

  'ls -R /home/sylvestre/dev/mozilla/mozilla-central.hg' ran
    5.28 ± 1.56 times faster than './target/release/coreutils ls -R /home/sylvestre/dev/mozilla/mozilla-central.hg'
    8.39 ± 1.26 times faster than '/tmp/coreutils ls -R /home/sylvestre/dev/mozilla/mozilla-central.hg'

it is an improvement but still much slower than ls :)

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Apr 22, 2021

@sylvestre I also tried it on my machine with difference combinations of -a and -l and it seems like -l slows GNU down:

'ls -al -R' ran
  2.05 ± 0.05 times faster than '../coreutils/target/release/coreutils ls -al -R'
'ls -R' ran
  5.03 ± 0.03 times faster than '../coreutils/target/release/coreutils ls -R'
'ls -a -R' ran
  5.25 ± 0.07 times faster than '../coreutils/target/release/coreutils ls -a -R'
'ls -l -R' ran
  2.18 ± 0.03 times faster than '../coreutils/target/release/coreutils ls -l -R'

(this is also on the Firefox source code)

@siebenHeaven
Copy link
Contributor Author

@sylvestre for difference between results, I imagine that could be because I'm trying this on wsl2 rather than bare linux and / or other filesystem / hardware differences (these kind of differences will affect what is the bottleneck and hence the overall results I believe)

I do get results along these lines (note this is on linux source tree - not sure of how that compares with firefox sources, but I've tried on larger directories too with more or less similar results)

  'ls -al -R' ran
    1.45 ± 0.04 times faster than 'coreutils ls -al -R'

  'ls -a -R' ran
    5.65 ± 0.49 times faster than 'coreutils ls -a -R'

  'ls -l -R' ran
    1.31 ± 0.22 times faster than 'coreutils ls -l -R'

  'ls -R' ran
    5.02 ± 0.35 times faster than 'coreutils ls -R'

There are some more optimization opportunities that this PR exposes - some are listed in comments from @tertsdiepraam 's review that I couldn't get to in this PR.

As for the difference with vs without -a / -l, I imagine that maybe because GNU skips fetching information that is not really required (eg. GNU ls will do no stat calls with just -R but the way things are working after this PR, we do it once, always for every path regardless of whether we will require it down the line or not)

We may be able to do the same following one of @tertsdiepraam 's suggestions of adding methods to PathData structure that will lazyily fetch information as opposed to the way this PR introduces of fetching all information upfront.

I am open to keeping the linked issue open and getting into some of these optimizations with another PR if that sounds okay to you.

@sylvestre
Copy link
Contributor

I am open to keeping the linked issue open and getting into some of these optimizations with another PR if that sounds okay to you.

Ok, thanks!

@siebenHeaven siebenHeaven deleted the optimize_recursive_ls branch April 25, 2021 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ls -R could run in parallel to improve the performances (aka ls -R is much slower than GNU's)
4 participants