Optimize recursive ls #2083

siebenHeaven · 2021-04-17T15:51:00Z

Series of patches for optimizing recursive ls

Remove redundant clones / copies
Cache metadata for a path beforehand so that subsequent functions that need it can reuse the cached data
Cache more path-related data (file names, uid->usr and gid->grp mappings)

Mostly resolves #2069

This will require an update to sync with the latest dereference-related changes, so marking as draft.

Brief results with the patch locally:

BEFORE:

anup@LAPTOP-29TC204U:~/oss/coreutils$ hyperfine --warmup 2   --show-output 'ls -al -R /home/anup/ > /dev/null' '/tmp/ls -al -R /home/anup/ > /dev/null'
Benchmark #1: ls -al -R /home/anup/ > /dev/null
 Time (mean ± σ):      1.396 s ±  0.128 s    [User: 594.1 ms, System: 800.3 ms]
 Range (min … max):    1.212 s …  1.643 s    10 runs

Benchmark #2: /tmp/ls -al -R /home/anup/ > /dev/null
 Time (mean ± σ):      8.250 s ±  0.447 s    [User: 6.642 s, System: 1.545 s]
 Range (min … max):    7.835 s …  9.071 s    10 runs

Summary
 'ls -al -R /home/anup/ > /dev/null' ran
   5.91 ± 0.63 times faster than '/tmp/ls -al -R /home/anup/ > /dev/null'

AFTER:

anup@LAPTOP-29TC204U:~/oss/coreutils$ hyperfine --warmup 2   --show-output 'ls -al -R /home/anup/ > /dev/null' '/home/anup/oss/coreutils/target/release/ls -al -R /home/anup/ > /dev/null'
Benchmark #1: ls -al -R /home/anup/ > /dev/null
  Time (mean ± σ):      1.610 s ±  0.584 s    [User: 724.3 ms, System: 883.9 ms]
  Range (min … max):    1.244 s …  3.108 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #2: /home/anup/oss/coreutils/target/release/ls -al -R /home/anup/ > /dev/null
  Time (mean ± σ):      1.995 s ±  0.067 s    [User: 1.618 s, System: 0.372 s]
  Range (min … max):    1.909 s …  2.119 s    10 runs

Summary
  'ls -al -R /home/anup/ > /dev/null' ran
    1.24 ± 0.45 times faster than '/home/anup/oss/coreutils/target/release/ls -al -R /home/anup/ > /dev/null'

sylvestre

Great start.
Some first comments (I know it is a draft)

Please also fix the conflicts

and please create a
src/uu/ls/BENCHMARKING.md

to document this like src/uu/sort/BENCHMARKING.md

src/uu/ls/src/ls.rs

sylvestre · 2021-04-17T17:17:22Z

don't hesitate to add more tests too

siebenHeaven · 2021-04-18T11:59:17Z

@sylvestre thank you for the early review, I've addressed your comments:

don't hesitate to add more tests too

I've added a small test for an error path related to width specifically (it was low-hanging fruit: an error path that was not covered according to my local code coverage report).
Other than that, this PR does not attempt to add any specific feature, and I think most of the paths it introduces are covered (except a few error paths; I'm thinking of revisiting those in a separate PR).

Please let me know if you think any specific cases I should try to cover

and please create a
src/uu/ls/BENCHMARKING.md

to document this like src/uu/sort/BENCHMARKING.md

Yes, that was a great idea - I have created one to start with.

Please also fix the conflicts

Conflicts are now fixed

Marking this as ready, please revisit this for review/merge.

Thanks!

sylvestre · 2021-04-18T20:37:43Z

src/uu/ls/src/ls.rs

@@ -16,6 +16,7 @@ extern crate uucore;
 mod quoting_style;
 mod version_cmp;

+use cached::proc_macro::cached;


please fix the warning :)

I've fixed the warning - though I see failure in a CI/CD build for MinRustV: https://github.com/uutils/coreutils/pull/2083/checks?check_run_id=2373934810

Any suggestions on how to handle that would be welcome.
Maybe importing cached just to cache uid2usr and gid2grp can be avoided in favor of something simpler, but I did not want to complicate things too much, and this seemed like a textbook case for drop-in use of cached.

I think MinRustV fails because cached depends on async-mutex which seems to be using a feature added in Rust 1.41 (our current MSRV is 1.40). Here is the PR for that feature: rust-lang/rust#64325

Error:
   --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/async-mutex-1.4.0/src/lib.rs:292:31
    |
292 |     pub fn try_lock_arc(self: &Arc<Self>) -> Option<MutexGuardArc<T>> {
    |                               ^^^^^^^^^^
    |
    = note: for more information, see https://github.com/rust-lang/rust/issues/44874
    = help: consider changing to `self`, `&self`, `&mut self`, `self: Box<Self>`, `self: Rc<Self>`, `self: Arc<Self>`, or `self: Pin<P>` (where P is one of the previous types except `Self`)

tertsdiepraam

I was looking forward to this PR and it does not disappoint! Very nice work! I think this also enables more optimizations in the future. I put some questions/suggestions below. I don't think these necessarily need to be changed in this PR, but I'm interested to hear whether you think they might be worthwhile.

tertsdiepraam · 2021-04-19T12:00:12Z

src/uu/ls/src/ls.rs

-        Sort::Name => entries.sort_by_key(|k| k.to_string_lossy().to_lowercase()),
-        Sort::Version => entries.sort_by(|a, b| version_cmp::version_cmp(a, b)),
+        Sort::Name => entries.sort_by_key(|k| k.file_name.to_lowercase()),
+        Sort::Version => entries.sort_by(|k, j| version_cmp::version_cmp(&k.p_buf, &j.p_buf)),


version_cmp is actually one of the worst offenders of unnecessary conversions (which I know because I wrote it). It converts its inputs to String via to_string_lossy and disregards the Path. So another easy win is to pass the lossy_string to version_cmp instead of the Path.

Hmm - I did not get to check the implementation for version_cmp - we should definitely do this as it is a low hanging fruit.
I'll do it as part of this PR if I get the chance.
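As a related aside on the Sort::Name line in the diff above: `sort_by_key` calls its key closure again for each comparison, so `to_lowercase()` allocates repeatedly during the sort. A minimal sketch of one alternative, using the standard `sort_by_cached_key`, is shown below. The `Entry` struct and `sort_by_name` function are hypothetical stand-ins, not the PR's actual code, and this change was not part of the PR.

```rust
/// Hypothetical, trimmed-down stand-in for the PR's `PathData`.
struct Entry {
    file_name: String,
}

/// Sorts case-insensitively. `sort_by_cached_key` computes each
/// lowercase key once per entry instead of once per comparison.
fn sort_by_name(entries: &mut [Entry]) {
    entries.sort_by_cached_key(|e| e.file_name.to_lowercase());
}
```

Whether this is a measurable win for ls would need to be checked with the benchmarks described in BENCHMARKING.md.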

tertsdiepraam · 2021-04-19T12:30:47Z

src/uu/ls/src/ls.rs

+/// Caching data here helps eliminate redundant syscalls to fetch same information
+struct PathData {
+    // Result<MetaData> got from symlink_metadata() or metadata() based on config
+    md: std::io::Result<Metadata>,


Would it make sense to return this Result on PathData::new? The fs::metadata call fails if the path does not exist or the user does not have permissions to read the metadata, in which case I think we want to log that and then not consider the file for any later calls. So the new method would look roughly like this:

impl PathData {
    fn new(p_buf: PathBuf, config: &Config, command_line: bool) -> std::io::Result<Self> {
        let md = get_metadata(&p_buf, config, command_line)?;
        ...
        Ok(Self{...})
    }
}

And then we don't have to check for errors from the metadata later on.
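A self-contained sketch of that idea follows, under simplifying assumptions: the `PathData` here has only two fields, a plain `dereference` flag replaces the real `Config` and `command_line` parameters, and `collect_entries` is a hypothetical helper that gathers error messages instead of printing them.

```rust
use std::fs::{self, Metadata};
use std::io;
use std::path::PathBuf;

/// Simplified stand-in for the PR's `PathData` (assumption: two fields only).
struct PathData {
    p_buf: PathBuf,
    md: Metadata,
}

impl PathData {
    /// Fails early if the metadata cannot be read, so later code
    /// never has to handle a missing `Metadata`.
    fn new(p_buf: PathBuf, dereference: bool) -> io::Result<Self> {
        let md = if dereference {
            fs::metadata(&p_buf)?
        } else {
            fs::symlink_metadata(&p_buf)?
        };
        Ok(Self { p_buf, md })
    }
}

/// Hypothetical helper: keeps the entries that could be read and
/// returns one message per path that could not.
fn collect_entries(paths: Vec<PathBuf>, dereference: bool) -> (Vec<PathData>, Vec<String>) {
    let mut entries = Vec::new();
    let mut errors = Vec::new();
    for p in paths {
        let shown = p.display().to_string();
        match PathData::new(p, dereference) {
            Ok(data) => entries.push(data),
            Err(e) => errors.push(format!("{}: {}", shown, e)),
        }
    }
    (entries, errors)
}
```

With this shape, all errors surface before sorting, which is the ordering question raised in the reply below.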

Yes, that's another good idea.
I need to check where / when the error is expected to be logged: when we print the details of the file (after sorting), or whether it is fine to log all errors upfront.

For metadata, I had another optimization in mind: if the config does not include any options that require metadata details, we could skip the get_metadata call altogether (the type of the md field could then become Option).

Ah yeah, that's a neat idea! An easy way to implement this might be to make a method PathData::md() that retrieves the metadata on the first call and then stores it in the struct and reuses that value when it's called again.
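One way such a lazy `md()` accessor might look is sketched below. This is an illustration only: `LazyPathData` and its `dereference` flag are hypothetical, and `std::cell::OnceCell` requires a much newer Rust than the 1.40 MSRV discussed in this thread, so the real code would need a different cell type.

```rust
use std::cell::OnceCell;
use std::fs::{self, Metadata};
use std::io;
use std::path::PathBuf;

/// Hypothetical variant of `PathData` that fetches metadata on demand.
struct LazyPathData {
    p_buf: PathBuf,
    // Assumption: a plain flag instead of the real `Config`.
    dereference: bool,
    // Empty until `md()` is first called; the error is kept for reporting.
    md: OnceCell<io::Result<Metadata>>,
}

impl LazyPathData {
    fn new(p_buf: PathBuf, dereference: bool) -> Self {
        Self { p_buf, dereference, md: OnceCell::new() }
    }

    /// Performs the metadata lookup on the first call only and
    /// returns the stored result on every later call.
    fn md(&self) -> Option<&Metadata> {
        self.md
            .get_or_init(|| {
                if self.dereference {
                    fs::metadata(&self.p_buf)
                } else {
                    fs::symlink_metadata(&self.p_buf)
                }
            })
            .as_ref()
            .ok()
    }
}
```

Code paths that never call `md()` (for example a plain listing without -l) would then not trigger the lookup at all.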

tertsdiepraam · 2021-04-19T12:39:56Z

src/uu/ls/src/ls.rs

            match md {
                Err(e) => {
-                    let filename = get_file_name(i, strip);
+                    let filename = get_file_name(&i.p_buf, strip);


Do you think this could just be

let filename = i.file_name;

or is it too hard to implement strip then?

Another nice suggestion 👍
Actually, I think get_file_name could be updated to something like this (based on what the current use cases look like):

fn get_file_name(p: &PathData, strip: bool) -> String {
    if strip {
        p.file_name.clone()
    } else {
        p.lossy_string.clone()
    }
}

sylvestre · 2021-04-19T20:08:11Z

@tertsdiepraam I think you are the best reviewer for this. please let me know when you think we can merge it :)

src/uu/ls/BENCHMARKING.md

siebenHeaven · 2021-04-20T17:35:49Z

@tertsdiepraam I've updated to remove the dependency on cached altogether and fixed other MSRV related issues.
I would prefer to merge this PR and defer other optimizations that we discussed to a follow-up PR if that sounds okay.

tertsdiepraam

Agreed! We can optimize further in other PRs; it's definitely worth merging this now rather than doing too much in one PR. All checks are green too, so this can be merged in my opinion. @sylvestre

tertsdiepraam · 2021-04-20T19:29:33Z

src/uu/ls/src/ls.rs

+    match uid_cache.get(&uid) {
+        Some(usr) => usr.clone(),
+        None => {
+            let usr = entries::uid2usr(uid).unwrap_or_else(|_| uid.to_string());
+            uid_cache.insert(uid, usr.clone());
+            usr
+        }
+    }


You could simplify this a bit with the HashMap::entry method:

Suggested change

-    match uid_cache.get(&uid) {
-        Some(usr) => usr.clone(),
-        None => {
-            let usr = entries::uid2usr(uid).unwrap_or_else(|_| uid.to_string());
-            uid_cache.insert(uid, usr.clone());
-            usr
-        }
-    }
+    uid_cache.entry(&uid).or_insert_with(||
+        entries::uid2usr(uid).unwrap_or_else(|_| uid.to_string())
+    ).clone()

(haven't tested whether all the types and ownership work out)

I had tried something like this, but it was not caching, for reasons I didn't get a chance to debug.

I will give it another shot in a follow-up; it's worth switching to this for the benefit of cleaner code.

Thanks!
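For reference, a minimal standalone sketch of the `HashMap::entry` approach is given below. It makes two assumptions: the cache is passed in as `&mut HashMap<u32, String>` (the PR keeps it behind lazy_static), and `lookup` is a hypothetical stand-in for `entries::uid2usr`. Note that `entry` takes the key by value, so the sketch passes `uid` rather than `&uid`.

```rust
use std::collections::HashMap;

/// Returns the cached name for `uid`, calling `lookup` only on a miss.
/// `lookup` is a hypothetical placeholder for `entries::uid2usr`.
fn cached_uid2usr<F>(cache: &mut HashMap<u32, String>, uid: u32, lookup: F) -> String
where
    F: FnOnce(u32) -> Result<String, ()>,
{
    cache
        .entry(uid)
        // Fall back to the numeric id when the lookup fails.
        .or_insert_with(|| lookup(uid).unwrap_or_else(|_| uid.to_string()))
        .clone()
}
```

The closure passed to `or_insert_with` runs only when the key is absent, which is what makes the second call for the same uid a pure map lookup.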

sylvestre · 2021-04-22T07:19:38Z

well done :)

sylvestre · 2021-04-22T07:33:36Z

Interesting, I have different results than you.

On firefox source code:

  'ls -R /home/sylvestre/dev/mozilla/mozilla-central.hg' ran
    5.28 ± 1.56 times faster than './target/release/coreutils ls -R /home/sylvestre/dev/mozilla/mozilla-central.hg'
    8.39 ± 1.26 times faster than '/tmp/coreutils ls -R /home/sylvestre/dev/mozilla/mozilla-central.hg'

it is an improvement but still much slower than ls :)

tertsdiepraam · 2021-04-22T12:14:28Z

@sylvestre I also tried it on my machine with different combinations of -a and -l, and it seems like -l slows GNU down:

'ls -al -R' ran
  2.05 ± 0.05 times faster than '../coreutils/target/release/coreutils ls -al -R'
'ls -R' ran
  5.03 ± 0.03 times faster than '../coreutils/target/release/coreutils ls -R'
'ls -a -R' ran
  5.25 ± 0.07 times faster than '../coreutils/target/release/coreutils ls -a -R'
'ls -l -R' ran
  2.18 ± 0.03 times faster than '../coreutils/target/release/coreutils ls -l -R'

(this is also on the Firefox source code)

siebenHeaven · 2021-04-22T14:28:20Z

@sylvestre as for the difference between results, I imagine that could be because I'm running this on WSL2 rather than bare-metal Linux, and/or because of other filesystem / hardware differences (these kinds of differences affect what the bottleneck is, and hence the overall results, I believe).

I do get results along these lines (note this is on the Linux source tree; I'm not sure how that compares with the Firefox sources, but I've tried larger directories too, with more or less similar results):

  'ls -al -R' ran
    1.45 ± 0.04 times faster than 'coreutils ls -al -R'

  'ls -a -R' ran
    5.65 ± 0.49 times faster than 'coreutils ls -a -R'

  'ls -l -R' ran
    1.31 ± 0.22 times faster than 'coreutils ls -l -R'

  'ls -R' ran
    5.02 ± 0.35 times faster than 'coreutils ls -R'

There are some more optimization opportunities that this PR exposes - some are listed in comments from @tertsdiepraam 's review that I couldn't get to in this PR.

As for the difference with vs. without -a / -l, I imagine that may be because GNU skips fetching information that is not really required (e.g. GNU ls does no stat calls with just -R, whereas after this PR we always do one for every path, regardless of whether we will need it down the line).

We may be able to do the same by following one of @tertsdiepraam 's suggestions: adding methods to the PathData structure that lazily fetch information, as opposed to fetching all information upfront the way this PR does.
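To illustrate the "no stat for plain -R" point, here is a small sketch of a recursive walk that relies only on `DirEntry::file_type`, which the standard library documents as needing no extra system call on Windows and most Unix platforms. The function name `list_dirs_recursive` is hypothetical and this is not how the PR's ls is structured.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects `root` and every directory below it without calling
/// `fs::metadata` on each entry.
fn list_dirs_recursive(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut dirs = vec![root.to_path_buf()];
    let mut i = 0;
    // `dirs` doubles as the work queue: new directories are appended
    // while earlier ones are being read.
    while i < dirs.len() {
        let current = dirs[i].clone();
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            // `file_type` does not follow symlinks, so symlinked
            // directories are not descended into.
            if entry.file_type()?.is_dir() {
                dirs.push(entry.path());
            }
        }
        i += 1;
    }
    Ok(dirs)
}
```

Comparing syscall counts (as BENCHMARKING.md suggests) between a walk like this and the current implementation would show how much of the gap comes from per-path metadata calls.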

I am open to keeping the linked issue open and getting into some of these optimizations with another PR if that sounds okay to you.

sylvestre · 2021-04-22T14:36:34Z

I am open to keeping the linked issue open and getting into some of these optimizations with another PR if that sounds okay to you.

Ok, thanks!

sylvestre requested changes Apr 17, 2021

siebenHeaven added 6 commits April 18, 2021 12:56

ls: Remove allocations by eliminating collect/clones

5850f78

ls: Introduce PathData structure

76cc7fb

- PathData will hold Path related metadata / strings that are required frequently in subsequent functions
- All data is precomputed and cached and subsequent functions just use cached data

ls: Cache more data related to paths

af68ec8

- Cache filename and sort by filename instead of full path
- Cache uid->usr and gid->grp mappings

ls: Add BENCHMARKING.md

a7bbef9

ls: Document PathData structure

f89cb6d

tests/ls: Add testcase for error paths with width option

b4af0d0

siebenHeaven force-pushed the optimize_recursive_ls branch from f059880 to b4af0d0 Compare April 18, 2021 11:34

siebenHeaven marked this pull request as ready for review April 18, 2021 11:59

sylvestre reviewed Apr 18, 2021

tertsdiepraam reviewed Apr 19, 2021

siebenHeaven added 2 commits April 19, 2021 19:25

ls: Fix unused import warning

5731dc4

cached will be only used for unix currently as current use of caching gid/uid mappings is only relevant on unix

ls: Suggest checking syscall count in BENCHMARKING.md

fc66b32

cbjadwani reviewed Apr 20, 2021

siebenHeaven added 3 commits April 20, 2021 22:16

ls: Remove mentions of sort in BENCHMARKING.md

34b4ae0

ls: Remove dependency on cached

9415f6c

Implement caching using HashMap and lazy_static

ls: Fix MSRV error related to map_or

42ecac3

Rust 1.40 did not support map_or for result types

tertsdiepraam approved these changes Apr 20, 2021

sylvestre merged commit 8554cdf into uutils:master Apr 22, 2021

siebenHeaven deleted the optimize_recursive_ls branch April 25, 2021 10:34
