Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cp: implement copying from streams #7061

Merged
merged 3 commits into from
Jan 13, 2025
Merged

Conversation

DaringCuteSeal
Copy link
Contributor

Description

Previously, the only stream that cp supports copying are FIFOs. Any other files will be handled using std::io::copy which does not handle unix "special" files, such as block devices and character specials. This PR addresses the previously missing support by using std::io::copy or splice syscall (under Linux) provided from the buf-copy module from uucore.

There are also several minor fixes introduced alongside:

  • The FIFO content copy test should now pass under all unix platforms (FreeBSD was blacklisted previously).
  • Copying from /dev/null now works no matter the file path. Previously, cp only checks if the argument passed is a string of "/dev/null".

Current Minor Issue

The copy_stream from uucore returns a UResult which means that the caller cannot know the kind of error it returns directly. However, cp's stream copy function does not directly return a UResult as of now (it returns a std::io::Error), making the module plug slightly wonky. In this PR, I tried to map the UResult Error into a std::io::Error with the Other kind which doesn't sound right. A better fix can be done in a separate refactor, please let me know if I can (and should) work on that.

Copy link

github-actions bot commented Jan 3, 2025

GNU testsuite comparison:

GNU test failed: tests/cp/existing-perm-race. tests/cp/existing-perm-race is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

Copy link

github-actions bot commented Jan 3, 2025

GNU testsuite comparison:

GNU test failed: tests/cp/existing-perm-race. tests/cp/existing-perm-race is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@DaringCuteSeal DaringCuteSeal marked this pull request as draft January 4, 2025 00:54
@DaringCuteSeal DaringCuteSeal force-pushed the cp-stream-2 branch 4 times, most recently from 9d4bc72 to ac614da Compare January 4, 2025 01:48
Copy link

github-actions bot commented Jan 4, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@DaringCuteSeal DaringCuteSeal marked this pull request as ready for review January 4, 2025 03:51
Copy link

github-actions bot commented Jan 5, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@@ -3437,15 +3437,9 @@ fn test_same_file_force_backup() {
}

/// Test for copying the contents of a FIFO as opposed to the FIFO object itself.
#[cfg(all(unix, not(target_os = "freebsd"), not(target_os = "openbsd")))]
#[cfg(unix)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kudos :)

@@ -1953,6 +1953,7 @@ fn print_paths(parents: bool, source: &Path, dest: &Path) {
///
/// * `Ok(())` - The file was copied successfully.
/// * `Err(CopyError)` - An error occurred while copying the file.
#[allow(clippy::too_many_arguments)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should probably be refactor (this is fine in a separate PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nah. I added this for the sake of passing the style check since I added a new argument below there and it triggered a clippy warning. I don't think I can separate it :P

or should I?

Copy link

github-actions bot commented Jan 6, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

Copy link

github-actions bot commented Jan 6, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/misc/echo is no longer failing!

@sylvestre
Copy link
Contributor

did you run the benchmarks to make sure it didn't regress ?

@DaringCuteSeal
Copy link
Contributor Author

did you run the benchmarks to make sure it didn't regress ?

Oh, no, wait

you mean regression for the non-special files to make sure I didn't mess that up right?

@DaringCuteSeal
Copy link
Contributor Author

DaringCuteSeal commented Jan 6, 2025

for some reason my modification actually made it slightly faster in these runs but it's likely to also be influenced by my hard drive and other random factors

both are compiled with cargo build --release

With multiple files (50 runs, 3x warm-up)

Command Mean [s] Min [s] Max [s] Relative
./cp-old --update=all -rg test-dir/test-file test-dir/libasprintf.so .... 15.088 ± 15.233 1.299 39.352 1.00
Command Mean [s] Min [s] Max [s] Relative
./cp-new --update=all -rg test-dir/test-file test-dir/libasprintf.so .... 13.461 ± 15.205 1.270 42.127 1.00

With single file (50 runs, 3x warm-up)

Command Mean [s] Min [s] Max [s] Relative
./cp-old --update=all -rg test-dir/test-file test-dir-2 11.630 ± 12.990 1.114 38.733 1.00
Command Mean [s] Min [s] Max [s] Relative
./cp-new --update=all -rg test-dir/test-file test-dir-2 10.187 ± 12.802 1.105 38.805 1.00

Previously, the only stream that cp supports copying are FIFOs. Any other files will be handled using
std::io::copy which does not handle special files like block devices or character specials. This
commit fixes the previously missing support by using std::io::copy or splice syscall for Linux to
handle those cases.
Copying FIFOs now work under FreeBSD and other UNIX platforms.
@DaringCuteSeal DaringCuteSeal force-pushed the cp-stream-2 branch 3 times, most recently from 908e155 to 6402672 Compare January 6, 2025 13:01
@sylvestre
Copy link
Contributor

Could you please just export the hyperfine --export-markdown output ?
with both commands (old and new)

@DaringCuteSeal
Copy link
Contributor Author

Could you please just export the hyperfine --export-markdown output ?
with both commands (old and new)

sure

@DaringCuteSeal
Copy link
Contributor Author

old.md
new.md
old-2.md
new-2.md

@sylvestre
Copy link
Contributor

sylvestre commented Jan 7, 2025

@DaringCuteSeal not what i meant, sorry
at least, please run
hyperfine <old command> <new command>
will be good enough :)

the goal is to have an easy way to compare results

@DaringCuteSeal
Copy link
Contributor Author

DaringCuteSeal commented Jan 7, 2025

@DaringCuteSeal not what i meant, sorry at least, please run hyperfine will be good enough :)

the goal is to have an easy way to compare results

wait ha you mean just run hyperfine without any other parameters? ?????

im sure the reports above are from hyperfine

@sylvestre
Copy link
Contributor

sorry, github eat my message:
hyperfine <old command> <new command>
will be better

@DaringCuteSeal
Copy link
Contributor Author

DaringCuteSeal commented Jan 8, 2025

sorry, github eat my message: hyperfine <old command> <new command> will be better

ohhh yeah i did see that message but after i re-read everything i can't find that anymore

@sylvestre
Copy link
Contributor

@DaringCuteSeal btw, do you know why we don't use splice in this case #7092 ?

@DaringCuteSeal
Copy link
Contributor Author

@DaringCuteSeal btw, do you know why we don't use splice in this case #7092 ?

splice only works for pipes sources and not regular files

for regular files we use fs::copy:

On Linux (including Android), this function attempts to use copy_file_range(2), and falls back to reading and writing if that is not possible.
On Windows, this function currently corresponds to CopyFileEx. Alternate NTFS streams are copied but only the size of the main stream is returned by this function.
On MacOS, this function corresponds to fclonefileat and fcopyfile.
(Rust's documentation)

which seems to already use all optimizations possible

with that issue in particular it's probably something wrong with our loops, considering the amount of files the user reported copying.

@DaringCuteSeal
Copy link
Contributor Author

DaringCuteSeal commented Jan 8, 2025

@sylvestre for the benchmark btw

Benchmark 1: bash -c '/home/mynameisasecret/cp-old --update=all -rg /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'

(stripped out the progress bars)

  Range (min … max):    1.441 s … 40.123 s    10 runs
 
Benchmark 2: bash -c '/home/mynameisasecret/cp-new --update=all -rg /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'

(stripped out the progress bars)

  Range (min … max):    1.470 s … 39.254 s    10 runs
 
Summary
  bash -c '/home/mynameisasecret/cp-old --update=all -rg /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2' ran
    1.20 ± 1.57 times faster than bash -c '/home/mynameisasecret/cp-new --update=all -rg /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'

@sylvestre
Copy link
Contributor

it regressed the performances ?! surprising

@DaringCuteSeal
Copy link
Contributor Author

actually yeah thats weird i wonder if the order of operation matters

or is it the is_* calls to check the file type :\

@DaringCuteSeal
Copy link
Contributor Author

DaringCuteSeal commented Jan 8, 2025

wait no it actually did regress it :|

> hyperfine --warmup 3 --show-output "bash -c '/home/mynameisasecret/cp-new --update=all -r /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'" "bash -c '/home/mynameisasecret/cp-old --update=all -r /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'"


Benchmark 1: bash -c '/home/mynameisasecret/cp-new --update=all -r /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'
  Time (mean ± σ):     11.567 s ± 12.288 s    [User: 0.006 s, System: 1.353 s]
  Range (min … max):    1.679 s … 38.904 s    10 runs
 
Benchmark 2: bash -c '/home/mynameisasecret/cp-old --update=all -r /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'
  Time (mean ± σ):     11.167 s ± 12.337 s    [User: 0.007 s, System: 1.262 s]
  Range (min … max):    1.479 s … 38.678 s    10 runs
 
Summary
  bash -c '/home/mynameisasecret/cp-old --update=all -r /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2' ran
    1.04 ± 1.59 times faster than bash -c '/home/mynameisasecret/cp-new --update=all -r /home/mynameisasecret/test-dir/ /home/mynameisasecret/test-dir-2'

waddaya say then @sylvestre

@sylvestre
Copy link
Contributor

sorry but what means waddaya ?

@DaringCuteSeal
Copy link
Contributor Author

sorry but what means waddaya ?

what do you*

@sylvestre
Copy link
Contributor

maybe run samply to investigate the difference in term of execution

@DaringCuteSeal
Copy link
Contributor Author

sure thing

@DaringCuteSeal
Copy link
Contributor Author

hey @sylvestre i tried profiling the cps with samply a few times but results are pretty inconsistent. but overall, they seem to match up.

also i tried running more hyperfine with syncs and dropping cache beforehand:

> hyperfine --min-runs 40 --warmup 3 "./cp-old test-dir/test-file test-dir-2" "./cp-new test-dir/test-file test-dir-2"
Benchmark 1: ./cp-old test-dir/test-file test-dir-2
  Time (mean ± σ):      9.998 s ± 13.505 s    [User: 0.001 s, System: 1.036 s]
  Range (min … max):    1.337 s … 38.963 s    40 runs
 
Benchmark 2: ./cp-new test-dir/test-file test-dir-2
  Time (mean ± σ):      9.384 s ± 12.332 s    [User: 0.001 s, System: 1.026 s]
  Range (min … max):    1.342 s … 38.751 s    40 runs
 
Summary
  ./cp-new test-dir/test-file test-dir-2 ran
    1.07 ± 2.01 times faster than ./cp-old test-dir/test-file test-dir-2
❯ hyperfine --min-runs=10 --max-runs=10 --warmup 3 "./cp-old -r --update=all test-dir test-dir-2; sync" "./cp-new -r --update=all test-dir test-dir-2; sync"
Benchmark 1: ./cp-old -r --update=all test-dir test-dir-2; sync
 ⠴ Initial time measurement       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:00   Time (mean ± σ):     17.002 s ± 14.712 s    [User: 0.018 s, System: 1.243 s]
  Range (min … max):    1.601 s … 38.939 s    10 runs
 
Benchmark 2: ./cp-new -r --update=all test-dir test-dir-2; sync
  Time (mean ± σ):     16.250 s ± 15.721 s    [User: 0.017 s, System: 1.256 s]
  Range (min … max):    2.143 s … 38.970 s    10 runs
 
Summary
  ./cp-new -r --update=all test-dir test-dir-2; sync ran
    1.05 ± 1.36 times faster than ./cp-old -r --update=all test-dir test-dir-2; sync

~1.05 difference ratio doesn't sound too bad for me.

@sylvestre
Copy link
Contributor

yeah, 1.05 is probably noise

@sylvestre sylvestre merged commit 74d80ea into uutils:main Jan 13, 2025
65 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants