Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve to_file_path() #1018

Merged
merged 6 commits into from
Feb 8, 2025
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions url/benches/parse_url.rs
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,19 @@ fn punycode_rtl(bench: &mut Bencher) {
bench.iter(|| black_box(url).parse::<Url>().unwrap());
}

fn url_to_file_path(bench: &mut Bencher) {
let url = if cfg!(windows) {
"file:///C:/dir/next_dir/sub_sub_dir/testing/testing.json"
} else {
"file:///data/dir/next_dir/sub_sub_dir/testing/testing.json"
};
let url = url.parse::<Url>().unwrap();

bench.iter(|| {
black_box(url.to_file_path().unwrap());
});
}

benchmark_group!(
benches,
short,
Expand All @@ -95,5 +108,6 @@ benchmark_group!(
punycode_ltr,
unicode_rtl,
punycode_rtl,
url_to_file_path
);
benchmark_main!(benches);
49 changes: 36 additions & 13 deletions url/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2720,7 +2720,18 @@ impl Url {
_ => return Err(()),
};

return file_url_segments_to_pathbuf(host, segments);
let estimated_capacity = self.as_str().len()
- if cfg!(target_os = "redox") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a clamped subtract so that it doesn't wrap around in release when someone calls this with something other than file scheme URLs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think the logic needs to be a bit smarter here because urls could be like a:/path? I updated the code to at least be a fast path for file://-like urls and maybe in some weird cases like a:/path it might under allocate. Probalby not worth the effort to make it smarter than it is now.

// remove only // because it still has file:
2
} else if cfg!(windows) {
// remove file: - has posssible \\ for hostname
5
} else {
// remove file://
7
};
return file_url_segments_to_pathbuf(estimated_capacity, host, segments);
}
Err(())
}
Expand Down Expand Up @@ -3030,6 +3041,7 @@ fn path_to_file_url_segments_windows(
any(unix, target_os = "redox", target_os = "wasi", target_os = "hermit")
))]
fn file_url_segments_to_pathbuf(
estimated_capacity: usize,
host: Option<&str>,
segments: str::Split<'_, char>,
) -> Result<PathBuf, ()> {
Expand All @@ -3047,11 +3059,10 @@ fn file_url_segments_to_pathbuf(
return Err(());
}

let mut bytes = if cfg!(target_os = "redox") {
b"file:".to_vec()
} else {
Vec::new()
};
let mut bytes = Vec::with_capacity(estimated_capacity);
Copy link
Contributor Author

@dsherret dsherret Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before 1,000 iterations within an iteration:

test url_to_file_path ... bench: 202,125 ns/iter (+/- 12,207)

After:

test url_to_file_path ... bench: 127,257 ns/iter (+/- 4,782)

if cfg!(target_os = "redox") {
bytes.extend(b"file:");
}

for segment in segments {
bytes.push(b'/');
Expand Down Expand Up @@ -3083,22 +3094,26 @@ fn file_url_segments_to_pathbuf(

#[cfg(all(feature = "std", windows))]
fn file_url_segments_to_pathbuf(
estimated_capacity: usize,
host: Option<&str>,
segments: str::Split<char>,
) -> Result<PathBuf, ()> {
file_url_segments_to_pathbuf_windows(host, segments)
file_url_segments_to_pathbuf_windows(estimated_capacity, host, segments)
}

// Build this unconditionally to alleviate https://github.com/servo/rust-url/issues/102
#[cfg(feature = "std")]
#[cfg_attr(not(windows), allow(dead_code))]
fn file_url_segments_to_pathbuf_windows(
Copy link
Contributor Author

@dsherret dsherret Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before 1000 iterations within an iteration:

test url_to_file_path ... bench: 437,555 ns/iter (+/- 11,519)

After:

test url_to_file_path ... bench: 119,461 ns/iter (+/- 5,927)

estimated_capacity: usize,
host: Option<&str>,
mut segments: str::Split<'_, char>,
) -> Result<PathBuf, ()> {
use percent_encoding::percent_decode;
let mut string = if let Some(host) = host {
r"\\".to_owned() + host
use percent_encoding::percent_decode_str;
let mut string = String::with_capacity(estimated_capacity);
if let Some(host) = host {
string.push_str(r"\\");
string.push_str(host);
} else {
let first = segments.next().ok_or(())?;

Expand All @@ -3108,7 +3123,7 @@ fn file_url_segments_to_pathbuf_windows(
return Err(());
}

first.to_owned()
string.push_str(first);
}

4 => {
Expand All @@ -3120,7 +3135,8 @@ fn file_url_segments_to_pathbuf_windows(
return Err(());
}

first[0..1].to_owned() + ":"
string.push_str(&first[0..1]);
string.push(':');
}

_ => return Err(()),
Expand All @@ -3131,11 +3147,18 @@ fn file_url_segments_to_pathbuf_windows(
string.push('\\');

// Currently non-unicode windows paths cannot be represented
match String::from_utf8(percent_decode(segment.as_bytes()).collect()) {
match percent_decode_str(segment).decode_utf8() {
Ok(s) => string.push_str(&s),
Err(..) => return Err(()),
}
}
// ensure our estimated capacity was good
debug_assert!(
string.len() <= estimated_capacity,
"len: {}, capacity: {}",
string.len(),
estimated_capacity
);
let path = PathBuf::from(string);
debug_assert!(
path.is_absolute(),
Expand Down
Loading