
Path to registry index is not cross-platform stable (i.e. hash part of registry/index/index.crates.io-<hash>) #14795

Open
uncomfyhalomacro opened this issue Nov 8, 2024 · 21 comments
Labels
C-bug Category: bug Command-fetch disposition-merge FCP with intent to merge proposed-final-comment-period An FCP proposal has started, but not yet signed off. S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. T-cargo Team: Cargo

Comments

@uncomfyhalomacro

uncomfyhalomacro commented Nov 8, 2024

Hello. I have an issue with cargo-fetch behaving differently from cargo-vendor. The help section of cargo fetch says this for the --target flag:

Fetch for the given architecture. The default is all architectures. The
general format of the triple is <arch><sub>-<vendor>-<sys>-<abi>.
Run rustc --print target-list for a list of supported targets. This flag may
be specified multiple times.

However, I believe this is untrue, since on some architectures the build cannot find its dependencies. I haven't tried the --target flag as a workaround yet, but it is weird to me that some target architectures cannot find their dependencies when the help says the default is all target architectures.

Here is a comparison between cargo vendor vs cargo fetch.

You can see in https://build.opensuse.org/package/show/home:uncomfyhalomacro:branches:editors:registry/kak-lsp that only aarch64, x86_64, and ppc64le are able to find their dependencies and compile (whether the build ultimately fails or not). i586, armv7l, and s390x are the architectures that were not able to find their dependencies.

Everything in https://build.opensuse.org/package/show/home:uncomfyhalomacro:branches:editors:vendor/kak-lsp is able to find its dependencies and compile (whether the build ultimately fails or not).

I would like to know whether or not there is an inaccuracy in the documentation.

For context, I've been helping build a tool to ease the process of vendoring dependencies in openSUSE/SUSE: https://github.com/openSUSE-Rust/obs-service-cargo.

@epage epage added Command-fetch S-triage Status: This issue is waiting on initial triage. C-bug Category: bug labels Nov 8, 2024
@weihanglo
Member

Thanks for the bug report. Could you attach some additional information, like:

  • The version of Cargo (the output of cargo -vV) this behavior was observed with.
  • If this is a regression, the last successful version of Cargo.
  • The exact Cargo commands or scripts run in that service.
  • Any [source] replacement or proxies configured on the server.

cargo vendor and cargo fetch have no difference in terms of dependency resolution (except that cargo vendor accepts multiple workspaces). However, they are slightly different when filtering by platform:

cargo fetch fetches only packages in the dependency resolution (also seen as packages recorded in Cargo.lock) that could be reached from workspace members.

while let Some(id) = deps_to_fetch.pop() {
    if !fetched_packages.insert(id) {
        continue;
    }
    to_download.push(id);
    let deps = resolve
        .deps(id)
        .filter(|&(_id, deps)| {
            deps.iter().any(|d| {
                // If no target was specified then all dependencies are
                // fetched.
                if options.targets.is_empty() {
                    return true;
                }
                // Otherwise we only download this dependency if any of the
                // requested platforms would match this dependency. Note
                // that this is a bit lossy because not all dependencies are
                // always compiled for all platforms, but it should be
                // "close enough" for now.
                build_config
                    .requested_kinds
                    .iter()
                    .any(|kind| data.dep_platform_activated(d, *kind))
            })
        })
        .map(|(id, _deps)| id);
    deps_to_fetch.extend(deps);
}

cargo vendor, in contrast, fetches everything in the dependency resolution, and there is no way for cargo vendor to filter packages by platform at the moment.

.get_many(resolve.iter())

That said, there might still be bugs lurking somewhere.
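
To make the lossy platform filter above concrete, here is a minimal standalone Rust sketch of the idea (my own simplification with made-up names, not Cargo's real types; plain string equality stands in for Cargo's cfg/target-triple matching):

// Keep a dependency edge when no --target was requested, or when any
// requested target activates the (possibly platform-specific) dependency.
fn keep_dep(requested_targets: &[&str], dep_platform: Option<&str>) -> bool {
    if requested_targets.is_empty() {
        // No --target: every dependency edge is fetched.
        return true;
    }
    match dep_platform {
        // Platform-agnostic dependency: always fetched.
        None => true,
        // Platform-specific dependency: fetched only if one of the requested
        // targets matches it.
        Some(platform) => requested_targets.iter().any(|t| *t == platform),
    }
}

fn main() {
    // With no --target, even a Windows-only dependency is fetched.
    assert!(keep_dep(&[], Some("x86_64-pc-windows-msvc")));
    // With --target s390x-unknown-linux-gnu, the Windows-only dependency is skipped.
    assert!(!keep_dep(&["s390x-unknown-linux-gnu"], Some("x86_64-pc-windows-msvc")));
    // Platform-agnostic dependencies are always kept.
    assert!(keep_dep(&["s390x-unknown-linux-gnu"], None));
}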

@weihanglo weihanglo added S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. and removed S-triage Status: This issue is waiting on initial triage. labels Nov 8, 2024
@uncomfyhalomacro
Author

uncomfyhalomacro commented Nov 9, 2024

Hi @weihanglo. This is the version of cargo running in my container, installed through zypper:

cargo 1.82.0 (8f40fc59f 2024-08-21)
release: 1.82.0
commit-hash: 8f40fc59fb0c8df91c97405785197f3c630304ea
commit-date: 2024-08-21
host: x86_64-unknown-linux-gnu
libgit2: 1.8.1 (sys:0.19.0 vendored)
libcurl: 8.10.1 (sys:0.4.74+curl-8.9.0 system ssl:OpenSSL/3.2.3)
os: openSUSE 20241106.0.0 [64-bit]

The commands run in that service are as follows.

  1. cargo-fetch with the --manifest-path flag. If my service is configured with respect-lockfile set to true, a --locked flag will be added.
  2. cargo-vendor with the --manifest-path flag. Additional manifest paths will be passed with --sync. If my service is configured with respect-lockfile set to true, a --locked flag will be added.

If this is a regression, the last successful version of Cargo.

I wouldn't know whether this is a regression, as this is the first time I noticed the behavior and this is my first bug report about it.

Any other [source] replacement or proxies configured in the server or not.

None. I just use whatever is the default.

cargo fetch fetches only packages in the dependency resolution (also seen as packages recorded in Cargo.lock) that could be reached from workspace members.

So shouldn't the help section of cargo-fetch be updated?

If --target is not specified, then all target dependencies are fetched

This confuses me even more, as that was the only assumption I took from the help section, yet the behaviour I observed was different from what was described. Should fetch at least have the same behaviour as vendor, in that they both fetch everything in the dependency resolution by default?

@uncomfyhalomacro
Author

Hi. I tried using the --target flag in this commit openSUSE-Rust/obs-service-cargo@a34992c for my project but it seems it's still not budging in https://build.opensuse.org/package/show/home:uncomfyhalomacro:branches:editors:registry/kak-lsp

@weihanglo
Member

So shouldn't the help section of cargo-fetch be updated?

Either that, or align the behavior of the two commands. Let me try to get a minimal reproduction first.

@weihanglo
Member

Let me try to get a minimal reproduction first.

I've re-read the cargo-fetch logic. While it may fail to fetch packages that use unstable features like forced-target or artifact dependencies, under normal circumstances it should just work.

It is surprising that in your build log it's crossbeam-channel that is missing:

[   66s] + /usr/bin/cargo auditable build -j8 --offline --release --all-features
[   66s] error: no matching package named `crossbeam-channel` found
[   66s] location searched: registry `crates-io`
[   66s] required by package `kak-lsp v18.0.3 (/home/abuild/rpmbuild/BUILD/kakoune-lsp-18.0.3)`
[   66s] As a reminder, you're using offline mode (--offline) which can sometimes cause surprising resolution failures, if this error is too confusing you may wish to retry without the offline flag.
[   66s] error: Bad exit status from /var/tmp/rpm-tmp.8OUc67 (%build)

If I read it correctly, crossbeam-channel is not even a platform-specific dependency. I wonder what is inside the registry.tar.zst tarball. Maybe it got corrupted or something went missing when packing?

@uncomfyhalomacro
Author

It's not corrupted or missing anything, since doing the following

raw -t registry.tar.zst -d registry/
cd registry/.cargo/registry/cache/index.crates.io-6f17d22bba15001f
fd crossbeam

gives me this output

crossbeam-channel-0.5.13.crate
crossbeam-utils-0.8.20.crate

The other variant, which uses the vendor method, produces a file vendor.tar.zst.

Extracting it and running fd crossbeam on it gives me this output:

crossbeam-channel-0.5.13/
crossbeam-channel-0.5.13/benches/crossbeam.rs
crossbeam-utils-0.8.20/

So I am not really sure how cargo-fetch and cargo-vendor are any different when they both have crossbeam.

@weihanglo
Member

registry/.cargo/registry/index/index.crates.io-6f17d22bba15001f

Do we have the correct index files under this location? Perhaps you can upload a tarball that failed to build and we can investigate together.

@uncomfyhalomacro
Author

uncomfyhalomacro commented Nov 23, 2024

registry/.cargo/registry/index/index.crates.io-6f17d22bba15001f

Do we have the correct index files under this location? Perhaps you can upload a tarball that failed to build and we can investigate together.

Hi. It seems I am not able to upload a tarball. One way you can reproduce it is by doing the following:

cargo install --git https://github.com/openSUSE-Rust/obs-service-cargo
git clone --depth 1 --branch v18.0.3 https://github.com/kakoune-lsp/kakoune-lsp/
cd kakoune-lsp
cargo_vendor --src $PWD --outdir /tmp
cd /tmp/
mkdir vendored-sources/
tar xf vendor.tar.zst -C vendored-sources/
cd vendored-sources/

By now you can check the contents. It should contain a Cargo.lock and the .cargo directory, which contains the registry files.

@weihanglo
Member

weihanglo commented Nov 24, 2024

@uncomfyhalomacro the current obs-service-cargo openSUSE-Rust/obs-service-cargo@bde0300 generates only vendored sources. There are no registry index files or a registry tarball at all. I am not sure which commit I should use to reproduce this.

BTW, you could also run fd --hidden crossbeam under registry/index/index.crates.io-6f17d22bba15001f/.cache and check if there is any index file called crossbeam-channel. If not, then you need to pack it.

@uncomfyhalomacro
Author

uncomfyhalomacro commented Nov 24, 2024

@uncomfyhalomacro the current obs-service-cargo openSUSE-Rust/obs-service-cargo@bde0300 generates only vendored sources. There are no registry index files or a registry tarball at all. I am not sure which commit I should use to reproduce this.

BTW, you could also run fd --hidden crossbeam under registry/index/index.crates.io-6f17d22bba15001f/.cache and check if there is any index file called crossbeam-channel. If not, then you need to pack it.

Oh, I forgot to add the flag --method registry in the command. My bad. So it should be

cargo_vendor --src $PWD --method registry --outdir /tmp

Just to keep you updated, running your command gives this output:

cr/os/crossbeam-channel
cr/os/crossbeam-deque
cr/os/crossbeam-epoch
cr/os/crossbeam-queue
cr/os/crossbeam-utils

@weihanglo
Member

I was so dumb! I just realized the hash part of index.crates.io-<hash> is endianness- and bit-width-dependent. i586 and armv7l are 32-bit arches, and s390x is big-endian. I had been poking around the hash algorithm but didn't realize it earlier 😅. There is a PR using a cross-platform hash for SourceId, which should make the situation better. It is not going to make the hash value permanently stable; Cargo may change the algorithm between versions, but if we adopt what that PR does, the hash should generally be stable within a version across all platforms. I believe using cargo-vendor is a better long-term solution for your use case.
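
For illustration only (this is not Cargo's actual SourceId hashing code, and SourceKey/short_name are made-up names), here is a minimal Rust sketch of why a hash built on std's Hasher trait ends up platform-dependent: the default Hasher methods feed native-endian, platform-sized bytes for integer fields, so 32-bit and big-endian targets hash different input bytes for the same logical value.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for a source key; the usize field is hashed via
// Hasher::write_usize, which feeds native-endian, platform-sized bytes.
#[derive(Hash)]
struct SourceKey {
    kind: usize,
    url: &'static str,
}

fn short_name(key: &SourceKey) -> String {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // The resulting 64-bit value differs between e.g. x86_64 and s390x or
    // armv7l, so the directory suffix differs too.
    format!("index.crates.io-{:016x}", hasher.finish())
}

fn main() {
    let key = SourceKey {
        kind: 2,
        url: "https://github.com/rust-lang/crates.io-index",
    };
    println!("{}", short_name(&key));
}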

Given that, I believe cargo fetch works correctly in this regard. I will tweak the issue title a bit and plan to close it if you have no further questions.

@weihanglo weihanglo changed the title cargo fetch does not actually use all target architectures as the default. Path to registry index is not cross-platform stable (i.e. hash part of registry/index/index.crates.io-<hash>) Nov 24, 2024
@uncomfyhalomacro
Author

Sure! So the plan now is to create

one single index cache for all platforms

Not sure when that will happen, but for now you can close this issue, as that literally answered all my questions!

@weihanglo
Member

Yes. As I said, the stable path is best-effort. We don't guarantee that the path is permanent.

I'll propose a stable hash to the team next week (if I remember).

Thanks for your understanding!

@weihanglo weihanglo closed this as not planned Won't fix, can't repro, duplicate, stale Nov 24, 2024
@weihanglo weihanglo added the T-cargo Team: Cargo label Nov 25, 2024
@weihanglo
Member

weihanglo commented Nov 25, 2024

Going to reopen this issue and use it for proposing a stable hash for source URLs.

tl;dr:

  • Propose switching the source URL hash from a platform-dependent stable hash to a cross-platform stable hash
  • Propose choosing BLAKE3 as the algorithm for that cross-platform stable hash

During the course of designing remap rules for -Ztrim-paths, we found that the source URL hash in Cargo was never platform-independent. I propose we switch to a cross-platform hash algorithm for source URL hashes. It benefits debugging when sharing sources and debug info between different platforms, especially for -Ztrim-paths in the future.

The remaining question is the hash algorithm.

In #14116 we tried out rustc-stable-hash to experiment with cross-platform (i.e., endianness- and bit-width-independent) hash algorithms for stable SourceId hashes, including the default SipHash and BLAKE3. The benchmarking shows that:

  • The performance of stable SipHash is on par with or a bit faster than the original one, while BLAKE3 is 2-3x slower.
  • The actual time spent is around 1.5ms when using BLAKE3 on medium-size projects like cargo, for 4000+ hashes, on a 10-core Apple M1 Pro.

Therefore, I favor and propose BLAKE3 as our stable hash algorithm for these reasons:

  • The real-world performance regression is negligible compared to other parts of a cargo invocation.
  • BLAKE3 is a cryptographic hash function. While this is not a problem for crates.io itself, it is still possible for other registries or git repos to intentionally create collisions for source URL hashes. It's better to take a preventive course now rather than later.

@rfcbot fcp merge


Besides, I'd like to call out an area this stable hash proposal doesn't cover:

  • File paths on Windows have prefixes like C:\ and D:\. When using path, local registry, or directory sources, these prefixes will be included in the hash input and lead to different hash output. While I don't think it is a big issue, we had better keep in mind that the “cross-platform” source URL hash is not truly cross-platform in this regard.
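
For a concrete picture of the proposed direction, here is a minimal sketch of deriving a cross-platform stable directory suffix from a source URL (assuming the blake3 crate as a dependency; stable_short_name is a made-up name, and the exact bytes Cargo would feed into the hash and the final output format are implementation details not decided here):

// BLAKE3 is defined purely over a byte stream, so the digest is identical on
// 32-bit, 64-bit, little-endian, and big-endian targets, e.g. i586, armv7l,
// s390x, and x86_64 all produce the same suffix for the same URL.
fn stable_short_name(url: &str) -> String {
    let digest = blake3::hash(url.as_bytes());
    // Keep the same 16-hex-character shape as today's directory suffix by
    // hex-encoding the first 8 bytes of the 32-byte digest.
    let suffix: String = digest.as_bytes()[..8]
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect();
    format!("index.crates.io-{}", suffix)
}

fn main() {
    println!("{}", stable_short_name("https://github.com/rust-lang/crates.io-index"));
}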

@weihanglo weihanglo reopened this Nov 25, 2024
@rfcbot
Collaborator

rfcbot commented Nov 25, 2024

Team member @weihanglo has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period An FCP proposal has started, but not yet signed off. disposition-merge FCP with intent to merge labels Nov 25, 2024
@epage
Contributor

epage commented Nov 25, 2024

@weihanglo what's the cost if we change our mind on which hash we use in the future? The important part is that it's stable across platforms, right? And we're mainly signing off that the small performance hit for blake3 is fine?

@weihanglo
Member

weihanglo commented Nov 25, 2024

@epage I've clarified it in the FCP. There are two things:

  • Switch from platform-dependent to cross-platform hash.
  • Use BLAKE3 for that cross-platform hash.

If we don't want BLAKE3, we can continue using SipHash with rustc-stable-hash, which is pretty much the same as the current one in every aspect, including the non-cryptographic property, but is cross-platform stable.

The cost of changing the hash algorithm is that users will need to download extra files for global caches (index files, git/registry sources) for Rust toolchain versions using the new hash algorithm. We want to avoid that unless really needed.

@epage
Contributor

epage commented Nov 25, 2024

The cost of changing hash algorithm is it thrashes global caches (index files, git/registry sources) between Rust toolchain versions when we switch. We want to avoid that unless really really needed.

Double-checking: it doesn't thrash the cache (throw it out on each switch, like RUSTFLAGS today for rlibs), but keeps the caches for the old and new versions separate (like my PR for RUSTFLAGS for rlibs)? This would be a one-time extra download and extra space taken up until the user cleans it up or we stabilize GC. Not great, but not the worst.

IIRC there are other parts of the project that are seeking out cryptographically secure hashes, so this lines up with those, right?

@weihanglo
Member

[…] It doesn't thrash the cache (throw out on each switch, like RUSTFLAGS today for rlibs) but keeps the caches between the old and new version separate (like my PR for RUSTFLAGS for rlibs)? This would be a one-time extra download and extra space taken up until the user cleans it up or we stabilize GC. Not great but not the worst.

You're absolutely correct. Not thrashing, but extra downloads for newer versions.

IIRC there are other parts of the project that are seeking out cryptographically secure hashes, so this lines up with those, right?

Yes. A meeting is happening on Dec 6th: rust-lang/compiler-team#774. It would be great if both rustc and cargo depended on the same cryptographically secure hash. We can wait for a conclusion from that meeting, though it is not a must; the scenarios are not the same.

@ehuss
Contributor

ehuss commented Nov 26, 2024

Can you say more about where this hash shows up? IIUC, it is ~/.cargo/registry/*/$name-$hash? And that also ends up in every artifact hash? Where else?

Can you say more about the current situation? IIRC, it is using SipHash, correct? And that is not cross-platform for some reason?

@weihanglo
Member

Sorry, I should have copied them from #14116. Here they are:

StableHasher is used in several places. We should consider whether there is a need for a cryptographic hash (see #13171 (comment)).

Can you say more about the current situation? IIRC, it is using SipHash, correct? And that is not cross-platform for some reason?

Cargo currently uses SipHash as its stable hash, but the hash output is not cross-platform stable like the one rustc-stable-hash provides.

This FCP proposes we use the cross-platform stable hash from rustc-stable-hash and swap the underlying hash algorithm to BLAKE3.
