Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support flat repos [OLD, see #90] #86

Closed

Conversation

jjmaestro
Copy link
Contributor

@jjmaestro jjmaestro commented Sep 13, 2024

Fixes issue #56

Follow-up and credit to @alexconrey (PR #55), @ericlchen1 (PR #64) and @benmccown (PR #67) for their work on similar PRs that I've reviewed and drawn some inspiration to create "one 💍 PR to merge them all" 😅

Problem

Debian has two types of repos: "canonical" and "flat". Each has a different sources.list syntax:

"canonical": (see https://wiki.debian.org/DebianRepository/Format#Overview)

deb uri distribution [component1] [component2] [...]

flat: (see https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format)

deb uri directory/

Per the spec,

A flat repository does not use the dists hierarchy of directories, and instead places meta index and indices directly into the archive root (or some part below it)

Thus, the URL logic in _fetch_package_index() is incorrect for these repos and it always fails to fetch the Package index.

Solution

Just use the Debian sources.list convention in the sources section of the manifest to add canonical and flat repos. Depending on whether the channel has one directory that ends in '/' or a (dist, component, ...) structure the _fetch_package_index () and other internal logic will know whether the source is a canonical or a flat repo.

For example:

version: 1

sources:
  # canonical repo
  - channel: bullseye main contrib
    url: https://snapshot-cloudflare.debian.org/archive/debian/20240210T223313Z
  # flat repos, note the trailing '/' and the lack of distribution or components
  - channel: bullseye-cran40/
    url: https://cloud.r-project.org/bin/linux/debian
  - channel: ubuntu2404/x86_64/
    url: https://developer.download.nvidia.com/compute/cuda/repos

archs:
  - amd64

packages:
  - bash
  - r-mathlib
  - nvidia-container-toolkit-base

Disregarding the "mixing" of Ubuntu and Debian repos for the purpose of the example, this manifest shows that you can mix canonical and flat repos and you can mix multiarch and single-arch flat repos and canonical repos.

You will still have the same problems as before with packages that only exist for one architecture and/or repos that only support one architecture. In those cases, simply separate the repos and packages into their own manifests.


Note

This PR also fixes an issue with NVIDIA CUDA flat repos that don't follow the Debian repo spec and have invalid 'Filename' paths.

The Debian repo spec for 'Filename' says:

The mandatory Filename field shall list the path of the package archive relative to the base directory of the repository. The path should be in canonical form, that is, without any components denoting the current or parent directory ("." or ".."). It also should not make use of any protocol-specific components, such as URL-encoded parameters.

However, there are cases where this is not honored. In those cases we try to work around this by assuming 'Filename' is relative to the sources.list directory/ so we combine them and normalize the new 'Filename' path.

Note that, so far, only the NVIDIA CUDA repos needed this workaround so maybe this heuristic will break for other repos that don't conform to the Debian repo spec.

@jjmaestro
Copy link
Contributor Author

jjmaestro commented Sep 13, 2024

@thesayyn I think the PR is ready but IMHO it's a bit "dirty" due to how the current code is structured. I have a bunch of refactors and code cleanups in my local repo that IMHO are quite nice and help separate concerns between manifest, lockfile, etc, but it's a much larger PR.

Not sure if I should try and get that PR reviewed so that I can then rebase this PR on top. My guess is that once the refactors are done, the if flat_repo else logic that's split across two places (resolve.bzl, package_index.bzl) could probably be cleaner.

Also, I had to bump bazel_skylib version in MODULE.bazel because it seems that 1.5.0 didn't have is_normalized and was added in bazelbuild/bazel-skylib@0e485c8. Since this method was only available in 1.7.0 and onwards and the latest release is 1.7.1, I've just tried to use the latest. Not sure if I should try to use a lower one, I've seen some comments in distroless/dependencies.bzl but I'm not too sure what's that for (I'd love to know to learn and understand more about Bazel!).

Finally, regarding paths, I'm getting a weird (for me, a Bazel n00b) error that I haven't found a way to fix:

bazel test //...
DEBUG: /src/workspace/apt/index.bzl:88:14: 
No lockfile was given, please run `bazel run @apt_security//:lock` to create the lockfile.
ERROR: /src/workspace/apt/private/BUILD.bazel:41:12: in bzl_library rule //apt/private:package_index: target '@@bazel_skylib~//lib:paths.bzl' is not visible from target '//apt/private:package_index'. Check the visibility declaration of the former target if you think the dependency is legitimate. To set the visibility of that source file target, use the exports_files() function
ERROR: /src/workspace/apt/private/BUILD.bazel:41:12: Analysis of target '//apt/private:package_index' failed
Use --verbose_failures to see the command lines of failed build steps.
ERROR: Analysis of target '//apt/private:package_index' failed; build aborted
INFO: Elapsed time: 5.765s, Critical Path: 0.74s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
ERROR: No test targets were found, yet testing was requested

I added load("@bazel_skylib//lib:paths.bzl", "paths") to apt/private/package_index.bzl and I added the dependency in apt/private/BUILD.bazel, and I've checked and bazel-skylib/lib/BUILD has default_visibility to public so I'm a bit lost as to what's going on. Maybe @thesayyn / @alexeagle can help! 🙏

EDIT: @aaomidi solved it! THANKS!! 🎉

Thanks!!

apt/private/BUILD.bazel Outdated Show resolved Hide resolved
@jjmaestro jjmaestro force-pushed the feat-support-flat-repos branch from b4414ac to 35ed287 Compare September 13, 2024 17:48
Fixes issue GoogleContainerTools#56

Follow-up and credit to @alexconrey (PR GoogleContainerTools#55), @ericlchen1 (PR GoogleContainerTools#64) and
@benmccown (PR GoogleContainerTools#67) for their work on similar PRs that I've reviewed and
drawn some inspiration to create "one 💍 PR to merge them all" 😅

Problem:

Debian has two types of repos: "canonical" and "flat". Each has a
different sources.list syntax:

"canonical":
```
deb uri distribution [component1] [component2] [...]
```
(see https://wiki.debian.org/DebianRepository/Format#Overview)

flat:
```
deb uri directory/
```
(see https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format)

A flat repository does not use the dists hierarchy of directories, and
instead places meta index and indices directly into the archive root (or
some part below it)

Thus, the URL logic in _fetch_package_index() is incorrect for these
repos and it always fails to fetch the Package index.

Solution:

Just use the Debian sources.list convention in the 'sources' section of
the manifest to add canonical and flat repos. Depending on whether the
channel has one directory that ends in '/' or a (dist, component, ...)
structure the _fetch_package_index and other internal logic will
know whether the source is a canonical or a flat repo.

For example:
```
version: 1

sources:
  # canonical repo
  - channel: bullseye main contrib
    url: https://snapshot-cloudflare.debian.org/archive/debian/20240210T223313Z
  # flat repos, note the trailing '/' and the lack of distribution or components
  - channel: bullseye-cran40/
    url: https://cloud.r-project.org/bin/linux/debian
  - channel: ubuntu2404/x86_64/
    url: https://developer.download.nvidia.com/compute/cuda/repos

archs:
  - amd64

packages:
  - bash
  - r-mathlib
  - nvidia-container-toolkit-base
```

Disregarding the "mixing" of Ubuntu and Debian repos for the purpose of
the example, this manifest shows that you can mix canonical and flat
repos and you can mix multiarch and single-arch flat repos and canonical
repos.

You will still have the same problems as before with packages that only
exist for one architecture and/or repos that only support one
architecture. In those cases, simply separate the repos and packages
into their own manifests.

NOTE:
The NVIDIA CUDA repos don't follow Debian specs and have issues with the
package filenames. This is addressed in a separate commit.
Although the Debian repo spec for 'Filename' (see
https://wiki.debian.org/DebianRepository/Format#Filename) clearly says
that 'Filename' should be relative to the base directory of the repo and
should be in canonical form (i.e. without '.' or '..') there are cases
where this is not honored.

In those cases we try to work around this by assuming 'Filename' is
relative to the sources.list directory/ so we combine them and normalize
the new 'Filename' path.

Note that, so far, only the NVIDIA CUDA repos needed this workaround so
maybe this heuristic will break for other repos that don't conform to
the Debian repo spec.
@jjmaestro jjmaestro force-pushed the feat-support-flat-repos branch from 35ed287 to 8441ec3 Compare September 13, 2024 20:26
@benmccown
Copy link

@jjmaestro I think this addresses the issues I reported in #66 and tried to address in #67.

The one issue I ran into with Nvidia's CUDA repo and packages, that I'm not sure I see addressed, is duplicate transitive dependencies not being handled (which I handled with a depset). I responded to you on this topic here.

@jjmaestro jjmaestro changed the title feat: support flat repos feat: support flat repos [OLD, see #90] Sep 19, 2024
@jjmaestro
Copy link
Contributor Author

@benmccown cool, thanks! will definitely have a look! I've also drafted #90 which is the exact same as this PR but on top of other PRs with fixes and a MASSIVE refactoring PR that fixes a lot of the architecture issues that I found. TL;DR there's no more "if-else" for flat repos in multiple places, everything is now in one manifest.bzl and package_index.bzl file :)

@thesayyn / @alexeagle please have a look at the new PRs, I hope you like them better and thus, this one should be abandoned.

@jjmaestro jjmaestro closed this Sep 19, 2024
@jjmaestro jjmaestro deleted the feat-support-flat-repos branch September 19, 2024 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants