Debian 11 repo is broken #218

Closed
MKrupauskas opened this issue Aug 14, 2023 · 18 comments

Comments

@MKrupauskas

The commit 2dff280 restructured the Debian 11 repo in a breaking way.

Previously the below setup used to work, now it's failing:

root@host:/# cat /etc/apt/sources.list | grep nvidia
deb [arch=amd64] http://aptarchive.uber.internal/libnvidia-container/debian11/amd64 /

root@host:/# apt update
...
E: The repository 'http://repo.internal/libnvidia-container/debian11/amd64  Release' does not have a Release file.

This is because /debian11 used to be a symlink to /stable/debian11, which in turn was a symlink to /stable/debian10, which contained the amd64 directory with the .deb builds: https://github.com/NVIDIA/libnvidia-container/tree/9ce31ae4f042508cd8aabfad6168114c1cde30f0

/debian10, /debian11, /stable/debian11 should all have amd64 symlinks ultimately pointing to /stable/debian10/amd64
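
For illustration, the symlink chain described above can be reproduced in a scratch directory. This is a minimal sketch, not the repository's actual publishing script: it uses directory-level links as in the earlier layout, and the `.deb` file name is a hypothetical placeholder.

```shell
# Sketch only: rebuild the described symlink layout inside a scratch
# directory. The .deb file name is a placeholder, not a real package.
repo_root="$(mktemp -d)"
mkdir -p "$repo_root/stable/debian10/amd64"
touch "$repo_root/stable/debian10/amd64/example_1.0-1_amd64.deb"

# /stable/debian11 -> /stable/debian10 (relative link inside stable/)
ln -s debian10 "$repo_root/stable/debian11"
# /debian10 and /debian11 -> their counterparts under /stable
ln -s stable/debian10 "$repo_root/debian10"
ln -s stable/debian11 "$repo_root/debian11"

# Every distribution path should now expose the same amd64 directory.
for dist in debian10 debian11 stable/debian11; do
    ls "$repo_root/$dist/amd64"
done
```

With this layout, a sources entry ending in /debian11/amd64 resolves to the files stored under /stable/debian10/amd64.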

@elezar
Member

elezar commented Aug 15, 2023

@MKrupauskas would switching to /libnvidia-container/debian10/amd64 as the source of truth for the package be a solution on your end?

Our intent with the official documentation was to make downloading the repository list file work across different distributions, while the .list files themselves would refer to the lowest compatible distribution for a given package flavor.

In the Debian case, this is debian10. The motivation for the changes that are causing the breakages is called out in NVIDIA/nvidia-container-toolkit#89 (comment)
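
As a rough sketch of the switch suggested above, the sources entry from the original report can be rewritten to point at debian10. The example works on a scratch copy rather than /etc/apt/sources.list, and the host name is simply the one quoted in the report.

```shell
# Sketch only: operate on a scratch copy rather than /etc/apt/sources.list.
sources_file="$(mktemp)"
echo 'deb [arch=amd64] http://aptarchive.uber.internal/libnvidia-container/debian11/amd64 /' > "$sources_file"

# Rewrite the distribution component from debian11 to debian10.
sed 's#/libnvidia-container/debian11/#/libnvidia-container/debian10/#' "$sources_file" > "$sources_file.new"
mv "$sources_file.new" "$sources_file"
cat "$sources_file"
```

After applying the same change to the real file, `apt update` would need to be rerun.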

@jonathanjsimon

All these user complaints would be solved with a symlink 😉

@elezar
Member

elezar commented Sep 6, 2023

> All these user complaints would be solved with a symlink 😉

@jonathanjsimon it's not quite as simple as that. When these repos are published through GitHub Pages, a symlink duplicates the contents of the target folder at the link location. This optimisation was made because the resulting artifact is already too large, which causes the Pages deployment to fail, meaning that new packages are not available.

We are aware that there may be ways to increase the timeout using custom pages deployments. If you have experience in how to do this, suggestions are welcome.

@MKrupauskas
Author

While we did work around the issue by pointing our source list to Debian 10, the solution isn't ideal. If the only issue is the artifact size and build timeouts, I think we should address that for the sake of having a Debian repo that matches the repo standard and user expectations.

Could you share some logs showing what exactly times out if we correctly symlink the distribution directories? According to the GitHub Actions docs, the steps themselves shouldn't time out for 360 minutes unless the default is overridden: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes

@elezar
Member

elezar commented Sep 8, 2023

@MKrupauskas I have made the symlink changes to my personal mirror elezar@98ee43d.

The GitHub Actions workflow deploying this is here:

A previous action shows the archive size warning:

The following is an example of a deployment that failed due to a timeout, although this was using the "Deploy from branch" pages deployment and not an explicit workflow as we are using now.

@elezar
Member

elezar commented Feb 26, 2024

We have updated our repository structure and installation instructions to make use of generic debian packages. The distribution name no longer affects the instructions.

Please see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html and reopen this issue if there are still problems.

@elezar elezar closed this as completed Feb 26, 2024
@HenriWahl

Hi there,
when using tools like apt-mirror or apt-mirror2, the Packages file is always empty after being downloaded from https://nvidia.github.io/libnvidia-container/stable/deb/amd64/Packages, but it works in a browser. Do you have any idea where to search for a solution?

@elezar
Member

elezar commented Feb 29, 2024

@HenriWahl I don't know what apt-mirror expects. This is the file tree as deployed to GitHub pages: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64

If there is additional metadata required by the tooling, we could consider adding it.

@HenriWahl

HenriWahl commented Mar 1, 2024

@elezar I am not sure what is missing; it looks good to me.
The only hint I have is that it works with https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64, so maybe there is some difference.

Edit: yes, there are some differences:

Edit 2: I found that this is an older problem: NVIDIA/nvidia-docker#730

@elezar
Member

elezar commented Mar 1, 2024

Those are useful pointers. I will spend some time investigating this.

@elezar
Member

elezar commented Mar 1, 2024

I have just tried the following in a clean Ubuntu container:

  1. Installed apt-mirror.
  2. Edited /etc/apt/mirror.list to only reference:
deb https://nvidia.github.io/libnvidia-container/experimental/deb/amd64 /
  3. When running apt-mirror I then saw:
Processing indexes: [Psh: 1: xz: not found
]
  4. I then installed xz-utils:
apt-get install -y xz-utils
  5. When I ran apt-mirror again, the repo was mirrored:
$ ls /var/spool/apt-mirror/mirror/
nvidia.github.io
  6. And in the folders themselves:
ls /var/spool/apt-mirror/mirror/nvidia.github.io/libnvidia-container/experimental/deb/amd64/
Packages                                           libnvidia-container-tools_1.15.0~rc.3-1_amd64.deb  nvidia-container-toolkit-base_1.14.0~rc.2-1_amd64.deb
Packages.xz                                        libnvidia-container1-dbg_1.14.0~rc.2-1_amd64.deb   nvidia-container-toolkit-base_1.15.0~rc.1-1_amd64.deb
libnvidia-container-dev_1.14.0~rc.2-1_amd64.deb    libnvidia-container1-dbg_1.15.0~rc.1-1_amd64.deb   nvidia-container-toolkit-base_1.15.0~rc.2-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.1-1_amd64.deb    libnvidia-container1-dbg_1.15.0~rc.2-1_amd64.deb   nvidia-container-toolkit-base_1.15.0~rc.3-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.2-1_amd64.deb    libnvidia-container1-dbg_1.15.0~rc.3-1_amd64.deb   nvidia-container-toolkit_1.14.0~rc.2-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.3-1_amd64.deb    libnvidia-container1_1.14.0~rc.2-1_amd64.deb       nvidia-container-toolkit_1.15.0~rc.1-1_amd64.deb
libnvidia-container-tools_1.14.0~rc.2-1_amd64.deb  libnvidia-container1_1.15.0~rc.1-1_amd64.deb       nvidia-container-toolkit_1.15.0~rc.2-1_amd64.deb
libnvidia-container-tools_1.15.0~rc.1-1_amd64.deb  libnvidia-container1_1.15.0~rc.2-1_amd64.deb       nvidia-container-toolkit_1.15.0~rc.3-1_amd64.deb
libnvidia-container-tools_1.15.0~rc.2-1_amd64.deb  libnvidia-container1_1.15.0~rc.3-1_amd64.deb

Could you confirm that xz-utils is installed on your system?
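
One way to catch the missing decompressor before a mirror run is a small pre-flight check. This is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of apt-mirror.

```shell
# Sketch only: report whether a command is available on PATH.
# require_cmd is a hypothetical helper, not an apt-mirror feature.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Check for xz before starting a mirror run; on Debian/Ubuntu the
# xz binary is provided by the xz-utils package.
require_cmd xz || echo "install xz-utils, then rerun apt-mirror"
```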

@HenriWahl

Hi @elezar - thanks for your investigations!

I can confirm that my apt-mirror image did NOT have the package xz-utils installed but now it works WITH it!

Great job! 👍

@HenriWahl

@elezar one thing is left: now the apt command on a client cries that there is no Release file.

I see it is even missing at https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64.

@elezar
Member

elezar commented Mar 1, 2024

From the following documentation: https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format it is unclear whether a Release file is actually required. It seems that either InRelease or Release must be specified.

Can you give more information on what apt commands you're using and what the errors are?

@HenriWahl

After an apt update I get this:

Ign:5 https://mirror-apt.local/nvidia-container-toolkit-jammy  InRelease
Ign:6 https://mirror-apt.local/nvidia-cuda-jammy  InRelease
Err:7 https://mirror-apt.local/nvidia-container-toolkit-jammy  Release
  404  Not Found [IP: 10.10.10.10 443]
Hit:8 https://mirror-apt.local/nvidia-cuda-jammy  Release
Reading package lists... Done
E: The repository 'https://mirror-apt.local/nvidia-container-toolkit-jammy  Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.

Both InRelease and Release are tried. Meanwhile I found that neither of them exists in my local mirror, just as in your listing above.
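
A quick way to confirm this kind of finding on a mirrored directory is to test for either file. This is a minimal sketch; `has_release_metadata` is a hypothetical helper, and the scratch directory only stands in for a real mirror path.

```shell
# Sketch only: succeed if a repository directory carries InRelease or Release.
# has_release_metadata is a hypothetical helper name.
has_release_metadata() {
    [ -f "$1/InRelease" ] || [ -f "$1/Release" ]
}

# Example against a scratch directory standing in for a mirrored flat repo
# that contains only a Packages index.
mirror_dir="$(mktemp -d)"
touch "$mirror_dir/Packages"
if has_release_metadata "$mirror_dir"; then
    echo "release metadata present"
else
    echo "no InRelease or Release file in $mirror_dir"
fi
```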

@elezar
Member

elezar commented Mar 1, 2024

Does:

sudo apt-get update --allow-insecure-repositories

work as expected?

@HenriWahl

Yes it does.

The problem seems to be caused by apt-mirror, according to apt-mirror/apt-mirror#156. It seems to miss this file on flat repositories. I will look for it or an alternative next week. Thanks for your commitment!

@elezar
Member

elezar commented Mar 1, 2024

> Yes it does.
>
> The problem seems to be caused by apt-mirror, according to apt-mirror/apt-mirror#156. It seems to miss this file on flat repositories. I will look for it or an alternative next week. Thanks for your commitment!

I think you can get past this by marking the local mirror as trusted or by ensuring that the public key for our repos is also downloaded. For example, as per our documentation https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Note that the lines effectively look like:

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /

in this case and setting up something similar for your mirrors would be needed.
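
For a local mirror, the equivalent entry might look like the sketch below. It writes to a scratch file rather than /etc/apt/sources.list.d/, and the mirror URL is the one from the apt output earlier in the thread. The signed-by variant assumes the mirror also serves the signed InRelease or Release file, which the thread does not establish. The trusted=yes variant tells APT to treat the source as trusted even when authentication checks do not pass, so it should only be used for mirrors you control.

```shell
# Sketch only: write candidate entries to a scratch file instead of
# /etc/apt/sources.list.d/. Adjust the mirror URL to your own mirror.
keyring=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
mirror_url=https://mirror-apt.local/nvidia-container-toolkit-jammy
list_file="$(mktemp)"

# Variant A: verify against the NVIDIA key. Assumes the mirror also serves
# the signed InRelease or Release file.
printf 'deb [signed-by=%s] %s /\n' "$keyring" "$mirror_url" > "$list_file"

# Variant B (left commented out): mark the mirror as trusted, which skips
# signature verification for this source.
printf '# deb [trusted=yes] %s /\n' "$mirror_url" >> "$list_file"

cat "$list_file"
```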
