Draft: Proposal to enhance the generic artifacts fetcher purls
Signed-off-by: Bruno Pimentel <[email protected]>
brunoapimentel committed Oct 30, 2024
1 parent 16c3568 commit 3a1c1c3
Showing 1 changed file with 178 additions and 0 deletions.
178 changes: 178 additions & 0 deletions docs/design/generic-enhanced-purls.md
@@ -0,0 +1,178 @@
# Support for different purl types in the generic artifact fetcher

The generic artifact package manager is being added to Cachi2 as a means for users to introduce files that do not belong to traditional package manager ecosystems (e.g. pip, npm, golang) to their hermetic container builds. Since Cachi2 does not have any extra information about the file that's being fetched, the purls are always reported as [pkg:generic](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic).

There are use cases that would benefit from more accurate purls, though, such as the recent Maven artifacts [proposal](https://github.com/containerbuildsystem/cachi2/pull/663). The purl specification already identifies several types of packages that don't fit into traditional package managers (e.g. github, docker, huggingface; see the [purl types spec](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst) for more info). This proposal builds on the fundamentals of the generic fetcher to provide an extensible mechanism that would allow Cachi2 to fetch files from specific sources and report them with matching purl types.

## Consuming input purls

A purl is simply a way to represent a unique package and its location on the internet. By allowing users to specify a purl as part of a package definition in the lockfile, Cachi2 can verify that it actually resolves to a valid download location, and that it adheres to the fundamentals of the purl spec. Note that there's a great variety of purl types, and resolving them will involve type-specific logic, which means we'll only be able to support a small subset of purl types.

This is a summary of what the resolution of a single package will look like:

- Instead of consuming a `download_url`, the package will specify a `purl` attribute in the lockfile
- Parse the purl by using the [packageurl-python](https://github.com/package-url/packageurl-python) library
- Validate that the purl is within the supported types
- Resolve the purl to a download URL
- Download the file
- Validate the file's checksum
- Generate an SBOM component containing the input purl
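
The type validation and checksum steps in the list above could look roughly like the sketch below. It is only an illustration: the function names and the `SUPPORTED_TYPES` set are hypothetical, the hand-rolled type extraction stands in for the packageurl-python parser so that the snippet has no third-party dependencies, and the `algorithm:hexdigest` checksum format is assumed from the `checksums` qualifier used in the Maven example.

```python
import hashlib

# Assumption: the supported subset is the three types studied in this proposal.
SUPPORTED_TYPES = {"maven", "oci", "huggingface"}


def purl_type(purl: str) -> str:
    """Return the lowercase type of a purl, e.g. 'maven' for 'pkg:maven/...'."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or "/" not in rest:
        raise ValueError(f"not a valid purl: {purl!r}")
    return rest.lstrip("/").split("/", 1)[0].lower()


def validate_supported(purl: str) -> str:
    """Reject purls whose type has no type-specific resolver."""
    ptype = purl_type(purl)
    if ptype not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported purl type: {ptype!r}")
    return ptype


def verify_checksum(data: bytes, checksum: str) -> None:
    """Compare downloaded bytes against an 'algorithm:hexdigest' checksum."""
    algorithm, _, expected = checksum.partition(":")
    actual = hashlib.new(algorithm, data).hexdigest()
    if actual != expected.lower():
        raise ValueError(f"{algorithm} mismatch: expected {expected}, got {actual}")
```

Keeping the supported types in a single set means that adding a new purl type would amount to registering it there and providing its resolver.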


## Type studies

### Maven

#### generic_artifacts.yaml

```yaml
metadata:
  version: 1.0.0
artifacts:
  - purl: pkg:maven/ga.io.quarkus/quarkus-core@3.8.5.redhat-00004?type=jar&repository_url=https://maven.repository.redhat.com&checksums=sha1:e4ca5fadf89e62fb29d0d008046489b2305295bf
    target: quarkus.jar
```
#### resolving the download url
```python
# parse the input purl
pkg:maven/{group_id}/{artifact_id}@{version}[-{classifier}]?type={extension}&repository_url={repository_url}&checksums={checksum}

# generate the download URL (repository_url already includes the scheme, e.g. https://)
{repository_url}/{as_dir(group_id)}/{artifact_id}/{version}/{artifact_id}-{version}[-{classifier}].{extension}
```

#### example
```python
# purl
pkg:maven/ga.io.quarkus/quarkus-core@3.8.5.redhat-00004?type=jar&repository_url=https://maven.repository.redhat.com&checksums=sha1:e4ca5fadf89e62fb29d0d008046489b2305295bf

# download URL
https://maven.repository.redhat.com/ga/io/quarkus/quarkus-core/3.8.5.redhat-00004/quarkus-core-3.8.5.redhat-00004.jar
```
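
As a rough illustration of the mapping above, the following sketch derives the download URL from a Maven purl using only the standard library. The function name `maven_download_url` is hypothetical, the `repository_url` qualifier is assumed to carry its scheme (as in the example), `type` is assumed to default to `jar` when omitted, and classifiers are not handled.

```python
from urllib.parse import parse_qsl, unquote


def maven_download_url(purl: str) -> str:
    """Build a download URL from a pkg:maven purl (hypothetical helper)."""
    prefix = "pkg:maven/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a Maven purl: {purl!r}")

    # Split "group/artifact@version" from the "?key=value&..." qualifiers.
    path, _, query = purl[len(prefix):].partition("?")
    coordinates, _, version = path.rpartition("@")
    group_id, _, artifact_id = coordinates.rpartition("/")
    if not (group_id and artifact_id and version):
        raise ValueError(f"incomplete Maven purl: {purl!r}")

    qualifiers = dict(parse_qsl(query))
    repository_url = qualifiers["repository_url"].rstrip("/")
    extension = qualifiers.get("type", "jar")  # assumption: jar when omitted

    # as_dir(group_id): dots in the group id become path separators.
    group_dir = unquote(group_id).replace(".", "/")
    return (
        f"{repository_url}/{group_dir}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.{extension}"
    )
```

Applied to the example purl, this sketch yields the download URL shown above.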

#### SBOM component

```json
{
    "name": "quarkus-core",
    "version": "3.8.5.redhat-00004",
    "purl": "pkg:maven/ga.io.quarkus/quarkus-core@3.8.5.redhat-00004?type=jar&repository_url=https://maven.repository.redhat.com&checksums=sha1:e4ca5fadf89e62fb29d0d008046489b2305295bf",
    "properties": [
        {
            "name": "cachi2:found_by",
            "value": "cachi2:generic"
        }
    ],
    "externalReferences": [
        {
            "type": "distribution",
            "url": "https://maven.repository.redhat.com/ga/io/quarkus/quarkus-core/3.8.5.redhat-00004/quarkus-core-3.8.5.redhat-00004.jar"
        }
    ],
    "type": "file"
}
```

### OCI

#### generic_artifacts.yaml
```yaml
metadata:
  version: '1.0'
artifacts:
  - purl: pkg:oci/buildah-task@sha256%3Ab2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c?repository_url=quay.io/konflux-ci&arch=amd64&tag=latest
    target: quarkus.jar
```
#### resolving the download url
```python
# parse the input purl
pkg:oci/{name}@{algorithm}%3A{digest}?repository_url={repository_url_and_namespace}[&arch={arch}][&tag={tag}]

# generate the download command
podman pull {repository_url_and_namespace}/{name}@{algorithm}:{digest}
```

#### example
```python
# input purl
pkg:oci/buildah-task@sha256%3Ab2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c?repository_url=quay.io/konflux-ci&arch=amd64&tag=latest

# download command
podman pull quay.io/konflux-ci/buildah-task@sha256:b2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c
```
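
The same resolution can be sketched in Python. The helper name `oci_pull_command` is hypothetical, and the sketch follows this proposal's convention that `repository_url` holds the registry and namespace without the image name. The `arch` and `tag` qualifiers are ignored because the digest already pins the image.

```python
from urllib.parse import parse_qsl, unquote


def oci_pull_command(purl: str) -> list[str]:
    """Build the podman pull command for a pkg:oci purl (hypothetical helper)."""
    prefix = "pkg:oci/"
    if not purl.startswith(prefix):
        raise ValueError(f"not an OCI purl: {purl!r}")

    path, _, query = purl[len(prefix):].partition("?")
    name, _, version = path.rpartition("@")

    # The version is percent-encoded: "sha256%3A<hex>" decodes to "sha256:<hex>".
    digest = unquote(version)
    algorithm, separator, hex_digest = digest.partition(":")
    if not (name and algorithm and separator and hex_digest):
        raise ValueError(f"OCI purl must be pinned to a digest: {purl!r}")

    qualifiers = dict(parse_qsl(query))
    repository = qualifiers["repository_url"].rstrip("/")
    return ["podman", "pull", f"{repository}/{name}@{digest}"]
```

Returning the command as an argument list rather than a single string lets a caller pass it to `subprocess.run` without going through a shell.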

#### SBOM component
```json
{
    "name": "buildah-task",
    "purl": "pkg:oci/buildah-task@sha256%3Ab2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c?repository_url=quay.io/konflux-ci&arch=amd64&tag=latest",
    "properties": [
        {
            "name": "cachi2:found_by",
            "value": "cachi2:generic"
        }
    ],
    "externalReferences": [
        {
            "type": "distribution",
            "url": "quay.io/konflux-ci/buildah-task@sha256:b2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c"
        }
    ],
    "type": "file"
}
```

### Hugging Face

#### generic_artifacts.yaml
```yaml
metadata:
  version: '1.0'
artifacts:
  - purl: pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027
```
#### resolving the download url
```python
# parse the purl
pkg:huggingface/{namespace}/{name}@{commit_hash}[?repository_url={repository_url}]

# download commands (repository_url defaults to https://huggingface.co)
git clone {repository_url}/{namespace}/{name}
cd {name}
git checkout {commit_hash}
```

#### example

```python
# input purl
pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027

# download command
git clone https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
cd Llama-3.1-Nemotron-70B-Instruct-HF
git checkout 043235d6088ecd3dd5fb5ca3592b6913fd516027
```
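
A minimal Python sketch of this resolution follows. The helper name `huggingface_git_commands` is hypothetical. It assumes that `repository_url`, when present, includes its scheme, and it falls back to `https://huggingface.co`, which the purl spec lists as the default repository for this type.

```python
from urllib.parse import parse_qsl

# Default repository for the huggingface purl type, per the purl spec.
DEFAULT_REPOSITORY_URL = "https://huggingface.co"


def huggingface_git_commands(purl: str) -> list[list[str]]:
    """Build the git commands for a pkg:huggingface purl (hypothetical helper)."""
    prefix = "pkg:huggingface/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a Hugging Face purl: {purl!r}")

    path, _, query = purl[len(prefix):].partition("?")
    repo_path, _, commit_hash = path.rpartition("@")
    namespace, _, name = repo_path.rpartition("/")
    if not (namespace and name and commit_hash):
        raise ValueError(f"incomplete Hugging Face purl: {purl!r}")

    qualifiers = dict(parse_qsl(query))
    repository_url = qualifiers.get("repository_url", DEFAULT_REPOSITORY_URL)
    clone_url = f"{repository_url.rstrip('/')}/{namespace}/{name}"

    return [
        ["git", "clone", clone_url],
        # "git -C <dir>" runs the checkout inside the freshly cloned repository.
        ["git", "-C", name, "checkout", commit_hash],
    ]
```

Using `git -C` avoids changing the working directory of the calling process between the clone and the checkout.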

#### SBOM component

```json
{
    "name": "Llama-3.1-Nemotron-70B-Instruct-HF",
    "purl": "pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027",
    "properties": [
        {
            "name": "cachi2:found_by",
            "value": "cachi2:generic"
        }
    ],
    "externalReferences": [
        {
            "type": "distribution",
            "url": "https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
        }
    ],
    "type": "file"
}
```
