Draft: Proposal to enhance the generic artifacts fetcher purls
Signed-off-by: Bruno Pimentel <[email protected]>
1 parent 16c3568, commit 1d0c50c
Showing 1 changed file with 233 additions and 0 deletions.
@@ -0,0 +1,233 @@
# Support for different purl types in the generic artifact fetcher

The generic artifact package manager is being added to Cachi2 as a means for users to introduce files that do not belong to traditional package manager ecosystems (e.g. pip, npm, golang) into their hermetic container builds. Since Cachi2 has no extra information about the file being fetched, the purls are always reported as [pkg:generic](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#generic).

There are use cases that would benefit from more accurate purls, though, such as the recent Maven artifacts [proposal](https://github.com/containerbuildsystem/cachi2/pull/663). The purl specification already identifies several types of packages that don't fit into traditional package managers (e.g. github, docker, huggingface; see the [purl types spec](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst) for more info). This proposal builds on the fundamentals of the generic fetcher to provide an extensible mechanism that would allow Cachi2 to fetch files from specific sources and report them with matching purl types.

## Consuming input purls

A purl is simply a way to represent a unique package and its location on the internet. By allowing users to specify a purl as part of a package definition in the lockfile, Cachi2 can verify that it actually resolves to a valid download location, and that it adheres to the fundamentals of the purl spec. Note that there is a great variety of purl types, and resolving each one involves type-specific logic, which means we'll only be able to support a small subset of them.

This is a summary of what the resolution of a single package will look like:

- Instead of consuming a `download_url`, the package will specify a `purl` attribute in the lockfile
- Parse the purl by using the [packageurl-python](https://github.com/package-url/packageurl-python) library
- Validate that the purl is within the supported types
- Resolve the purl to a download URL by using a type-specific algorithm
- Download the file
- Validate the file's checksum
- Generate an SBOM component containing the input purl

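The parsing and type-validation steps above can be sketched as follows. This is a minimal, standard-library-only illustration and not Cachi2's implementation: the proposal would rely on packageurl-python for parsing, and `parse_purl`, `check_supported`, and `SUPPORTED_TYPES` are hypothetical names. The parser drops subpaths and skips the spec's per-type normalization rules.

```python
from urllib.parse import parse_qsl, unquote

# Assumption: only the types studied in this proposal are supported.
SUPPORTED_TYPES = {"maven", "nuget", "huggingface", "oci"}


def parse_purl(purl: str) -> dict:
    """Split a purl into type, namespace, name, version and qualifiers."""
    scheme, _, remainder = purl.partition(":")
    if scheme != "pkg" or not remainder:
        raise ValueError(f"not a purl: {purl!r}")
    remainder = remainder.partition("#")[0]  # drop the optional subpath
    remainder, _, query = remainder.partition("?")
    path, at_sign, version = remainder.rpartition("@")
    if not at_sign:  # the purl carries no version
        path, version = remainder, ""
    segments = path.strip("/").split("/")
    if len(segments) < 2:
        raise ValueError(f"a purl needs a type and a name: {purl!r}")
    return {
        "type": segments[0].lower(),
        "namespace": "/".join(segments[1:-1]),
        "name": unquote(segments[-1]),
        "version": unquote(version),
        "qualifiers": dict(parse_qsl(query)),
    }


def check_supported(parsed: dict) -> None:
    """Reject purl types that have no type-specific resolver."""
    if parsed["type"] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported purl type: {parsed['type']}")
```

Keeping the supported types in a single set makes the "small subset" explicit and gives one place to extend when a new resolver is added.
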
### What confidence Cachi2 has in the reported components

- The purl is valid
- The purl resolves to an actual URL that contains an artifact
- The checksum was validated

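As an illustration of the last point, checksum validation could look roughly like the sketch below. The `algorithm:hexdigest` format follows the `checksum` qualifier used in this proposal's examples; `verify_checksum` and `ALLOWED_ALGORITHMS` are hypothetical names, and the allowed set is an assumption.

```python
import hashlib
from pathlib import Path

# Assumption: a conservative allow-list; the proposal does not fix one.
ALLOWED_ALGORITHMS = {"sha256", "sha512"}


def verify_checksum(path: Path, checksum: str) -> None:
    """Raise ValueError unless the file matches an 'algorithm:hexdigest' checksum."""
    algorithm, _, expected = checksum.partition(":")
    if algorithm not in ALLOWED_ALGORITHMS or not expected:
        raise ValueError(f"unsupported or malformed checksum: {checksum!r}")
    digest = hashlib.new(algorithm)
    with path.open("rb") as artifact:
        # Read in chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: artifact.read(1024 * 1024), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected.lower():
        raise ValueError(f"checksum mismatch for {path.name}")
```
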
## Type studies

### Maven

#### generic_artifacts.yaml

```yaml
metadata:
  version: 1.0.0
artifacts:
  - purl: pkg:maven/ga.io.quarkus/quarkus-core@3.8.5.redhat-00004?type=jar&repository_url=https://maven.repository.redhat.com&checksum=sha256:e4ca5fad
    target: quarkus.jar
```
#### Resolving the download URL
```python
# parse the input purl
pkg:maven/{group_id}/{artifact_id}@{version}[-{classifier}]?type={extension}&repository_url={repository_url}&checksum={checksum}

# generate the download URL (repository_url already includes the scheme)
{repository_url}/{as_dir(group_id)}/{artifact_id}/{version}/{artifact_id}-{version}[-{classifier}].{extension}
```

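A minimal sketch of the Maven resolution, assuming the purl has already been parsed into its parts. `maven_download_url` is a hypothetical name. Two assumptions are worth flagging: the extension defaults to `jar` when the `type` qualifier is absent, and the classifier is read from an optional `classifier` qualifier (as the purl spec's maven type defines it) rather than being split out of the version as in the template above.

```python
def maven_download_url(group_id: str, artifact_id: str, version: str, qualifiers: dict) -> str:
    """Build the repository URL of a Maven artifact from parsed purl parts."""
    # The example purl passes repository_url with its scheme included.
    repository_url = qualifiers["repository_url"].rstrip("/")
    extension = qualifiers.get("type", "jar")  # assumption: default to jar
    classifier = qualifiers.get("classifier")  # assumption: qualifier, not a version suffix

    file_name = f"{artifact_id}-{version}"
    if classifier:
        file_name += f"-{classifier}"

    # as_dir(group_id): dots in the group id become path separators.
    group_path = group_id.replace(".", "/")
    return f"{repository_url}/{group_path}/{artifact_id}/{version}/{file_name}.{extension}"
```
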
#### SBOM component

```json
{
    "name": "quarkus-core",
    "version": "3.8.5.redhat-00004",
    "purl": "pkg:maven/ga.io.quarkus/quarkus-core@3.8.5.redhat-00004?type=jar&repository_url=https://maven.repository.redhat.com&checksum=sha256:e4ca5fad",
    "properties": [
        {
            "name": "cachi2:found_by",
            "value": "cachi2:generic"
        }
    ],
    "externalReferences": [
        {
            "type": "distribution",
            "url": "https://maven.repository.redhat.com/ga/io/quarkus/quarkus-core/3.8.5.redhat-00004/quarkus-core-3.8.5.redhat-00004.jar"
        }
    ],
    "type": "file"
}
```

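The component shape used in these type studies could be produced by a small helper like the one below. This is only a sketch: `build_component` is a hypothetical name, and the field layout simply mirrors the JSON examples in this document.

```python
def build_component(name: str, purl: str, download_url: str,
                    version: str = "", component_type: str = "file") -> dict:
    """Assemble an SBOM component dict shaped like the examples in this proposal."""
    component = {"name": name}
    if version:  # some examples (Hugging Face, OCI) omit the version field
        component["version"] = version
    component.update(
        {
            "purl": purl,  # the input purl is reported unchanged
            "properties": [{"name": "cachi2:found_by", "value": "cachi2:generic"}],
            "externalReferences": [{"type": "distribution", "url": download_url}],
            "type": component_type,
        }
    )
    return component
```
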
### Nuget

#### generic_artifacts.yaml
```yaml
metadata:
  version: 1.0.0
artifacts:
  - purl: pkg:nuget/Google.Protobuf@3.28.3?checksum=sha256:e4ca5fad
```
#### Resolving the download URL
```python
# parse the input purl
pkg:nuget/[{namespace}.]{name}@{version}?checksum={checksum}[&repository_url={repository_url}]

# generate the download URL
https://globalcdn.nuget.org/packages/{lowercase([namespace.]name)}.{version}.nupkg?packageVersion={version}
```

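A sketch of the Nuget resolution under stated assumptions: `nuget_download_url` and `NUGET_DEFAULT_BASE` are hypothetical names, the full package id (including any dotted prefix) is passed as `name`, and an optional `repository_url` qualifier is assumed to replace the default base while keeping the same file layout, which the template above does not specify.

```python
from typing import Optional

NUGET_DEFAULT_BASE = "https://globalcdn.nuget.org/packages"


def nuget_download_url(name: str, version: str, qualifiers: Optional[dict] = None) -> str:
    """Build the .nupkg download URL for a package id and version."""
    qualifiers = qualifiers or {}
    # Assumption: a custom repository serves packages under the same layout.
    base = qualifiers.get("repository_url", NUGET_DEFAULT_BASE).rstrip("/")
    # The template lowercases the package id in the file name.
    return f"{base}/{name.lower()}.{version}.nupkg?packageVersion={version}"
```
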
#### SBOM component
```json
{
    "name": "Google.Protobuf",
    "version": "3.28.3",
    "purl": "pkg:nuget/Google.Protobuf@3.28.3?checksum=sha256:e4ca5fad",
    "properties": [
        {
            "name": "cachi2:found_by",
            "value": "cachi2:generic"
        }
    ],
    "externalReferences": [
        {
            "type": "distribution",
            "url": "https://globalcdn.nuget.org/packages/google.protobuf.3.28.3.nupkg?packageVersion=3.28.3"
        }
    ],
    "type": "file"
}
```

### Hugging Face

#### generic_artifacts.yaml
```yaml
metadata:
  version: '1.0'
artifacts:
  - purl: pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027?checksum=sha256:e4ca5fad
  - purl: pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027?checksum=sha256:e4ca5fad&file_name=model-00001-of-00030.safetensors
```
#### Resolving the download URL
```python
# parse the purl
pkg:huggingface/{namespace}/{name}@{commit_hash}?checksum={checksum}[&repository_url={repository_url}][&file_name={file_name}]

# download URL (in case a file_name is present)
https://{repository_url}/{namespace}/{name}/blob/{commit_hash}/{file_name}

# git clone commands (in case file_name is absent)
git clone https://{repository_url}/{namespace}/{name}
git -C {name} checkout {commit_hash}
```

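The two Hugging Face cases can be sketched as one function that returns either a file URL or the git commands to run. `huggingface_fetch_plan`, `HF_DEFAULT_HOST`, and the `endpoint` parameter are hypothetical; `repository_url` is assumed to be a bare host, as in the template above. The default `blob` path mirrors the template, but note that on the Hugging Face Hub `blob` serves the file's web page while raw content is served under `resolve`, so a real fetcher would likely pass `endpoint="resolve"`.

```python
HF_DEFAULT_HOST = "huggingface.co"


def huggingface_fetch_plan(namespace: str, name: str, commit_hash: str,
                           qualifiers: dict, endpoint: str = "blob") -> dict:
    """Return a single-file URL or the git commands needed to fetch a repo."""
    host = qualifiers.get("repository_url", HF_DEFAULT_HOST)
    repo_url = f"https://{host}/{namespace}/{name}"
    file_name = qualifiers.get("file_name")
    if file_name:
        # A file_name qualifier means only one file is fetched.
        return {"kind": "file", "url": f"{repo_url}/{endpoint}/{commit_hash}/{file_name}"}
    # Without file_name the whole repository is cloned and pinned to the commit.
    return {
        "kind": "git",
        "commands": [
            ["git", "clone", repo_url, name],
            ["git", "-C", name, "checkout", commit_hash],
        ],
    }
```
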
#### SBOM component

```json
[
    {
        "name": "Llama-3.1-Nemotron-70B-Instruct-HF",
        "purl": "pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027?checksum=sha256:e4ca5fad",
        "properties": [
            {
                "name": "cachi2:found_by",
                "value": "cachi2:generic"
            }
        ],
        "externalReferences": [
            {
                "type": "distribution",
                "url": "https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
            }
        ],
        "type": "library"
    },
    {
        "name": "Llama-3.1-Nemotron-70B-Instruct-HF",
        "purl": "pkg:huggingface/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF@043235d6088ecd3dd5fb5ca3592b6913fd516027?checksum=sha256:e4ca5fad&file_name=model-00001-of-00030.safetensors",
        "properties": [
            {
                "name": "cachi2:found_by",
                "value": "cachi2:generic"
            }
        ],
        "externalReferences": [
            {
                "type": "distribution",
                "url": "https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF/blob/043235d6088ecd3dd5fb5ca3592b6913fd516027/model-00001-of-00030.safetensors"
            }
        ],
        "type": "file"
    }
]
```

### OCI

#### generic_artifacts.yaml
```yaml
metadata:
  version: '1.0'
artifacts:
  - purl: pkg:oci/buildah-task@sha256%3Ab2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c?repository_url=quay.io/konflux-ci&arch=amd64&tag=latest
    target: quarkus.jar
```
#### Resolving the download URL
```python
# parse the input purl
pkg:oci/{name}@{algorithm}%3A{digest}?repository_url={repository_url_and_namespace}[&arch={arch}][&tag={tag}]

# generate the download command
podman pull {repository_url_and_namespace}/{name}@{algorithm}:{digest}
```

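A sketch of turning the parsed OCI purl into the pull command from the template above. `oci_pull_command` is a hypothetical name. The `arch` and `tag` qualifiers are deliberately ignored here, on the assumption that the digest alone pins the image.

```python
from urllib.parse import unquote


def oci_pull_command(name: str, version: str, qualifiers: dict) -> list:
    """Build the podman argv that pulls an image by digest."""
    # The version is a percent-encoded digest such as 'sha256%3A<hex>'.
    algorithm, separator, digest = unquote(version).partition(":")
    if not separator or not digest:
        raise ValueError(f"OCI purl version must be a digest: {version!r}")
    # In this proposal repository_url holds the registry and namespace only.
    repository = qualifiers["repository_url"].strip("/")
    return ["podman", "pull", f"{repository}/{name}@{algorithm}:{digest}"]
```

Returning an argument list rather than a shell string keeps the command safe to pass to a process runner without shell quoting.
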
#### SBOM component
```json
{
    "name": "buildah-task",
    "purl": "pkg:oci/buildah-task@sha256%3Ab2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c?repository_url=quay.io/konflux-ci&arch=amd64&tag=latest",
    "properties": [
        {
            "name": "cachi2:found_by",
            "value": "cachi2:generic"
        }
    ],
    "externalReferences": [
        {
            "type": "distribution",
            "url": "quay.io/konflux-ci/buildah-task@sha256:b2d6c32d1e05e91920cd4475b2761d58bb7ee11ad5dff3ecb59831c7572b4d0c"
        }
    ],
    "type": "container"
}
```

## Initial thoughts

From a Cachi2 perspective, we can separate purl types into ones that are part of existing package manager ecosystems (such as nuget, composer, maven) and ones that are not (github, huggingface, oci):
- Types that refer to existing package managers seem to have a straightforward resolution. The package is likely a file at a predictable URL within a registry.
- Types that don't refer to existing package managers are trickier to resolve, since they might be pointing to a container image or a full git repository. They might also need extra tools (podman, git, etc.) to provide a smooth resolution.

### Decision points

- Should the checksums be specified as part of the input purl?
- Should we limit the qualifiers to those strictly defined for each purl type?
- Should we allow types that are not files (git repos, OCI artifacts)? Should they be reported as different component types?
- What should our policy be regarding extending the generic fetcher for other package managers we don't fully support? Would this reduce contributors' willingness to provide full support for a package manager?