This issue was moved to a discussion.

RFE: store SBOM data in rpm headers? #2389

Closed
mlschroe opened this issue Feb 8, 2023 · 7 comments

@mlschroe
Contributor

mlschroe commented Feb 8, 2023

I'm currently looking into generating SBOMs for containers, and I wonder if someone has already pondered whether we want to store SBOM data in an rpm header.

Here's where I'm coming from: SBOM generator tools like "syft" support both querying the system's package database to know what packages are installed and getting data from files present in the system. The latter is needed because (at least in the container world) many files are generated by the build process.

So for example, if syft sees a Go binary it will extract the buildinfo from it and generate an entry for each module dependency. Those are basically CPE and purl identifiers. SPDX stores them as "externalRef" entries, CycloneDX has them directly in the component data.

Do we want to make it possible to have this for rpm packages as well? I.e. add one or more tags to store component identifiers? We would need to store an array of "(type,locator)" tuples.
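To make the "(type,locator)" idea concrete, here is a minimal sketch of such an array and of how it could be rendered as SPDX 2.x "externalRefs" entries. The ComponentRef name, the SPDX_CATEGORY mapping and the helper function are assumptions made for illustration; nothing here is an existing rpm tag or API. The field names referenceCategory, referenceType and referenceLocator are the ones used by the SPDX 2.x JSON format.

```python
from typing import NamedTuple


class ComponentRef(NamedTuple):
    """One (type, locator) tuple, e.g. ("purl", "pkg:golang/...")."""
    type: str
    locator: str


# Hypothetical mapping from identifier type to an SPDX 2.x reference category.
SPDX_CATEGORY = {"purl": "PACKAGE-MANAGER", "cpe23Type": "SECURITY"}


def to_spdx_external_refs(refs):
    """Render the tuples as SPDX 2.x JSON "externalRefs" entries."""
    return [
        {
            "referenceCategory": SPDX_CATEGORY.get(ref.type, "OTHER"),
            "referenceType": ref.type,
            "referenceLocator": ref.locator,
        }
        for ref in refs
    ]


# Example array as it might be stored for one package (illustrative values).
refs = [
    ComponentRef("purl", "pkg:golang/github.com/tjfoc/gmsm@v1.3.2"),
    ComponentRef("purl", "pkg:golang/github.com/vbatts/tar-split@v0.11.2"),
]
external_refs = to_spdx_external_refs(refs)
```

A flat list of string pairs like this would map naturally onto two parallel string-array tags, but that is only one possible layout.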

@pmatilai
Member

pmatilai commented Feb 9, 2023

Hard for me to comment when I don't know a single term/name mentioned here, starting with SBOM which I looked up from wikipedia 😆

A smallish practical example of what that data may look like would help.

@mlschroe
Contributor Author

mlschroe commented Feb 9, 2023

But but but... where have you been? Software supply chain security is the thing nowadays ;-)

@pmatilai
Member

pmatilai commented Feb 9, 2023

Deep in the Finnish countryside? 😅

@mlschroe
Contributor Author

mlschroe commented Feb 9, 2023

I hope I get this right, because I'm no expert on that topic either.

SBOM is "Software bill of materials". Basically it is a document that describes exactly what is in a product/appliance/container/... There are two standard formats, SPDX and CycloneDX, coming from different directions.

SPDX comes from the license side. It is used so that customers can check the licenses of all the software used in some product. E.g. the automotive folks want to make sure that there is no GPLv3 license included.

CycloneDX comes from the vulnerability side. It is used for checking if a product contains software that has a known vulnerability.

Nowadays, you can use both formats for both purposes (and also convert between the two).

Coming back to rpm: The "License" tag is good enough for the license use case. What's missing is information about the included software. This is important because of the modern trend to do static linking (e.g. golang) or bundling modules.

As an example, the "cosign" tool is written in golang. The output of go version -m /usr/bin/cosign contains a list of all "bundled" go modules:

...
        dep     github.com/tjfoc/gmsm   v1.3.2
        dep     github.com/transparency-dev/merkle      v0.0.1
        dep     github.com/vbatts/tar-split     v0.11.2
...

An SBOM generator would then convert this information into identifiers for those modules. Usually purl URLs are used: https://github.com/package-url/purl-spec. The purl URLs look like this: pkg:golang/github.com/Azure/go-autorest/autorest/[email protected]
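As a rough sketch of that conversion step, the following turns the "dep" lines printed by go version -m into purl strings. The function name is made up for this example, and it deliberately skips any normalization (such as lowercasing) that the purl spec may call for; a real generator such as syft does considerably more.

```python
def deps_to_purls(go_version_output: str) -> list[str]:
    """Map 'dep <module> <version> ...' lines to pkg:golang purls."""
    purls = []
    for line in go_version_output.splitlines():
        fields = line.split()
        # Only "dep" lines carry a module path followed by its version;
        # other lines (path, mod, build, "=>" replacements) are ignored.
        if len(fields) >= 3 and fields[0] == "dep":
            module, version = fields[1], fields[2]
            purls.append(f"pkg:golang/{module}@{version}")
    return purls


sample = (
    "\tdep\tgithub.com/tjfoc/gmsm\tv1.3.2\n"
    "\tdep\tgithub.com/transparency-dev/merkle\tv0.0.1\n"
    "\tdep\tgithub.com/vbatts/tar-split\tv0.11.2\n"
)
for purl in deps_to_purls(sample):
    print(purl)
```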

Another example is identifiers for language modules, e.g. python modules. Those can be calculated from the python egg data.
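For the python case, a minimal sketch could derive the identifiers from installed distribution metadata via the standard importlib.metadata module. The helper names are invented for this example, and the name handling (lowercase, underscores turned into dashes) reflects my reading of the purl spec's pypi rules, so treat it as an assumption.

```python
from importlib import metadata


def pypi_purl(name: str, version: str) -> str:
    """Build a pkg:pypi purl; name is lowercased and '_' becomes '-'."""
    normalized = name.replace("_", "-").lower()
    return f"pkg:pypi/{normalized}@{version}"


def installed_python_purls() -> list[str]:
    """Collect purls for every distribution visible to this interpreter."""
    purls = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions with broken or incomplete metadata.
        if name and version:
            purls.add(pypi_purl(name, version))
    return sorted(purls)
```

At build time the same idea would be applied to the metadata found in the buildroot rather than to the running interpreter.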

The question was if it would make sense to have a place in the rpm header for those identifiers.

@pmatilai
Member

Both #1532 and #607 seem to touch on the same subject.

I'm not opposed at all in principle, the question is more in the details: should the info be in the header of each binary package, or would a buildinfo-style file/subpackage (with a strong identifier tying it to the same build) be enough? The latter allows more flexibility for those who don't need the stuff.

@pmatilai pmatilai added the RFE label Feb 10, 2023
@xsuchy
Member

xsuchy commented Apr 17, 2023

I am interested in this as well.

This does not need to be fully implemented by rpmbuild itself. The list of "components" used for the build can be gathered by the build system. E.g., Mock can already do that https://rpm-software-management.github.io/mock/Plugin-PackageState In this case, the file installed_pkgs.log can have more than 200 lines.

As I see the options:

  • in the header:
    • pros: easy to parse by tools like Pulp that store header in DB and the payload in S3 buckets.
    • cons: the header can grow significantly
  • in the file:
    • pros: very flexible
    • cons: you need to read the payload to extract the information.

I personally like the option to have it in the file. The SBOM audit is not an everyday operation, and reading the payload is not a big overhead in such a situation.

In that case the rpmbuild implementation could simply add another option, --include-sbom-file=FILE, and include FILE in the payload, maybe in some location defined by a macro provided by the platform (Fedora, SUSE).

@Conan-Kudo
Member

@mlschroe A lot of the stuff around bundled dependencies is often expressed as bundled() Provides, either manually or via dependency generators in Fedora. I'm not sure we want to do something different when that works fairly well...
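To show how close the two representations are, here is a small sketch that rewrites a golang purl as a bundled() Provides string. The exact spelling bundled(golang(IMPORT_PATH)) = VERSION and the dropping of the leading "v" are assumptions based on my understanding of Fedora's conventions, and the function name is hypothetical.

```python
def bundled_provide(purl: str) -> str:
    """Rewrite pkg:golang/<path>@<version> as a bundled() Provides string."""
    prefix = "pkg:golang/"
    if not purl.startswith(prefix) or "@" not in purl:
        raise ValueError(f"not a versioned golang purl: {purl!r}")
    path, version = purl[len(prefix):].rsplit("@", 1)
    # Assumption: the rpm-side version drops Go's leading "v".
    return f"bundled(golang({path})) = {version.removeprefix('v')}"


print(bundled_provide("pkg:golang/github.com/tjfoc/gmsm@v1.3.2"))
```

The mapping loses nothing in this direction, which suggests the header-tag question is largely about whether the existing Provides namespace is an acceptable place for this data.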

@pmatilai pmatilai self-assigned this Nov 29, 2023
@pmatilai pmatilai assigned pmatilai and unassigned pmatilai Jan 17, 2024
@rpm-software-management rpm-software-management locked and limited conversation to collaborators Jan 17, 2024
@pmatilai pmatilai converted this issue into discussion #2851 Jan 17, 2024
