Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: register Maven transitive extractor #441

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

cuixq
Copy link
Collaborator

@cuixq cuixq commented Feb 5, 2025

This PR registers Maven pom.xml transitive extractor to the extractor list.

To distinguish from the default Java extractor that does not require network access, a new defaultnet list of extractors are added.

Both clients required by the extractor are moved to configurations and DefaultConfig initializes the clients assuming Maven Central as the default registry.

We very likely need to refactor how the clients are initialized in the future considering that we would like to pass the default registry as well as the option whether to use deps.dev as the data source.

@cuixq cuixq marked this pull request as ready for review February 5, 2025 23:53
@cuixq cuixq requested review from erikvarga and another-rex and removed request for erikvarga February 5, 2025 23:53
@@ -51,12 +66,11 @@ func (e Extractor) Version() int { return 0 }
// Requirements of the extractor.
func (e Extractor) Requirements() *plugin.Capabilities {
return &plugin.Capabilities{
Network: true,
DirectFS: true,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikvarga it seems DirectFS is not needed for this extractor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably still need DirectFS to read username and password from settings.xml. However currently this is done when we make the extractor instead of during the extraction..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is settings.xml expected to be read from the scanner's filesystem or the artifact host's filesystem? e.g. if a SCALIBR integration scans a remote container through a virtual filesystem does it expect to find the settings.xml with the creds in the container or on the machine the scan is being run from?

If it's on the container then we wouldn't be able to read it through direct filesystem access so we can't use virtual filesystems with this extractor (and thus DirectFS should be true).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the settings.xml should be read from the container itself - it usually sits in where Maven is installed, so virtual filesystems should work here.

For the second paragrah, it sounds conflicting to me:

If it's on the container then we wouldn't be able to read it through direct filesystem access so we can't use virtual filesystems with this extractor (and thus DirectFS should be true).

should the second half be:

so we should use virtual filesystems and DirectFS should be false

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, so this file should be read from the scanned artifact. Is that what the extractor is doing today? I assume not since it'd need to access the file through the scalibrfs.FS which we only pass to it in Extract, not during construction.

If this is the case then this extractor wouldn't work properly today if e.g. we ran the scalibr binary with the --remote-image flag.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option to fix this is to read the settings.xml during Extract() but only once (e.g. store it in a global var and only read it if it's empty). It's a bit of a hacky solution but extractors get constructed before the SCALIBR scan runs right now, so I can't think of a good way to pass in the virtual FS there.

internal/resolution/client/client.go Outdated Show resolved Hide resolved
extractor/filesystem/list/list.go Show resolved Hide resolved
@@ -51,12 +66,11 @@ func (e Extractor) Version() int { return 0 }
// Requirements of the extractor.
func (e Extractor) Requirements() *plugin.Capabilities {
return &plugin.Capabilities{
Network: true,
DirectFS: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is settings.xml expected to be read from the scanner's filesystem or the artifact host's filesystem? e.g. if a SCALIBR integration scans a remote container through a virtual filesystem does it expect to find the settings.xml with the creds in the container or on the machine the scan is being run from?

If it's on the container then we wouldn't be able to read it through direct filesystem access so we can't use virtual filesystems with this extractor (and thus DirectFS should be true).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants