Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: register Maven transitive extractor #441

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 39 additions & 10 deletions extractor/filesystem/language/java/pomxmlnet/pomxmlnet.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,10 +38,38 @@ import (

// Extractor extracts Maven packages with transitive dependency resolution.
type Extractor struct {
depClient resolution.DependencyClient
mavenClient *datasource.MavenRegistryAPIClient
}

// Config is the configuration for the pomxmlnet Extractor.
type Config struct {
resolution.DependencyClient
*datasource.MavenRegistryAPIClient
}

// DefaultConfig returns the default configuration for the pomxmlnet extractor.
func DefaultConfig() Config {
// No need to check errors since we are using the default Maven Central URL.
depClient, _ := resolution.NewMavenRegistryClient(datasource.MavenCentral)
mavenClient, _ := datasource.NewMavenRegistryAPIClient(datasource.MavenRegistry{
URL: datasource.MavenCentral,
ReleasesEnabled: true,
})
return Config{
DependencyClient: depClient,
MavenRegistryAPIClient: mavenClient,
}
}

// New makes a new pom.xml transitive extractor with the given config.
func New(c Config) *Extractor {
return &Extractor{
depClient: c.DependencyClient,
mavenClient: c.MavenRegistryAPIClient,
}
}

// Name of the extractor.
func (e Extractor) Name() string { return "java/pomxmlnet" }

Expand All @@ -51,17 +79,16 @@ func (e Extractor) Version() int { return 0 }
// Requirements of the extractor.
func (e Extractor) Requirements() *plugin.Capabilities {
return &plugin.Capabilities{
Network: true,
DirectFS: true,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikvarga it seems DirectFS is not needed for this extractor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably still need DirectFS to read username and password from settings.xml. However currently this is done when we make the extractor instead of during the extraction..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is settings.xml expected to be read from the scanner's filesystem or the artifact host's filesystem? e.g. if a SCALIBR integration scans a remote container through a virtual filesystem does it expect to find the settings.xml with the creds in the container or on the machine the scan is being run from?

If it's on the container then we wouldn't be able to read it through direct filesystem access so we can't use virtual filesystems with this extractor (and thus DirectFS should be true).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the settings.xml should be read from the container itself - it usually sits in where Maven is installed, so virtual filesystems should work here.

For the second paragrah, it sounds conflicting to me:

If it's on the container then we wouldn't be able to read it through direct filesystem access so we can't use virtual filesystems with this extractor (and thus DirectFS should be true).

should the second half be:

so we should use virtual filesystems and DirectFS should be false

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, so this file should be read from the scanned artifact. Is that what the extractor is doing today? I assume not since it'd need to access the file through the scalibrfs.FS which we only pass to it in Extract, not during construction.

If this is the case then this extractor wouldn't work properly today if e.g. we ran the scalibr binary with the --remote-image flag.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option to fix this is to read the settings.xml during Extract() but only once (e.g. store it in a global var and only read it if it's empty). It's a bit of a hacky solution but extractors get constructed before the SCALIBR scan runs right now, so I can't think of a good way to pass in the virtual FS there.

Network: true,
}
}

// FileRequired never returns true, as this is for the osv-scanner json output.
// FileRequired returns true if the specified file matches Maven POM lockfile patterns.
func (e Extractor) FileRequired(fapi filesystem.FileAPI) bool {
return filepath.Base(fapi.Path()) == "pom.xml"
}

// Extract extracts packages from yarn.lock files passed through the scan input.
// Extract extracts packages from pom.xml files passed through the scan input.
func (e Extractor) Extract(ctx context.Context, input *filesystem.ScanInput) ([]*extractor.Inventory, error) {
var project maven.Project
if err := datasource.NewMavenDecoder(input.Reader).Decode(&project); err != nil {
Expand All @@ -71,8 +98,10 @@ func (e Extractor) Extract(ctx context.Context, input *filesystem.ScanInput) ([]
if err := project.MergeProfiles("", maven.ActivationOS{}); err != nil {
return nil, fmt.Errorf("failed to merge profiles: %w", err)
}
// Clear the registries that may be from other extraction.
e.mavenClient = e.mavenClient.WithoutRegistries()
for _, repo := range project.Repositories {
if err := e.MavenRegistryAPIClient.AddRegistry(datasource.MavenRegistry{
if err := e.mavenClient.AddRegistry(datasource.MavenRegistry{
URL: string(repo.URL),
ID: string(repo.ID),
ReleasesEnabled: repo.Releases.Enabled.Boolean(),
Expand All @@ -82,28 +111,28 @@ func (e Extractor) Extract(ctx context.Context, input *filesystem.ScanInput) ([]
}
}
// Merging parents data by parsing local parent pom.xml or fetching from upstream.
if err := mavenutil.MergeParents(ctx, input, e.MavenRegistryAPIClient, &project, project.Parent, 1, true); err != nil {
if err := mavenutil.MergeParents(ctx, input, e.mavenClient, &project, project.Parent, 1, true); err != nil {
return nil, fmt.Errorf("failed to merge parents: %w", err)
}
// Process the dependencies:
// - dedupe dependencies and dependency management
// - import dependency management
// - fill in missing dependency version requirement
project.ProcessDependencies(func(groupID, artifactID, version maven.String) (maven.DependencyManagement, error) {
return mavenutil.GetDependencyManagement(ctx, e.MavenRegistryAPIClient, groupID, artifactID, version)
return mavenutil.GetDependencyManagement(ctx, e.mavenClient, groupID, artifactID, version)
})

if registries := e.MavenRegistryAPIClient.GetRegistries(); len(registries) > 0 {
if registries := e.mavenClient.GetRegistries(); len(registries) > 0 {
clientRegs := make([]resolution.Registry, len(registries))
for i, reg := range registries {
clientRegs[i] = reg
}
if err := e.DependencyClient.AddRegistries(clientRegs); err != nil {
if err := e.depClient.AddRegistries(clientRegs); err != nil {
return nil, err
}
}

overrideClient := resolution.NewOverrideClient(e.DependencyClient)
overrideClient := resolution.NewOverrideClient(e.depClient)
resolver := mavenresolve.NewResolver(overrideClient)

// Resolve the dependencies.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -242,10 +242,10 @@ func TestExtractor_Extract(t *testing.T) {
for _, tt := range tests {
t.Run(tt.Name, func(t *testing.T) {
resolutionClient := clienttest.NewMockResolutionClient(t, "testdata/universe/basic-universe.yaml")
extr := pomxmlnet.Extractor{
extr := pomxmlnet.New(pomxmlnet.Config{
DependencyClient: resolutionClient,
MavenRegistryAPIClient: &datasource.MavenRegistryAPIClient{},
}
})

scanInput := extracttest.GenerateScanInputMock(t, tt.InputConfig)
defer extracttest.CloseTestScanInput(t, scanInput)
Expand Down Expand Up @@ -350,10 +350,10 @@ func TestExtractor_Extract_WithMockServer(t *testing.T) {
}

resolutionClient := clienttest.NewMockResolutionClient(t, "testdata/universe/basic-universe.yaml")
extr := pomxmlnet.Extractor{
extr := pomxmlnet.New(pomxmlnet.Config{
DependencyClient: resolutionClient,
MavenRegistryAPIClient: apiClient,
}
})

scanInput := extracttest.GenerateScanInputMock(t, tt.InputConfig)
defer extracttest.CloseTestScanInput(t, scanInput)
Expand Down
17 changes: 15 additions & 2 deletions extractor/filesystem/list/list.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ import (
"github.com/google/osv-scalibr/extractor/filesystem/language/java/gradlelockfile"
"github.com/google/osv-scalibr/extractor/filesystem/language/java/gradleverificationmetadataxml"
"github.com/google/osv-scalibr/extractor/filesystem/language/java/pomxml"
"github.com/google/osv-scalibr/extractor/filesystem/language/java/pomxmlnet"
"github.com/google/osv-scalibr/extractor/filesystem/language/javascript/packagejson"
"github.com/google/osv-scalibr/extractor/filesystem/language/javascript/packagelockjson"
"github.com/google/osv-scalibr/extractor/filesystem/language/javascript/pnpmlock"
Expand Down Expand Up @@ -94,6 +95,14 @@ var (
javaarchive.New(javaarchive.DefaultConfig()),
pomxml.Extractor{},
}
// TODO(#441): enable pomxmlnet extractor when network is accesible.
// JavaNet extractors requiring network access.
JavaNet []filesystem.Extractor = []filesystem.Extractor{
erikvarga marked this conversation as resolved.
Show resolved Hide resolved
gradlelockfile.Extractor{},
gradleverificationmetadataxml.Extractor{},
javaarchive.New(javaarchive.DefaultConfig()),
pomxmlnet.New(pomxmlnet.DefaultConfig()),
}
// Javascript extractors.
Javascript []filesystem.Extractor = []filesystem.Extractor{
packagejson.New(packagejson.DefaultConfig()),
Expand Down Expand Up @@ -174,6 +183,9 @@ var (

// Default extractors that are recommended to be enabled.
Default []filesystem.Extractor = slices.Concat(Java, Javascript, Python, Go, OS)
// DefaultNet defines the list of recommended extractors that require network access.
DefaultNet []filesystem.Extractor = slices.Concat(JavaNet, Javascript, Python, Go, OS)

// All extractors available from SCALIBR.
All []filesystem.Extractor = slices.Concat(
Cpp,
Expand Down Expand Up @@ -219,8 +231,9 @@ var (
"containers": Containers,

// Collections.
"default": Default,
"all": All,
"default": Default,
"defaultnet": DefaultNet,
"all": All,
}
)

Expand Down