Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occurrence line numbers are off #1670

Open
nscuro opened this issue Mar 4, 2025 · 9 comments
Open

Occurrence line numbers are off #1670

nscuro opened this issue Mar 4, 2025 · 9 comments

Comments

@nscuro
Copy link
Member

nscuro commented Mar 4, 2025

Running cdxgen with research profile on the hyades-apiserver repository yields occurrences with incorrect line numbers.

git clone https://github.com/DependencyTrack/hyades-apiserver.git cdxgen-hyades-apiserver
cd cdxgen-hyades-apiserver

# For reproducability. This is the latest commit as of creation of this issue.
git reset --hard cf2744a829bf97d61fe42c80a019d52e5fb56098

docker run --rm \
  -e CDXGEN_DEBUG_MODE=debug \
  -v /tmp:/tmp \
  -v $(pwd):/app:rw \
  --pull always \
  -t ghcr.io/cyclonedx/cdxgen:master \
  -o /app/bom.json -t java --profile research . -p

jq '.' bom.json > bom-formatted.json

Result: bom-formatted.json

For the component alpine-common, the first 3 occurrences listed are:

{
  "location": "src/main/java/org/dependencytrack/auth/Permissions.java#36"
},
{
  "location": "src/main/java/org/dependencytrack/common/ClusterInfo.java#91"
},
{
  "location": "src/main/java/org/dependencytrack/common/HttpClientPool.java#200"
}

Perhaps there is some sort of offset which is miscalculated?

@prabhu
Copy link
Collaborator

prabhu commented Mar 4, 2025

Interesting. Something is terribly wrong somewhere. There are no properties named "Namespaces", so I would expect no occurrence evidences for these. cdx:bom:componentNamespaces appears repeated under metadata.component. Let me investigate.

@nscuro
Copy link
Member Author

nscuro commented Mar 4, 2025

A BOM I generated yesterday had similar oddities, although I am failing to reproduce it now. I used the same cdxgen command as above, but it was run on the repository in a state where the project was already built (i.e. generated-sources populated and JAR in target/ directory).

bom-formatted2.json

What's interesting in this one is that it includes occurrences in target/generated-sources, for example:

{
  "location": "target/generated-sources/org/cyclonedx/proto/v1_6/Advisory.java#124"
}

The kafka-clients component has many occurrences assigned to it that are not correct, and amount to multiple thousands.

Again, I'm failing to reproduce this now, but perhaps sharing the BOM helps.

@prabhu
Copy link
Collaborator

prabhu commented Mar 4, 2025

@nscuro It reuses app.atom and slices json files for performance reasons. Regarding excluding directories for evidence purposes, it's a feature that needs to be added. There are some default excludes in atom, but looks like generated-sources is not there.

@nscuro
Copy link
Member Author

nscuro commented Mar 4, 2025

Ah, that makes sense. Ok so when I nuke the atom and slices files, the linked BOM is reproducible. Should I raise a separate issue for the erroneous occurrence assignments, e.g. on kafka-clients?

@prabhu
Copy link
Collaborator

prabhu commented Mar 4, 2025

No problem. Looking into this issue now. Thank you so much for checking!

@prabhu
Copy link
Collaborator

prabhu commented Mar 5, 2025

@nscuro Thank you so much for flagging this issue.

  1. Regarding Enums appearing in occurrence evidence for alpine-common.

This is correct behaviour! Those enums and internal types are used in annotations and logging functions in the code base. If there is a vulnerability in alpine-common, we need to track all the internal types that are passing untainted via those external libraries. While occurrence evidence is comprehensive (over-tainted), reachables slices is precise (see attached zip).

  • src/main/java/org/dependencytrack/auth/Permissions.java#36 - This file and line number are correct.
  • src/main/java/org/dependencytrack/common/ClusterInfo.java#91 - The file is correct, but line number is wrong. The line number 91 belongs to a different file and method. However, line 63 in ClusterInfo is identical. Config.isUnitTestsEnabled(). I'm suspecting a bug in evinser.js that is getting the line number incorrect. Will keep this issue open and continue investigating.
  1. Regarding generated-sources for kafka-clients. This is fixed with this atom PR.

Noticed that you were running --profile research via docker. Unfortunately, hyades is a large codebase so atom needs to be invoked in java mode to get reachables slices working (callstack evidence). Correct usage of evinse is as follows (4 steps):

git clone https://github.com/DependencyTrack/hyades-apiserver.git cdxgen-hyades-apiserver
cd cdxgen-hyades-apiserver
git reset --hard cf2744a829bf97d61fe42c80a019d52e5fb56098
mvn compile

# Run cdxgen in deep mode.
cdxgen -t java --deep -o bom.json $(pwd)

# Run atom in java mode
# During reachables slicing, bom.json would get used to compute the purls
# When we run reachables slicing first, app.atom would include data dependencies.
# Such a rich atom file is useful for both reachables and usages slicing.
<atom dir>/atom.sh -J-Xmx24G reachables -l java -o app.atom -s reachables.slices.json $(pwd)

# Usages slicing will be faster since the app.atom file would get reused.
<atom dir>/atom.sh -J-Xmx24G usages -l java -o app.atom -s usages.slices.json $(pwd)

# Now run evinse
evinse -l java -i bom.json -o bom.evinse.json

--profile research tries to automate these four steps. However, I have not figured out why the performance of atom is much slower when invoked from node.js processes. (Technically cdxgen node, invoking atom node, invoking java).

Please review the attached zip file and let me know your thoughts.

Archive.zip

@nscuro
Copy link
Member Author

nscuro commented Mar 5, 2025

Thanks @prabhu, appreciate the thorough response.

  1. Regarding Enums appearing in occurrence evidence for alpine-common.

This is correct behaviour! Those enums and internal types are used in annotations and logging functions in the code base.

That makes sense, however I believe it can be confusing when presented this way in occurrences (as seen by my own misunderstanding). Here I'd much rather expect "obvious", coarser usages of the library in question, perhaps even limited to one occurrence per file. As you say, reachables (or callstack info in CycloneDX lingo) are where the detail is / should be. I am a total noob in this area so please take what I say with a big grain of salt.

Unfortunately, hyades is a large codebase so atom needs to be invoked in java mode to get reachables slices working (callstack evidence).

Ah, classic case of RTFM! This is on me, I should've checked the docs first. Thanks for the hint, I'll give this a try.

I have not figured out why the performance of atom is much slower when invoked from node.js processes.

Is this problem just about slowness, or will it plainly not work using the --profile research way?

@prabhu
Copy link
Collaborator

prabhu commented Mar 5, 2025

@nscuro, it's a good feedback. Usages slices specification is designed with semantics in mind. However, translating that rich structure into a single array for occurrences evidence limits its potential, but serves some higher level use-cases such as identifying a heat-map or training ML models that have limited context window, and so on.

The java-node situation definitely could be improved. Just needs someone to run debuggers and investigate why the java process is unable to utilize all cores and threads. There is a default timeout of 10 minutes, so often reachables slicing doesn't finish, requiring this 4-step work around.

@prabhu prabhu mentioned this issue Mar 5, 2025
@prabhu
Copy link
Collaborator

prabhu commented Mar 5, 2025

With PR #1672, I have improved the line numbers. It's gotten a bit coarse now for Java and Jar types, so should lead to less confusions around line numbers at least.

"occurrences": [
                    {
                        "location": "src/main/java/org/dependencytrack/auth/Permissions.java#70"
                    },
                    {
                        "location": "src/main/java/org/dependencytrack/common/ClusterInfo.java#62"
                    },
                    {
                        "location": "src/main/java/org/dependencytrack/common/HttpClientPool.java#123"
                    },
                    {
                        "location": "src/main/java/org/dependencytrack/common/HttpClientPool.java#86"
                    },

I hope this helps!

bom-full.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants