[issue-246] parse SPDXID for file correctly #247

Closed
1 change: 1 addition & 0 deletions spdx/parsers/tagvaluebuilders.py
@@ -1056,6 +1056,7 @@ def set_file_spdx_id(self, doc, spdx_id):
         if self.has_package(doc) and self.has_file(doc):
             if not self.file_spdx_id_set:
                 self.file_spdx_id_set = True
+                spdx_id = spdx_id.split("#")[-1]
Member

If you plan to disregard the part before #, you have to at least check that this is exactly the same as the current enclosing identifier.

Collaborator Author

Thanks for your review. I don't think the validation should be part of this PR. The current implementation doesn't check this for any other ID, e.g. the package SPDX ID. I suggest opening a separate issue so that the validation is consistent for all SPDXIDs parsed from an RDF file. Are you okay with that?
Nevertheless, I see the point of moving this RDF-specific logic to the RDF builder. I will change this.

Member @zvr, Oct 22, 2022

Moving it to RDF processing is correct, thanks for that.

I still maintain that the namespace should not be blindly discarded.

Collaborator Author

I opened an issue concerning the validation here: #251. I think it makes sense to introduce the validation for all values in one PR. Are you okay with that, and would you approve this PR on that basis?

Member

I'm fine with handling the namespace elsewhere, especially if it happens for all cases.

But as I wrote in #251, it's not a validation. It's a check of whether it can be discarded. In some cases (when it is different than the enclosing namespace), it will have to be retained.
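
A minimal sketch of the check described in this thread, as opposed to the unconditional `split("#")[-1]`. The function name `resolve_spdx_id` and the `doc_namespace` parameter are hypothetical and not part of the library; the sketch assumes the enclosing document's namespace is known at the point where the ID is parsed.

```python
from typing import Optional


def resolve_spdx_id(raw_id: str, doc_namespace: Optional[str]) -> str:
    """Strip the namespace from raw_id only when it equals doc_namespace.

    Hypothetical helper: neither the name nor the signature comes from
    the library under review.
    """
    namespace, sep, local_id = raw_id.rpartition("#")
    if not sep:
        # No '#' present: the value is already a local identifier.
        return raw_id
    if doc_namespace is not None and namespace == doc_namespace:
        # Same namespace as the enclosing document: safe to discard it.
        return local_id
    # Different (or unknown) namespace: the full reference must be retained.
    return raw_id


ns = "https://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301"
same_doc = resolve_spdx_id(ns + "#SPDXRef-File1", ns)
other_doc = resolve_spdx_id("https://example.com/other-doc#SPDXRef-File1", ns)
print(same_doc)   # SPDXRef-File1
print(other_doc)  # https://example.com/other-doc#SPDXRef-File1
```

The same helper could in principle be applied to any parsed ID rather than to file IDs alone, which would keep the behaviour consistent across element types.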

Collaborator Author

Yes, I see. I didn't know there was a difference. But this also applies to the package SPDX ID, right, e.g. here? I will update the open issue.
With that, I think it is best to close this PR without merging; I will change the implementation in #254 so that it is independent of this problem.

Member

Yes, it might happen in any id you encounter, e.g., document id, snippet id, license id, ...

                 if validations.validate_file_spdx_id(spdx_id):
                     self.file(doc).spdx_id = spdx_id
                     return True
1 change: 0 additions & 1 deletion spdx/parsers/validations.py
@@ -219,7 +219,6 @@ def validate_pkg_lics_comment(value, optional=False):
 
 
 def validate_file_spdx_id(value, optional=False):
-    value = value.split("#")[-1]
     TEXT_RE = re.compile(r"SPDXRef-([A-Za-z0-9.\-]+)", re.UNICODE)
     if value is None:
         return optional
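
For illustration only, a sketch of what the `TEXT_RE` pattern accepts once the `split("#")` line is gone from the validator. The rest of `validate_file_spdx_id` is not shown in the diff, so the helper `is_local_spdx_id` below is hypothetical and assumes the whole value has to match the pattern.

```python
import re

# Same pattern as TEXT_RE in validate_file_spdx_id.
SPDX_REF_RE = re.compile(r"SPDXRef-([A-Za-z0-9.\-]+)", re.UNICODE)


def is_local_spdx_id(value):
    """Hypothetical check: True only if the entire value is a local SPDX ID."""
    return value is not None and SPDX_REF_RE.fullmatch(value) is not None


uri = "https://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-File1"
print(is_local_spdx_id("SPDXRef-File1"))  # True
print(is_local_spdx_id(uri))              # False: the namespace prefix does not match
```

Under that assumption, a namespaced ID only passes if the caller has already reduced it to its local part, which is why the builder has to decide when that reduction is legitimate.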
4 changes: 2 additions & 2 deletions tests/data/doc_parse/spdx-expected.json
@@ -103,7 +103,7 @@
   },
   "files": [
     {
-      "id": "https://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-File1",
+      "id": "SPDXRef-File1",
       "name": "Jenna-2.6.3/jena-2.6.3-sources.jar",
       "type": 3,
       "comment": "This file belongs to Jena",
@@ -137,7 +137,7 @@
       "artifactOfProjectURI": []
     },
     {
-      "id": "https://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#SPDXRef-File2",
+      "id": "SPDXRef-File2",
       "name": "src/org/spdx/parser/DOAPProject.java",
       "type": 1,
       "comment": null,