Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unpackaged files per SPDX spec #221

Closed
wants to merge 1 commit into from

Conversation

bjamesv
Copy link

@bjamesv bjamesv commented Jun 13, 2022

Add Files without associated packages, as allowed by the spec but not currently supported in Document class.

  • unpackaged Files data model & method (Document.add_file)
  • tag/value Writer, Parser & tests
  • RDF Writer, Parser & tests
  • JSON-yaml-XML Writer, Parser & tests

Also leverages the unpackaged Files data model, to fix rdf parser bug where every referencesFile are incorrectly added to the last-listed Package.

Resolves #181

Signed-off-by: Brandon J. Van Vaerenbergh [email protected]

@bjamesv bjamesv force-pushed the add-unpackaged-files-spdx1 branch 2 times, most recently from 60bf7e6 to ba40678 Compare June 13, 2022 22:16
commit c450e9d
Author: bjv <[email protected]>
Date:   Mon Jun 13 13:41:15 2022 -0700

    change var names & comparison order, for clarity

commit 441301e
Author: bjv <[email protected]>
Date:   Mon Jun 13 13:18:13 2022 -0700

    docs & whitespace cleanup

commit 2794f0f
Author: bjv <[email protected]>
Date:   Mon Jun 13 09:18:04 2022 -0700

    add unpackaged Files tag/value separator

commit b65966a
Author: bjv <[email protected]>
Date:   Mon Jun 13 09:12:28 2022 -0700

    DRY out parser builders, a bit

commit a7a5e6e
Author: bjv <[email protected]>
Date:   Mon Jun 13 08:20:30 2022 -0700

    work around spdx#149

commit f5050b2
Author: bjv <[email protected]>
Date:   Mon Jun 13 07:54:02 2022 -0700

    fix more test RDF data

commit 850d525
Author: bjv <[email protected]>
Date:   Wed Jun 8 10:10:05 2022 -0700

parse rdf package hasFile refs, to distribute doc referencesFile
among packages

commit ca3588f
Author: bjv <[email protected]>
Date:   Fri Jun 3 15:17:13 2022 -0700

(Breaks many tests!) Add unpackaged RDF referencesFile to document
unpackaged_files

    Test assume builder always adds File to the last RDF Package? Seems
    wrong

commit 773ecad
Author: bjv <[email protected]>
Date:   Thu Jun 2 16:44:15 2022 -0700

    Use filetype 'other' for test

commit 6799c30
Author: bjv <[email protected]>
Date:   Thu Jun 2 16:43:50 2022 -0700

    Cleanup hanging licenseref

commit a2c506b
Author: bjv <[email protected]>
Date:   Thu Jun 2 16:07:08 2022 -0700

    Render unpackaged files, for test

commit 83d7b89
Author: bjv <[email protected]>
Date:   Thu Jun 2 15:49:42 2022 -0700

    Cleaned up rdf test

commit 0c633f8
Author: bjv <[email protected]>
Date:   Tue May 31 14:35:02 2022 -0700

    Add rdf parser (partial commit)

commit 9072617
Author: bjv <[email protected]>
Date:   Wed May 25 15:54:05 2022 -0700

    cleanup

commit 11b4502
Author: bjv <[email protected]>
Date:   Wed May 25 15:52:46 2022 -0700

    drop unused flag

commit 6128dea
Author: bjv <[email protected]>
Date:   Wed May 25 15:34:36 2022 -0700

    nicer unpackaged_file validator var name + docs

commit 3acc3dc
Author: bjv <[email protected]>
Date:   Wed May 25 14:52:03 2022 -0700

add unpackaged files to tag/value parser, tests & json-yaml-xml
writer

commit 9314f27
Author: bjv <[email protected]>
Date:   Mon May 23 16:11:30 2022 -0700

    Add unpackaged file info + tag/value writer

Signed-off-by: Brandon J. Van Vaerenbergh <[email protected]>
@bjamesv bjamesv force-pushed the add-unpackaged-files-spdx1 branch from ba40678 to b56b842 Compare July 16, 2022 16:38
@bjamesv
Copy link
Author

bjamesv commented Aug 3, 2022

can we get review/merge going for this, please?

has been very helpful generating SPDX with our tools-python fork

@bjamesv
Copy link
Author

bjamesv commented Aug 3, 2022

PR motivation, in part derived from this 2017 kestewart response:

Screen Shot 2022-08-03 at 10 00 42 AM
https://lists.spdx.org/g/spdx/topic/22079688#1127

Happy to discuss! 🙇

Copy link
Member

@goneall goneall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline are a couple of review comments.

There was a change in the RDF spec in 2.1 that impacts how packages and files are referenced. It would be great if you could update the PR to use the documentDescribes relationship for both files and packages. See spec issues 543 and Pull Request 38 comment for context.

The current examples in the SPDX spec have a file outside the SPDX document and would make a good test.

@@ -1299,6 +1309,11 @@ def parse(self, fil):
):
self.handle_extracted_license(s)

for s, _p, o in self.graph.triples(
(URIRef(self.doc.spdx_id), self.spdx_namespace["referencesFile"], None)
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: the referencesFile is planned to be deprecated replaced by relationship from the Spdx document to the file with the type relationshipType_describes. There may be some documents out there still using the referencesFile, so this should remain but on writing, it should be converted to a relationship and the referencesFile should be removed. I'll look for this when reviewing the rest of the changes.

else:
raise OrderError("File::Name")
# Starting with SPDX 2.0, file entries may proceed any package information section
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - proceed -> preceed

@@ -34,6 +34,7 @@
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should test with the latest SPDX example files

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @goneall!

Checking out the referenced example files

@goneall
Copy link
Member

goneall commented Aug 3, 2022

Thanks @bjamesv for the extensive PR! Very much appreciated.

I did a review primarily for the spec - I'm not much of a Python programmer, so I won't be much help on the actual code.

I left a few requested changes in the review. Let let me know if you need any additional information or context.

@bjamesv
Copy link
Author

bjamesv commented Aug 5, 2022

Testing against latest example files is showing me 3-4 current tools-python oddities with parsing annotations, license info, etc. which don't seem like they would be too bad to include in this PR. 👍 Might help incrementally resolve #196 too.

... It would be great if you could update the PR to use the documentDescribes relationship for both files and packages. See spec issues 543 and Pull Request 38 comment for context.

This correction was proposed in Nov. 2021 PR #197 (Issue #184). @goneall can you rebase #197 on main HEAD and see if the actions pass/review can be concluded?

Issues are certainly adjacent, but I hesitate to incorporate those fixes into this PR. I would happily refactor this PR so it works on top of #197 if that was merged, though.

@goneall
Copy link
Member

goneall commented Aug 5, 2022

This correction was proposed in Nov. 2021 PR #197 (Issue #184). @goneall can you rebase #197 on main HEAD and see if the actions pass/review can be concluded?

I just requested @dholth to rebase his PR. If it passes CircleCI, I'll merge that into main.

@meretp meretp added the blocked Blocked by some other PR or issue. Details should be mentioned in comments. label Oct 26, 2022
@nicoweidner nicoweidner removed the blocked Blocked by some other PR or issue. Details should be mentioned in comments. label Nov 4, 2022
@nicoweidner
Copy link
Collaborator

@bjamesv The fixed version of #197 was merged into main by now.

I have to admit that I am quite skeptical of the changes in this PR, though. Clearly, the data model has to be changed to allow for files outside of a package, but I would like to do the change in a way that aligns the data model with the spec. Concretely: Instead of having files both on the document level ("unpackaged files" in this PR) and on the package level, only include them on the document level. If a file is included in a package, that should be represented by a CONTAINS relationship.
(At least this is my understanding, @goneall correct me if I'm wrong)

Removing files from the package level may be quite a bit of additional work. @bjamesv do you want to continue with this?
We are also planning to do a major redesign of the data model after a release of the current state, that would also include this change.

@nicoweidner
Copy link
Collaborator

As there doesn't seem to be any more activity on this PR and the issue of removing the nested files in packages is still outstanding, I will close this PR. Work along these lines will continue in #288.

Sorry this got ignored for so long, @bjamesv !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Files without an associated Package allowed by spec, but not supported in Document class
4 participants