-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for multiple file checksums, file types #200
Conversation
Signed-off-by: Jeffrey Otterson <[email protected]>
Signed-off-by: Jeffrey Otterson <[email protected]>
c3c610d
to
7bbecc0
Compare
I have an overlapping pull request at #197 but I did not implement parsing. Nice to see that you were able to implement parsing for all formats. Were you able to validate the json output against https://github.com/spdx/spdx-spec/blob/development/v2.2.2/schemas/spdx-schema.json ? |
I do not know how to do that. I read the schema, and I would say that I do not believe that my changes are compatible with the schema, but that is because I believe the schema is not compliant with the SPDX spec. I am not confident in my ability to read that schema, but my interpretation of that schema only allows one type for a file, and the specification says the cardinality is 0..* (https://spdx.github.io/spdx-spec/file-information/#83-file-type-field) so ??? What is more correct? Jeff |
I don't know about file type cardinality but there are a couple of more mundane details like the placement of the checksum field and name of algorithms in json, and the presence or absence of certain fields.
|
Thanks for that. I still don't have the schema verifying, but I have fixed some of the schema validation problems (extra 'checksumAlgorithm_' text, etc.) I will merge what I have into my branch soon. I tried to put the schema validation right into the tests, but I think that unless I have your flatten and unflatten functions, that is not going to work right. I will note that I do have read/write for all file formats working. Jeff. |
Signed-off-by: Jeffrey Otterson <[email protected]>
@jotterson Thank you ++ .. let me review in details possibly later this week |
What is the status of this PR? I can assist if needed or I can make a separate PR from work I have done late last year that I believe solves the same issue. |
I would still like to get this PR committed, somebody has to approve it. The changes are useful, and the tests all pass. Jeff. |
@pombredanne is this still under review? |
I sure would like to merge this. What's it going to take? @pombredanne ??? |
Hi @jotterson . Sorry this got sidelined for so long! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your effort! I think these are great additions eventhough it would have been easier to review if the changes were in seperate PRs (one for the multiple checksums, one for file types and for one for the changes concerning xml-parser/writer).
This PR still depends on PR #254 (superseding #197) and there will be some conflicts when rebasing but to speed things up I started the review and added some comments. I have not yet looked closely at the changes for the XML parser and writer but I like the approach. I think that this will go well with the planned refactoring in the direction of separating the builder and parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jotterson, thanks for the big contribution! We definitely want to get this merged, as there are a lot of valuable fixes in this PR.
I left quite a lot of comments; I apologize in advance if some of those comments concern issues that are better fixed separately (I am not super familiar with the current state of the tools yet). If you see cases of this, please feel free to point it out, then we can move them to separate issues.
As this PR is quite old by now and didn't get much attention for a long time, could you give an indication whether you want to continue working on it? Else I would try to make the necessary fixes and finally get it merged.
edit: One more general note: Usually, I would request some more unit tests for added functionality, so we can slowly improve test coverage of the Python tools. However, I feel that many of those tests would become redundant with the planned data model refactoring, so it's probably not worth it spending a lot of time on tests right now.
if isinstance(v[0], dict): | ||
v = sorted(v, key=lambda x: json.dumps(x, sort_keys=True)) | ||
else: | ||
v = sorted(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the logic. Why don't we just call sort_nested(v)
here?
Instead, the sorting used here should be applied to data
in the case where it is a list
del(data['SpdxDocument']['Document']['creationInfo']) | ||
parser = xmlparser.Parser(Builder(), StandardLogger()) | ||
with io.open(location, encoding='utf-8') as f: | ||
document, _ = parser.parse(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should raise an error in case parsing fails?
I'm working on this. There are many requested changes, plus a potential rebase. It's gonna take a few days. If you're in a hurry, I'm about half done, all the tests are currently passing, but the example data is still old. Let me know, and I will make a new commit with the fixes that I have. |
I'd like to get this merged fairly soon, but there is no immediate hurry. I'd say a timeline of about a week would be better than a month. I feel bad creating pressure here, since this PR was ignored for so long, sorry about that! Our plan is to get all outstanding fixes (and maybe a few additional ones) in and do an as-is release, since the last release was like 3 years ago. Then we plan to tackle some larger structural changes to the tools. Also, sorry for the pile of comments, I tend to be very verbose in reviews 😅. I do this with the mindset that it's ok to disagree with comments, I am very rarely dogmatic (some kind of reason should be given though). |
The tag/value parser is going to need a real enema. There are some horrible assumptions in there related to sequencing, package before files, etc. The latest tag/value files from the tools-java project really demonstrate that problem. IMHO the current yacc solution is excessively complicated. |
The tag/value parser is a catastrophe. Configuring the parser via docstrings is honestly the most dubious thing I have seen in my coding life so far. That said, @meretp, @armintaenzertng and myself dealt with it (read: had to deal with it...) recently while adding some new properties. If it's causing you headaches, feel free to skip tag/value for now and we'll try to finish it afterwards. |
json, yaml, xml formats appear to be working. tag/value and rdf is not working yet. updated test/data/formats files from tools-java project.
Here is an update to my commit of 51 weeks ago (!). The CircleCI tests are failing because the tag/value and RDF formats are not working. JSON, YAML, XML all work, spdx data can be converted between those three formats with no apparent loss of fidelity. |
Hi @jotterson. I am afraid I have to step in here, because with your recent massive changes, I see no way to get this PR rebased and merged. It is way too big for a single PR and addresses a variety of issues instead of focusing on a single one (several of those issues are already fixed on In order to get this merged, the PR needs to be broken up into smaller chunks, such as:
With the changes concentrated in very few, massive commits, I don't see a way to easily cherry-pick specific aspects. The latest commit is also missing a signoff, though that is probably the smallest problem. Issues that are already fixed on
Things that shouldn't be changed:
I am happy to help get smaller PRs reviewed quickly so the changes can be merged, but rebasing this PR in the current state will simply not be possible, I'm afraid. |
This change allows multiple checksums and file types on Files. This improves compliance with the SPDX 2.1 and newer specifications.
This was done to allow support for modern cryptographic hashes, e.g. sha256.
All 250 of the tests runs. The multiple file checksum and file types works with all storage types.
Signed-off-by: Jeffrey Otterson [email protected]