-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chore/make upload more robust #92
Conversation
if data["sourceCodeDownloadurl"].endswith(('zip', 'tgz', 'tar.gz', 'tar')): | ||
update_data["sourceCodeDownloadurl"] = data["sourceCodeDownloadurl"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being late to this review, but can somebody please explain me this change? I understand it as "if URL in BOM looks somehow correct, always overwrite URL in SW360". Do we really want this? Or do I get it wrong here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As elaborated in the other comments, the source code detection often doesn't find correct URLs, or the package's metadata isn't correct (my experience so far was that JS projects perform the worst).
As a consequence, I've also observed that there are often existing releases in SW360 that have existing source code attachments with wrong file types (e.g., .git), which wouldn't be corrected by the capycli due to the attachment already existing, even though the capycli would now (with the introduced changes) often find a valid source code download URL.
Yes, the check is broader than overriding just the existing URLs ending with .git (for which this was introduced), but in my testing, I haven't observed negative side effects.
I would gladly hear your opinion. If you think this could introduce negative side effects, we can make it more specific to target clearly wrong source code URLs (e.g., ending with .git).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tested it now and it will now always update existing source URLs given that the URL in the BOM ends with .tar.gz oder .zip. But probably the existing entry in SW360 is more correct than my BOM? Apart from that, the user would probably not expect this.
In Debian, for example, I found that many solutions generate somehow wrong source URLs, e.g. pointing to orig.tar.gz or orig.tar.xz packages. So in some cases, existing, valid URLs would be replaced by invalid ones, in others not (because .tar.gz is on the whitelist, .tar.xz is not).
Apart from that, in Debian, we use permanent URLs like https://snapshot.debian.org/archive/debian/<timestamp>/...
, so nearly every user will have a different URL because same package can be found in different snapshots. So this could trigger lots of unnecessary changes and thus moderation requests when handling large BOMs. (Actually, it would not for my concrete example, as correct URLs end with .tsc, only incorrect ones would cause this...).
I think the code should at least be narrowed to update only URLs you consider invalid, but I'm really not sure if hard-coding a fix for a specific JS issue in the general code paths is a good idea. I personally never stumbled over such wrong entries in other SW360 entries. So my suggestion would be to roll-back this change, open an issue about your specific problem and discuss possible solutions there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I just created #96 so we have a testcase for these changes and we can shape it according to a decision how to handle your issue. We might think about adding test cases for the other changes introduced here, too.
if filename is not None and filename.endswith('.git'): | ||
print_red(" WARNING: resetting filename to prevent uploading .git file") | ||
filename = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely against this, but do we really need this extra check? There are plenty more of extensions we probably don't want to upload like .exe, .scr, .pif, .so, ... Is there an important reason to hard-code this extension? Probably someone one day has a valid use case to upload "snapshot.git"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gernot-h,
all of these are valid questions and @nzupan should provide answers.
What I know: sometimes the source code detection or source code upload do not work as inteded.
The result is an attachment on SW360 that cannot be used, because it does not contain source code.
My assumption was that thesecode changes mitigate this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was exactly the intention behind the change. What I've observed in practice is that on many javascript packages, the .git is being uploaded instead of the source code archive file due to the source code detection not working as intended or the package's metadata being wrong.
Due to the JS projects often containing many dependencies, preventing uploading .git files can prevent a huge amount of manual work or in the worst case wrong uploads/moderation requests (if not enough care is put in by the dev teams when polishing SBOM files).
@gernot-h , but what you pointed out is a valid use case. I believe that a good way to improve this check is to apply it explicitly only for the SOURCE AND BINARY file type.
If you both agree, I would open another merge request for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull requests extends the granularity list and make the attachment upload more robust to prevent .git files being uploaded.