Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chore/make upload more robust #92

Merged
merged 4 commits into from
Nov 2, 2024

Conversation

nzupan
Copy link
Contributor

@nzupan nzupan commented Oct 31, 2024

This pull requests extends the granularity list and make the attachment upload more robust to prevent .git files being uploaded.

@tngraf tngraf added the enhancement New feature or request label Nov 2, 2024
@tngraf tngraf merged commit 7a2f139 into sw360:main Nov 2, 2024
7 checks passed
Comment on lines +302 to +303
if data["sourceCodeDownloadurl"].endswith(('zip', 'tgz', 'tar.gz', 'tar')):
update_data["sourceCodeDownloadurl"] = data["sourceCodeDownloadurl"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being late to this review, but can somebody please explain me this change? I understand it as "if URL in BOM looks somehow correct, always overwrite URL in SW360". Do we really want this? Or do I get it wrong here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As elaborated in the other comments, the source code detection often doesn't find correct URLs, or the package's metadata isn't correct (my experience so far was that JS projects perform the worst).
As a consequence, I've also observed that there are often existing releases in SW360 that have existing source code attachments with wrong file types (e.g., .git), which wouldn't be corrected by the capycli due to the attachment already existing, even though the capycli would now (with the introduced changes) often find a valid source code download URL.

Yes, the check is broader than overriding just the existing URLs ending with .git (for which this was introduced), but in my testing, I haven't observed negative side effects.

I would gladly hear your opinion. If you think this could introduce negative side effects, we can make it more specific to target clearly wrong source code URLs (e.g., ending with .git).

Copy link
Collaborator

@gernot-h gernot-h Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tested it now and it will now always update existing source URLs given that the URL in the BOM ends with .tar.gz oder .zip. But probably the existing entry in SW360 is more correct than my BOM? Apart from that, the user would probably not expect this.

In Debian, for example, I found that many solutions generate somehow wrong source URLs, e.g. pointing to orig.tar.gz or orig.tar.xz packages. So in some cases, existing, valid URLs would be replaced by invalid ones, in others not (because .tar.gz is on the whitelist, .tar.xz is not).

Apart from that, in Debian, we use permanent URLs like https://snapshot.debian.org/archive/debian/<timestamp>/..., so nearly every user will have a different URL because same package can be found in different snapshots. So this could trigger lots of unnecessary changes and thus moderation requests when handling large BOMs. (Actually, it would not for my concrete example, as correct URLs end with .tsc, only incorrect ones would cause this...).

I think the code should at least be narrowed to update only URLs you consider invalid, but I'm really not sure if hard-coding a fix for a specific JS issue in the general code paths is a good idea. I personally never stumbled over such wrong entries in other SW360 entries. So my suggestion would be to roll-back this change, open an issue about your specific problem and discuss possible solutions there.

Copy link
Collaborator

@gernot-h gernot-h Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I just created #96 so we have a testcase for these changes and we can shape it according to a decision how to handle your issue. We might think about adding test cases for the other changes introduced here, too.

Comment on lines +371 to +373
if filename is not None and filename.endswith('.git'):
print_red(" WARNING: resetting filename to prevent uploading .git file")
filename = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely against this, but do we really need this extra check? There are plenty more of extensions we probably don't want to upload like .exe, .scr, .pif, .so, ... Is there an important reason to hard-code this extension? Probably someone one day has a valid use case to upload "snapshot.git"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gernot-h,
all of these are valid questions and @nzupan should provide answers.
What I know: sometimes the source code detection or source code upload do not work as inteded.
The result is an attachment on SW360 that cannot be used, because it does not contain source code.
My assumption was that thesecode changes mitigate this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was exactly the intention behind the change. What I've observed in practice is that on many javascript packages, the .git is being uploaded instead of the source code archive file due to the source code detection not working as intended or the package's metadata being wrong.
Due to the JS projects often containing many dependencies, preventing uploading .git files can prevent a huge amount of manual work or in the worst case wrong uploads/moderation requests (if not enough care is put in by the dev teams when polishing SBOM files).
@gernot-h , but what you pointed out is a valid use case. I believe that a good way to improve this check is to apply it explicitly only for the SOURCE AND BINARY file type.
If you both agree, I would open another merge request for it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed that point with @tngraf and @nzupan in a meeting and we agreed to keep this hard-coded check as this is a frequent problem, which both colleagues have seen repeatedly happening in our internal SW360 instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants