datasets: fix dataset clobbering, and file naming - v1 #341
Because the filename was a hash of the content, a new file was created every time the dataset was updated, and the old files were never cleaned up. To address this, use a filename that doesn't change based on the content. Bug: #6763
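To illustrate the problem, here is a minimal sketch, not the code from this patch. The helper names (`content_hashed_name`, `simulate_updates`), the use of SHA-256, and the `bad-domains.lst` filename are all assumptions made for the example. It writes several versions of a dataset into a temporary directory and shows how many files are left behind under each naming scheme.

```python
import hashlib
import os
import tempfile


def content_hashed_name(content: bytes) -> str:
    """Name derived from the content (hypothetical stand-in for the old scheme)."""
    return hashlib.sha256(content).hexdigest()


def simulate_updates(versions, name_for):
    """Write each dataset version in turn and return the files left on disk.

    `name_for` maps the dataset content to the filename it is stored under.
    """
    with tempfile.TemporaryDirectory() as rules_dir:
        for content in versions:
            path = os.path.join(rules_dir, name_for(content))
            with open(path, "wb") as fileobj:
                fileobj.write(content)
        return sorted(os.listdir(rules_dir))


# Three successive versions of the same (made-up) dataset.
versions = [b"a.example\n", b"a.example\nb.example\n", b"b.example\n"]

# Content-based names: every update leaves one more file behind.
leftover = simulate_updates(versions, content_hashed_name)

# Content-independent name: each update overwrites the previous file.
stable = simulate_updates(versions, lambda _content: "bad-domains.lst")
```

With a content-independent name the directory holds a single file no matter how often the dataset changes, which is the behaviour the description above asks for.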
To prevent dataset files from different sources from overwriting each other, give each downloaded and extracted file a prefix based on a hash of the URL. This ensures unique filenames across all rulesets. This mostly matters for datasets: when datasets are processed we are working with a merged set of filenames, unlike rules, which are parsed much earlier when we still have a list of files per source. Not the most elegant solution, but it saves a rather large refactor. Bug: #6833
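The idea can be sketched roughly as follows. This is an illustration under assumptions rather than the patch itself: the function names, the 8-character SHA-256 prefix, the `-` separator, and the example URLs and paths are all hypothetical.

```python
import hashlib
import posixpath


def url_prefix(url: str, length: int = 8) -> str:
    """Short, stable prefix derived from the source URL (length is an assumption)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:length]


def prefixed_name(url: str, filename: str) -> str:
    """Prefix the basename so the same path from two sources stays distinct."""
    directory, base = posixpath.split(filename)
    return posixpath.join(directory, "{}-{}".format(url_prefix(url), base))


def merge_files(sources):
    """Merge {url: {filename: content}} into one {filename: content} mapping."""
    merged = {}
    for url, files in sources.items():
        for filename, content in files.items():
            merged[prefixed_name(url, filename)] = content
    return merged


# Two made-up sources that ship a dataset under the same path.
sources = {
    "https://example.com/ruleset-a.tar.gz": {"rules/datasets/bad.lst": b"from a\n"},
    "https://example.org/ruleset-b.tar.gz": {"rules/datasets/bad.lst": b"from b\n"},
}
merged = merge_files(sources)
```

Without the prefix, the second source would silently replace the first in the merged mapping. Because the prefix depends only on the URL, it also stays the same across updates, so it does not reintroduce the leftover-file problem.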
Looking good from an overview. :) But, I haven't tested yet.
Do you already have some test data or should I create some? For manual testing.
This file has a dataset with the same name as a dataset in the pawpatrules, which I used to test the clobbering. To test the extraneous files, you'd have to update, modify the dataset in the ruleset, then update again and see the new file being created. Then run with the patch and see that it no longer happens. This is the more critical one, as some datasets are pretty large and updated daily.
For example, after a few weeks of rulesets with datasets that update frequently, I have these files in my rules directory:
instead the filename path:
https://redmine.openinfosecfoundation.org/issues/6763
with a hash of the URL to prevent duplicate filenames from
clobbering each other, in particular dataset files:
https://redmine.openinfosecfoundation.org/issues/6833