datasets: use filename based on filename; not content
By using a hash of the content, a new file was created every time the
dataset was updated, and the old files were never cleaned up. To address this, use a
filename that doesn't change based on the content.

Bug: #6763
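
To illustrate the problem described above, here is a minimal sketch (not part of the commit) comparing the two naming schemes. The helper names `content_based_name` and `path_based_name`, the path `rules/example.lst`, and the dataset bytes are all hypothetical; only the use of `hashlib.md5(...).hexdigest()` comes from the diff.

```python
import hashlib


def content_based_name(contents: bytes) -> str:
    """Old scheme: the output name is derived from the dataset's bytes."""
    return hashlib.md5(contents).hexdigest()


def path_based_name(source_filename: str) -> str:
    """New scheme: the output name is derived from the dataset's path."""
    return hashlib.md5(source_filename.encode()).hexdigest()


# Hypothetical dataset path and two successive versions of its contents.
source_filename = "rules/example.lst"
version_1 = b"ZXhhbXBsZS5jb20=\n"
version_2 = version_1 + b"ZXhhbXBsZS5vcmc=\n"

# Old scheme: each update yields a different name, so files pile up.
old_names = {content_based_name(v) for v in (version_1, version_2)}

# New scheme: the name depends only on the path, so updates overwrite.
new_names = {path_based_name(source_filename) for _ in (version_1, version_2)}

print(len(old_names), len(new_names))  # 2 1
```

With the path-based name, an updated dataset lands on the same destination file instead of leaving the previous copy behind.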
jasonish committed Mar 5, 2024
1 parent 712c2d4 commit 935d361
Showing 2 changed files with 6 additions and 3 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@
instead of 4.0.0.
- Handle URLs of bare files that don't end in .rules:
https://redmine.openinfosecfoundation.org/issues/3664
+ - Don't base dataset filenames on the contents of the file, but
+ instead the filename path:
+ https://redmine.openinfosecfoundation.org/issues/6763

## 1.3.0 - 2023-07-07

6 changes: 3 additions & 3 deletions suricata/update/main.py
@@ -465,9 +465,9 @@ def handle_dataset_files(rule, dep_files):
return
dataset_contents = dep_files[source_filename]

- content_hash = hashlib.md5(dataset_contents).hexdigest()
- new_rule = re.sub(r"(dataset.*?load\s+){}".format(dataset_filename), r"\g<1>datasets/{}".format(content_hash), rule.format())
- dest_filename = os.path.join(config.get_output_dir(), "datasets", content_hash)
+ source_filename_hash = hashlib.md5(source_filename.encode()).hexdigest()
+ new_rule = re.sub(r"(dataset.*?load\s+){}".format(dataset_filename), r"\g<1>datasets/{}".format(source_filename_hash), rule.format())
+ dest_filename = os.path.join(config.get_output_dir(), "datasets", source_filename_hash)
dest_dir = os.path.dirname(dest_filename)
logger.debug("Copying dataset file {} to {}".format(dataset_filename, dest_filename))
try:
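
The substitution in the hunk above can be tried in isolation. The following is a rough, standalone sketch rather than the project's code: the function `rewrite_dataset_load`, the rule text, and the file names are hypothetical. It also escapes the filename with `re.escape`, which the diff does not do; that is an assumption of this sketch.

```python
import hashlib
import re


def rewrite_dataset_load(rule_text, dataset_filename, source_filename):
    """Point a rule's dataset 'load' option at datasets/<md5 of the path>."""
    name = hashlib.md5(source_filename.encode()).hexdigest()
    # Group 1 captures everything from "dataset" up to and including "load ".
    pattern = r"(dataset.*?load\s+){}".format(re.escape(dataset_filename))
    replacement = r"\g<1>datasets/{}".format(name)
    return re.sub(pattern, replacement, rule_text), name


# Hypothetical rule that loads a dataset shipped next to the rule file.
rule_text = (
    'alert dns any any -> any any (msg:"example"; dns.query; '
    "dataset:isset,example-set,type string,load example.lst; "
    "sid:1000001; rev:1;)"
)

new_text, name = rewrite_dataset_load(rule_text, "example.lst", "rules/example.lst")
print(new_text)
```

Because `name` depends only on `source_filename`, rerunning this after the dataset's contents change produces the same rewritten rule and the same destination path.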
