datasets: fix dataset clobbering, and file naming - v1 #341

Merged
merged 2 commits
Mar 11, 2024
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,13 @@
instead of 4.0.0.
- Handle URLs of bare files that don't end in .rules:
https://redmine.openinfosecfoundation.org/issues/3664
- Don't base dataset filenames on the contents of the file, but
instead the filename path:
https://redmine.openinfosecfoundation.org/issues/6763
- Give each file in a source a unique filename by prefixing the files
with a hash of the URL to prevent duplicate filenames from
clobbering each other, in particular dataset files:
https://redmine.openinfosecfoundation.org/issues/6833

## 1.3.0 - 2023-07-07

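The first CHANGELOG entry above covers the naming half of the fix: in handle_dataset_files(), the copied dataset is now named after an md5 of the source file's path rather than of its contents, so the on-disk name no longer changes whenever the dataset contents do. Below is a minimal sketch of that rewrite, using a hypothetical rule and hypothetical file paths; the real logic is in the main.py diff that follows.

```python
# Sketch only: rule text and paths are made up for illustration.
import hashlib
import re

rule = ("alert dns any any -> any any "
        "(dataset:set,bad-domains,load bad-domains.lst; sid:1;)")
dataset_filename = "bad-domains.lst"       # filename as referenced by the rule
source_filename = "rules/bad-domains.lst"  # path of the file within the source

# Name the copy after a hash of the path, not of the contents.
name_hash = hashlib.md5(source_filename.encode()).hexdigest()

# Point the rule's "load" argument at the hashed copy under datasets/.
new_rule = re.sub(
    r"(dataset.*?load\s+){}".format(dataset_filename),
    r"\g<1>datasets/{}".format(name_hash),
    rule)
print(new_rule)
```

In the diff below, the same hash is also used as the destination filename when the dataset file is copied into the output directory, so the rewritten rule and the copied file stay in sync.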
13 changes: 9 additions & 4 deletions suricata/update/main.py
@@ -465,9 +465,9 @@ def handle_dataset_files(rule, dep_files):
return
dataset_contents = dep_files[source_filename]

-content_hash = hashlib.md5(dataset_contents).hexdigest()
-new_rule = re.sub(r"(dataset.*?load\s+){}".format(dataset_filename), r"\g<1>datasets/{}".format(content_hash), rule.format())
-dest_filename = os.path.join(config.get_output_dir(), "datasets", content_hash)
+source_filename_hash = hashlib.md5(source_filename.encode()).hexdigest()
+new_rule = re.sub(r"(dataset.*?load\s+){}".format(dataset_filename), r"\g<1>datasets/{}".format(source_filename_hash), rule.format())
+dest_filename = os.path.join(config.get_output_dir(), "datasets", source_filename_hash)
dest_dir = os.path.dirname(dest_filename)
logger.debug("Copying dataset file {} to {}".format(dataset_filename, dest_filename))
try:
@@ -985,9 +985,14 @@ def load_sources(suricata_version):
# Now download each URL.
files = []
for url in urls:

+# To de-duplicate filenames, add a prefix that is a hash of the URL.
+prefix = hashlib.md5(url[0].encode()).hexdigest()
source_files = Fetch().run(url)
for key in source_files:
-files.append(SourceFile(key, source_files[key]))
+content = source_files[key]
+key = format("{}/{}".format(prefix, key))
+files.append(SourceFile(key, content))

# Now load local rules.
if config.get("local") is not None:
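The second hunk covers the clobbering half of the fix: each file fetched from a source is stored under a key prefixed with an md5 of the source URL (url[0] in the diff above), so two sources that both ship a file with the same name, in particular dataset files, can no longer overwrite each other. A small sketch of that keying scheme, with made-up URLs and filenames:

```python
# Sketch only: URLs and filenames are made up for illustration.
import hashlib

def prefixed_key(url, filename):
    # Prefix the filename with a hash of the URL it was fetched from, so
    # identical filenames from different sources map to distinct keys.
    return "{}/{}".format(hashlib.md5(url.encode()).hexdigest(), filename)

a = prefixed_key("https://example.com/feed-a.tar.gz", "bad-domains.lst")
b = prefixed_key("https://example.org/feed-b.tar.gz", "bad-domains.lst")
assert a != b  # same filename, different sources, no clobbering
```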