FIX: Fix egregious memory usage while hashing
nmacholl committed Oct 9, 2024
1 parent 1ad0128 commit faa1e89
Showing 2 changed files with 10 additions and 1 deletion.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
 # Changelog

+## 0.43.1 - TBD
+
+#### Bug fixes
+- Fixed an issue where validating the checksum of a batch file loaded the entire file into memory
+
 ## 0.43.0 - 2024-10-09

 This release drops support for Python 3.8 which has reached end-of-life.
6 changes: 5 additions & 1 deletion databento/historical/api/batch.py
@@ -431,7 +431,11 @@ def _download_batch_file(
 hash_algo, _, hash_hex = batch_download_file.hash_str.partition(":")

 if hash_algo == "sha256":
-    output_hash = hashlib.sha256(output_path.read_bytes())
+    output_hash = hashlib.new(hash_algo)
+    with open(output_path, "rb") as fd:
+        while chunk := fd.read(32_000_000):
+            output_hash.update(chunk)
+
     if output_hash.hexdigest() != hash_hex:
         warn_msg = f"Downloaded file failed checksum validation: {output_path.name}"
         logger.warning(warn_msg)
