Commit

formatting
Whattabatt committed Oct 2, 2024
1 parent 9858712 commit 259c4d9
Showing 3 changed files with 6 additions and 5 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "dolma"
-version = "1.0.15"
+version = "1.1.0"
edition = "2021"
license = "Apache-2.0"

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dolma"
-version = "1.0.15"
+version = "1.1.0"
description = "Data filters"
license = { text = "Apache-2.0" }
readme = "README.md"
7 changes: 4 additions & 3 deletions python/dolma/warc/processor.py
@@ -134,9 +134,10 @@ def process_single(
extension = extension.replace(".gz", "").replace(".warc", "") + ".jsonl.gz"
destination_path = join_path(prot, *base_dst[:-1], base_dst[-1] + extension)

-with smart_open.open(source_path, "rb") as warc_file, smart_open.open(
-destination_path, "wb"
-) as output_file:
+with (
+smart_open.open(source_path, "rb") as warc_file,
+smart_open.open(destination_path, "wb") as output_file,
+):
it = ArchiveIterator(warc_file, record_types=WarcRecordType.response | WarcRecordType.warcinfo)
for record in it:
if record.record_type == WarcRecordType.warcinfo:
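
Note on the `with` change above: the two layouts should behave identically, since both open the source first and the destination second, and close them in reverse order. The parenthesized form is documented as a Python 3.10 feature, so it is worth checking against the project's minimum supported Python version. The sketch below is a minimal illustration, not code from the repository: it substitutes the built-in `open` for `smart_open.open`, and the function names and file names are hypothetical.

```python
import os
import tempfile


def copy_nested_call_style(src: str, dst: str) -> int:
    """Pre-commit layout: the line break falls inside the second open() call."""
    with open(src, "rb") as source_file, open(
        dst, "wb"
    ) as output_file:
        return output_file.write(source_file.read())


def copy_parenthesized_style(src: str, dst: str) -> int:
    """Post-commit layout: one context manager per line (Python 3.10+)."""
    with (
        open(src, "rb") as source_file,
        open(dst, "wb") as output_file,
    ):
        return output_file.write(source_file.read())


def demo() -> tuple[bytes, bytes]:
    """Copy the same input with both layouts and return both outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.bin")
        old_dst = os.path.join(tmp, "old.bin")
        new_dst = os.path.join(tmp, "new.bin")
        with open(src, "wb") as f:
            f.write(b"WARC/1.1\r\n")  # placeholder payload, not a real record
        copy_nested_call_style(src, old_dst)
        copy_parenthesized_style(src, new_dst)
        with open(old_dst, "rb") as a, open(new_dst, "rb") as b:
            return a.read(), b.read()


if __name__ == "__main__":
    old_bytes, new_bytes = demo()
    print(old_bytes == new_bytes)
```

The parenthesized layout keeps each context manager on its own line and allows a trailing comma, which tends to produce smaller diffs when a third manager is added later.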