Initialize pipelines for more doc types
Signed-off-by: Aakanksha Duggal <[email protected]>
aakankshaduggal committed Jan 20, 2025
1 parent 0b91a3f commit 1484355
Showing 1 changed file with 3 additions and 1 deletion: src/instructlab/sdg/utils/chunkers.py
@@ -155,7 +155,9 @@ def chunk_documents(self) -> List:
         docling_json_paths = list(docling_artifacts_path.glob("*.json"))
         chunks = []
         for json_fp in docling_json_paths:
-            chunks.extend(self._process_parsed_docling_json(json_fp))
+            with json_fp.open("r", encoding="utf-8") as file:
+                data = json.load(file)
+                chunks.extend(self._process_parsed_docling_json(data))
 
         return chunks

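With this change, the loop in `chunk_documents` opens each exported `*.json` file, parses it with `json.load`, and passes the resulting object to `_process_parsed_docling_json` instead of the file path. The following is a minimal, self-contained sketch of that load-then-process pattern. `process_parsed_json` and `chunk_json_exports` are hypothetical stand-ins, and the `"texts"` key is an assumption made only for illustration. The diff does not show the real method's input schema or return type.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def process_parsed_json(data: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in for _process_parsed_docling_json.

    It receives an already-parsed JSON object (not a path). Purely for
    illustration, it treats each string in an assumed "texts" list as one chunk.
    """
    return [item for item in data.get("texts", []) if isinstance(item, str)]


def chunk_json_exports(artifacts_dir: Path) -> List[str]:
    """Load every *.json file in artifacts_dir and collect its chunks."""
    chunks: List[str] = []
    # sorted() makes the processing order deterministic across platforms
    for json_fp in sorted(artifacts_dir.glob("*.json")):
        # Parse the file here, so the processing step only sees plain data
        with json_fp.open("r", encoding="utf-8") as file:
            data = json.load(file)
        chunks.extend(process_parsed_json(data))
    return chunks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "a.json").write_text(json.dumps({"texts": ["alpha", "beta"]}), encoding="utf-8")
        (tmp_path / "b.json").write_text(json.dumps({"texts": ["gamma"]}), encoding="utf-8")
        print(chunk_json_exports(tmp_path))  # ['alpha', 'beta', 'gamma']
```

Keeping file I/O in the loop means the processing function can be tested with in-memory dictionaries, without any files on disk.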
