Commit

Update src/modalities/dataloader/shuffle_tokenized_data.py
Co-authored-by: Max Lübbering <[email protected]>
mali-git and le1nux authored Jan 21, 2025
1 parent 51f29f8 commit 906a264
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/modalities/dataloader/shuffle_tokenized_data.py
@@ -70,7 +70,7 @@ def shuffle_tokenized_data(input_data_path: Path, output_data_path: Path, batch_
random.shuffle(index_base)

# Step 3: Divide the shuffled index into batches
- batches = [index_base[i : i + batch_size] for i in range(0, len(index_base), batch_size)]
+ batches: list[list[tuple[int, int]]] = [index_base[i : i + batch_size] for i in range(0, len(index_base), batch_size)]

header_data = data_section_length_in_bytes + token_size_as_bytes
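
For readers without the full file at hand, the following is a minimal, self-contained sketch of the shuffle-and-batch step that this change annotates. It assumes that `index_base` is a list of `(offset, length)` integer pairs, which is what the new `list[list[tuple[int, int]]]` annotation suggests; the helper name `batch_shuffled_index` and the `seed` parameter are hypothetical and do not appear in the repository.

```python
import random
from typing import Optional


def batch_shuffled_index(
    index_base: list[tuple[int, int]],
    batch_size: int,
    seed: Optional[int] = None,
) -> list[list[tuple[int, int]]]:
    """Shuffle index entries and split them into consecutive batches.

    Hypothetical helper for illustration; each entry is assumed to be an
    (offset, length) pair describing one tokenized document.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(index_base)
    random.Random(seed).shuffle(shuffled)

    # Same slicing pattern as in the diff; the last batch may be shorter.
    batches: list[list[tuple[int, int]]] = [
        shuffled[i : i + batch_size] for i in range(0, len(shuffled), batch_size)
    ]
    return batches


if __name__ == "__main__":
    index = [(0, 4), (4, 2), (6, 3), (9, 1), (10, 5)]
    for batch in batch_shuffled_index(index, batch_size=2, seed=0):
        print(batch)
```

The explicit annotation does not change runtime behavior; it only documents the nested shape of `batches` for readers and static type checkers.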
