Commit

typo fix in train_tokenizer
jettjaniak committed May 23, 2024
1 parent b3899b1 commit 48f2222
Showing 2 changed files with 1 addition and 3 deletions.
1 change: 0 additions & 1 deletion scripts/tokenize_dataset.py
@@ -80,7 +80,6 @@
args.out_repo_id or args.out_dir
), "You need to provide --out-repo-id or --out-dir"

print(f"Loading dataset '{args.in_repo_id}'...")
in_dataset_split = utils.load_dataset_split_string_feature(
args.in_repo_id, args.split, args.feature
)
3 changes: 1 addition & 2 deletions scripts/train_tokenizer.py
@@ -76,9 +76,8 @@ def train_byte_level_bpe(
args.out_repo_id or args.out_dir
), "You need to provide out_repo_id or out_dir"

print(f"Loading dataset '{args.in_repo_id}'...")
in_dataset_split = utils.load_dataset_split_string_feature(
- args.repo_id, args.split, args.feature
+ args.in_repo_id, args.split, args.feature
)
assert isinstance(in_dataset_split, Dataset)
tokenizer = train_byte_level_bpe(
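
Why the rename matters can be sketched in isolation. The snippet below assumes the script builds `args` with `argparse` and exposes the input repository as an `--in-repo-id` flag; the parser setup, the sample value `user/dataset`, and the two helper functions are hypothetical and exist only for illustration. Under that assumption, `args.repo_id` is never defined, so the pre-fix call would fail with an `AttributeError` before any dataset is loaded.

```python
import argparse

# Hypothetical parser mirroring only the flag involved in the fix.
# argparse derives the attribute name from the flag: --in-repo-id -> in_repo_id.
parser = argparse.ArgumentParser()
parser.add_argument("--in-repo-id", type=str)
args = parser.parse_args(["--in-repo-id", "user/dataset"])


def read_repo_id_old(ns: argparse.Namespace) -> str:
    """Pre-fix attribute name; the parser above never defines it."""
    return ns.repo_id


def read_repo_id_new(ns: argparse.Namespace) -> str:
    """Post-fix attribute name, matching the --in-repo-id flag."""
    return ns.in_repo_id


if __name__ == "__main__":
    print(read_repo_id_new(args))
    try:
        read_repo_id_old(args)
    except AttributeError as err:
        print(f"pre-fix access fails: {err}")
```

Because the failure only surfaces when the line runs, a typo like this passes import and argument parsing without complaint, which is likely why it went unnoticed until the script was executed.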
