
Training with train_lora.py fails with "BertTokenizerFast has no attribute pad_token" #35

Open
yuanyuan1-max opened this issue Feb 4, 2025 · 0 comments


Hello, when running training with train_lora.py I hit the error in the title.
Name: transformers Version: 4.48.2
The traceback is as follows:
Traceback (most recent call last):
  File "/mnt/e/iwork/AI_voide/MassTTS-main/ChatTTSPlus/train_lora.py", line 542, in <module>
    main(config)
  File "/mnt/e/iwork/AI_voide/MassTTS-main/ChatTTSPlus/train_lora.py", line 347, in main
    for step, batch in enumerate(train_dataloader):
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/accelerate/data_loader.py", line 563, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
    data = self._next_data()
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
    return self._process_data(data)
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
    data.reraise()
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/_utils.py", line 715, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.

Original Traceback (most recent call last):
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/e/iwork/AI_voide/MassTTS-main/ChatTTSPlus/chattts_plus/datasets/base_dataset.py", line 62, in __getitem__
    text_input_ids, text_mask = self.preprocess_text(data_info_["text"], data_info_["lang"])
  File "/mnt/e/iwork/AI_voide/MassTTS-main/ChatTTSPlus/chattts_plus/datasets/base_dataset.py", line 90, in preprocess_text
    input_ids, attention_mask, text_mask = self.tokenizer.encode([text], num_vq=self.num_vq)
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/e/iwork/AI_voide/MassTTS-main/ChatTTSPlus/chattts_plus/models/tokenizer.py", line 62, in encode
    x = self._tokenizer.encode_plus(
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3045, in encode_plus
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2769, in _get_padding_truncation_strategies
    if padding_strategy != PaddingStrategy.DO_NOT_PAD and (self.pad_token is None or self.pad_token_id < 0):
  File "/home/yuan/anaconda3/envs/chattts_plus/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: BertTokenizerFast has no attribute pad_token

From some searching, I suspect the tokenizer set up in tokenizer.py is the problem, so I tried forcing a default value:

    if self._tokenizer.pad_token is None:
        self._tokenizer.pad_token = "[PAD]"

but the problem remains even with this default in place.
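One detail worth noting from the traceback itself: with transformers 4.48.2, `__getattr__` raises `AttributeError` when an unset special-token attribute such as `pad_token` is read, instead of returning `None`. That means the guard `if self._tokenizer.pad_token is None:` crashes on the read before the assignment ever runs. Below is a minimal, self-contained sketch of the pattern; `FakeTokenizer` is a stand-in class that only mimics the raising behaviour seen in the traceback, not the real transformers API:

```python
# FakeTokenizer is a hypothetical stand-in mimicking the behaviour shown in
# the traceback (reading an unset special token raises AttributeError).
# It is NOT the real transformers tokenizer.
class FakeTokenizer:
    def __getattr__(self, key):
        # mirrors tokenization_utils_base.py line 1108 in the traceback
        raise AttributeError(f"{type(self).__name__} has no attribute {key}")

    def add_special_tokens(self, mapping):
        # the real method also updates vocab bookkeeping; omitted here
        for name, token in mapping.items():
            object.__setattr__(self, name, token)


tok = FakeTokenizer()

# The guard from the issue crashes, because the read itself raises:
try:
    if tok.pad_token is None:
        tok.pad_token = "[PAD]"
except AttributeError as exc:
    print(exc)  # FakeTokenizer has no attribute pad_token

# Reading via getattr() with a default never triggers the raise, and
# registering the token through add_special_tokens sets it properly:
if getattr(tok, "pad_token", None) is None:
    tok.add_special_tokens({"pad_token": "[PAD]"})
print(tok.pad_token)  # [PAD]
```

Since the error shows up only with transformers 4.48.2 here, pinning transformers to an earlier release may also sidestep it, though I have not verified which versions are unaffected.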
