
Commit

fix: use real hf tokenizer vocab size when adding new trainable token
percevalw committed Dec 4, 2023
1 parent 7a1b075 commit 3a632d4
Showing 1 changed file with 3 additions and 1 deletion.
edsnlp/pipes/trainable/embeddings/transformer/transformer.py (3 additions, 1 deletion)
@@ -160,7 +160,9 @@ def __init__(
 )
 )
 # and add a new entry to the model's embeddings
-self.transformer.resize_token_embeddings(len(self.tokenizer))
+self.transformer.resize_token_embeddings(
+    max(self.tokenizer.vocab.values()) + 1
+)

 def to_disk(self, path, *, exclude: Optional[Set[str]]):
     repr_id = object.__repr__(self)
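
Background note (not part of the commit): len(self.tokenizer) counts vocabulary entries, but the embedding matrix must cover every token id up to the largest one. When ids are not contiguous, max(id) + 1 can exceed len(tokenizer), so resizing by length can leave the highest ids without embedding rows. The sketch below is illustrative only; it assumes the Hugging Face transformers library, uses "camembert-base" purely as an example checkpoint, and reads the mapping through the public get_vocab() accessor (on fast tokenizers this is what the .vocab property used in the commit resolves to).

    # Illustrative sketch, not taken from the commit; assumes the `transformers`
    # library and "camembert-base" as an example checkpoint.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("camembert-base")
    model = AutoModel.from_pretrained("camembert-base")

    # Register a new trainable token, as the edsnlp pipe does for its added tokens.
    tokenizer.add_tokens(["[NEW_TOKEN]"])

    # The number of embedding rows needed is driven by the highest token id,
    # not by the number of vocabulary entries: when ids are non-contiguous,
    # max(id) + 1 may exceed len(tokenizer).
    rows_by_len = len(tokenizer)
    rows_by_max_id = max(tokenizer.get_vocab().values()) + 1
    print(rows_by_len, rows_by_max_id)  # may differ, depending on the tokenizer

    # Resizing with the real maximum id guarantees every id maps to an embedding row.
    model.resize_token_embeddings(rows_by_max_id)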
