fix(convert_hf_to_gguf): Support setting token_type_count from "type_vocab_size"

This matches the key used by common BERT-based embedding models, which may
hold a value other than 1. If the key is absent, the count still defaults to 1.

Branch: XLMRobertaTypeVocabSize

Signed-off-by: Gabe Goodhart <[email protected]>
gabe-l-hart committed Nov 22, 2024
1 parent 6dfcfef commit 757a2d3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion convert_hf_to_gguf.py
@@ -2707,7 +2707,7 @@ def set_vocab(self):
         self.gguf_writer.add_token_scores(scores)
         self.gguf_writer.add_token_types(toktypes)
         self.gguf_writer.add_add_space_prefix(add_prefix)
-        self.gguf_writer.add_token_type_count(1)
+        self.gguf_writer.add_token_type_count(self.hparams.get("type_vocab_size", 1))
         self.gguf_writer.add_remove_extra_whitespaces(remove_whitespaces)
         if precompiled_charsmap:
             self.gguf_writer.add_precompiled_charsmap(precompiled_charsmap)
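
For readers skimming the change, the fallback behavior of the new line can be sketched in isolation. This is a minimal illustration, not code from the repository: `token_type_count` is a hypothetical helper, and the dictionaries stand in for `self.hparams` under the assumption that it behaves like a plain `dict`.

```python
# Hypothetical helper mirroring the expression used in the changed line.
# Assumes hparams is a plain dict, as the .get() call in the diff suggests.
def token_type_count(hparams: dict) -> int:
    # Use the model's "type_vocab_size" if present, otherwise fall back to 1,
    # which was the previously hardcoded value.
    return hparams.get("type_vocab_size", 1)


# Illustrative inputs only; these values are not taken from a real model.
with_key = {"type_vocab_size": 2}
without_key = {"hidden_size": 768}

print(token_type_count(with_key))     # 2
print(token_type_count(without_key))  # 1
```

Models that omit the key get the same output as before the commit, so the change should only affect models that set "type_vocab_size" explicitly.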
