Let's say models A and B have their pad_token set to <|finetune_right_pad_id|>, and model C has its pad_token set to <|end_of_text|>. I'd like model C to have the same pad_token as A and B.
The MergeKit README has this example:
tokenizer:
  source: union
  tokens:
    # Use embedding from a specific model
    <|im_start|>:
      source: "path/to/chatml/model"
    # Force a specific embedding for all models
    <|special|>:
      source: "path/to/model"
      force: true
    # Map a token to another model's token embedding
    <|renamed_token|>:
      source:
        kind: "model_token"
        model: "path/to/model"
        token: "<|original_token|>"  # or use token_id: 1234
For my case, I'd interpret that as this:
tokenizer:
  source: union
  tokens:
    # Force a specific embedding for all models
    <|finetune_right_pad_id|>:
      source: "A"
      force: true
    # Map a token to another model's token embedding
    <|end_of_text|>:
      source:
        kind: "model_token"
        model: "A"
        token: "<|finetune_right_pad_id|>"
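To make the intent of that mapping explicit, here is the same structure built as a plain Python dict. This is only a sketch: `build_tokenizer_config` is a hypothetical helper of my own, not a MergeKit API, and "A" is a placeholder for the real model path.

```python
import json


def build_tokenizer_config(pad_token, other_pad_token, source_model):
    """Build the `tokenizer` section as a plain dict.

    Hypothetical helper for illustration only; not part of MergeKit.
    """
    return {
        "tokenizer": {
            "source": "union",
            "tokens": {
                # Force source_model's embedding for the shared pad token.
                pad_token: {"source": source_model, "force": True},
                # Map the other pad token onto the shared pad token's embedding.
                other_pad_token: {
                    "source": {
                        "kind": "model_token",
                        "model": source_model,
                        "token": pad_token,
                    }
                },
            },
        }
    }


# "A" is a placeholder model name, as in the question above.
config = build_tokenizer_config("<|finetune_right_pad_id|>", "<|end_of_text|>", "A")
print(json.dumps(config, indent=2))
```

The printed structure mirrors the YAML above key for key, so it only restates my interpretation; it does not confirm how MergeKit treats the pad_token itself.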
Is that the right approach?