[Question] Customizing special tokens #478

Closed
T145 opened this issue Jan 2, 2025 · 0 comments

T145 commented Jan 2, 2025

Let's say models A and B both set `pad_token` to `<|finetune_right_pad_id|>`, while model C sets it to `<|end_of_text|>`. I'd like model C to use the same `pad_token` as A and B.

The MergeKit README has this example:

```yaml
tokenizer:
  source: union
  tokens:
    # Use embedding from a specific model
    <|im_start|>:
      source: "path/to/chatml/model"

    # Force a specific embedding for all models
    <|special|>:
      source: "path/to/model"
      force: true

    # Map a token to another model's token embedding
    <|renamed_token|>:
      source:
        kind: "model_token"
        model: "path/to/model"
        token: "<|original_token|>"  # or use token_id: 1234
```
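To make the three `source` forms in the README example concrete, here is a small Python sketch. It reads each `tokens:` entry as the plain dict a YAML loader would return and reports which model and which token the embedding would be taken from, plus the `force` flag. This is a hypothetical illustration based only on the README's comments: `EmbeddingSource` and `resolve` are made-up names, not MergeKit's API, and the sketch does not model how MergeKit actually applies `force` during a merge.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EmbeddingSource:
    model: str               # model the embedding is read from
    token: Union[str, int]   # token string (or token id) in that model's vocab
    force: bool              # README: "Force a specific embedding for all models"


def resolve(token: str, entry: dict) -> EmbeddingSource:
    """Hypothetical reading of one `tokens:` entry; not MergeKit's own code."""
    source = entry["source"]
    force = bool(entry.get("force", False))
    if isinstance(source, str):
        # Plain path: the embedding of this same token, taken from that model.
        return EmbeddingSource(model=source, token=token, force=force)
    if isinstance(source, dict) and source.get("kind") == "model_token":
        # Mapped: the embedding of a *different* token (by string or id) in that model.
        if "token" in source:
            other: Union[str, int] = source["token"]
        elif "token_id" in source:
            other = int(source["token_id"])
        else:
            raise ValueError(f"{token}: model_token source needs 'token' or 'token_id'")
        return EmbeddingSource(model=source["model"], token=other, force=force)
    raise ValueError(f"{token}: unrecognized source {source!r}")


# The README example above, written as the dict a YAML loader would produce.
readme_tokens = {
    "<|im_start|>": {"source": "path/to/chatml/model"},
    "<|special|>": {"source": "path/to/model", "force": True},
    "<|renamed_token|>": {
        "source": {
            "kind": "model_token",
            "model": "path/to/model",
            "token": "<|original_token|>",
        },
    },
}

for tok, entry in readme_tokens.items():
    print(tok, "->", resolve(tok, entry))
```

Under this reading, a `model_token` entry only changes which embedding the listed token receives; none of the three README forms refers to the tokenizer's `pad_token` setting itself.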

For my case, I'd interpret that as:

```yaml
tokenizer:
  source: union
  tokens:
    # Force a specific embedding for all models
    <|finetune_right_pad_id|>:
      source: "A"
      force: true

    # Map a token to another model's token embedding
    <|end_of_text|>:
      source:
        kind: "model_token"
        model: "A"
        token: "<|finetune_right_pad_id|>"
```

Is that the right approach?

T145 closed this as completed Jan 3, 2025