
Seq2Seq: Special tokens are also added to targets for LL computation #149

Open · samsontmr opened this issue Mar 2, 2023 · 2 comments

@samsontmr (Member) commented Mar 2, 2023

Location: https://github.com/bigscience-workshop/lm-evaluation-harness/blob/master/lm_eval/models/huggingface.py#L460
@jon-tow I'm not sure if special tokens should be included as part of the target sequence when doing the LL computation.
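For illustration, here's a minimal sketch of the behaviour in question (the model name is arbitrary, not necessarily the one under test): with the default `add_special_tokens=True`, a seq2seq tokenizer such as T5's appends an EOS token to the encoded target, so that token's log-probability ends up in the LL sum.

```python
from transformers import AutoTokenizer

# Arbitrary seq2seq tokenizer, purely for illustration.
tok = AutoTokenizer.from_pretrained("t5-small")

target = "Paris"
print(tok(target)["input_ids"])                            # trailing </s> (eos) id included
print(tok(target, add_special_tokens=False)["input_ids"])  # raw target tokens only
```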

@jon-tow (Collaborator) commented Mar 2, 2023

Yeah, that's reasonable. What effect does skipping special (label) tokens have on the logits?

outputs = self._model_call(inputs=inputs_tokens, labels=targets_tokens)

Is there a way to identify special-token positions so that we can strip them after the model call and then only compare the logits of the non-special tokens here:
log_softmax = log_softmax[:length]
target_tokens = target_tokens[:length]
greedy_tokens = log_softmax.argmax(dim=-1)
max_equal = (greedy_tokens == target_tokens).all()
target_logits = torch.gather(
    log_softmax, 1, target_tokens.unsqueeze(-1)
).squeeze(-1)
answer = (float(target_logits.sum()), bool(max_equal))
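A rough sketch of one way to do that (untested against the harness; it assumes `tokenizer` is the HF tokenizer and that `log_softmax` / `target_tokens` are the same per-sequence tensors as in the snippet above): build a boolean mask from `tokenizer.all_special_ids` and drop those positions before the argmax/gather.

```python
import torch

# Keep only positions whose target token is not a special token.
special_ids = torch.tensor(tokenizer.all_special_ids, device=target_tokens.device)
keep = ~torch.isin(target_tokens, special_ids)

target_tokens = target_tokens[keep]
log_softmax = log_softmax[keep]

greedy_tokens = log_softmax.argmax(dim=-1)
max_equal = (greedy_tokens == target_tokens).all()
target_logits = torch.gather(
    log_softmax, 1, target_tokens.unsqueeze(-1)
).squeeze(-1)
answer = (float(target_logits.sum()), bool(max_equal))
```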

@samsontmr (Member, Author) commented

> What effect does skipping special (label) tokens have on the logits?

It keeps the special tokens' logits out of the computation, since those logits can take different values depending on which tokens they appear next to.

Seems like a better solution would be to have a separate encode function that never adds special tokens?
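Something like this, as a rough sketch (the helper name is made up for illustration, and `tokenizer` / `target_text` stand in for whatever the harness passes at that point; the real fix would live wherever the harness currently encodes targets):

```python
def tok_encode_raw(tokenizer, text):
    """Encode text without letting the tokenizer add BOS/EOS or other special tokens."""
    return tokenizer.encode(text, add_special_tokens=False)

# e.g. use this for targets so the LL sum only covers real continuation tokens
targets_tokens = tok_encode_raw(tokenizer, target_text)
```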
