distil_trainer.py
In this code snippet, every call to generate_one during token generation receives the same past_key_values: whichever of student_key_values or teacher_key_values was selected when the first token was generated. The KV (key-value) cache returned each time a new token is generated is not passed on to the next call. Why not use the newly generated KV cache?
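To make the question concrete, here is a minimal toy sketch of the two patterns being contrasted. It is not the code from distil_trainer.py: this `generate_one`, the tuple-based `Cache`, and both decode helpers are hypothetical stand-ins, and the "model" is just arithmetic on token ids. It only shows how reusing the initial cache on every step differs from threading the returned cache forward.

```python
from typing import List, Tuple

# Toy stand-in for past_key_values: the token ids the "model" has seen so far.
Cache = Tuple[int, ...]


def generate_one(token: int, past: Cache) -> Tuple[int, Cache]:
    """Hypothetical decoding step: returns the next token and the extended cache."""
    new_past = past + (token,)
    next_token = (sum(new_past) + 1) % 10  # depends on everything in the cache
    return next_token, new_past


def decode_fixed_cache(first: int, prompt_cache: Cache, steps: int) -> List[int]:
    """Pattern described in the question: every step reuses the initial cache."""
    out, token = [], first
    for _ in range(steps):
        token, _unused = generate_one(token, prompt_cache)  # returned cache is dropped
        out.append(token)
    return out


def decode_threaded_cache(first: int, prompt_cache: Cache, steps: int) -> List[int]:
    """Alternative being asked about: feed each returned cache into the next step."""
    out, token, past = [], first, prompt_cache
    for _ in range(steps):
        token, past = generate_one(token, past)
        out.append(token)
    return out


fixed = decode_fixed_cache(3, (1, 2), 4)
threaded = decode_threaded_cache(3, (1, 2), 4)
print(fixed)     # [7, 1, 5, 9]
print(threaded)  # [7, 4, 8, 6]
```

In this toy, both variants agree on the first token and diverge afterwards, because the fixed-cache variant never sees the tokens generated in earlier steps.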