I was recently trying to make a custom sampler (documented on reddit here) and I ran into what seems like a seriously sharp edge in the high-level API that should at least be documented, if not changed. Long story short: using `llm.eval_logits` to get the next-token logits instead of `llm.scores` leads to roughly a 15x slowdown (17 seconds for a completion versus over 300 seconds), because `llm.eval_logits` calls `to_list` on the logits (code here). This was very surprising to me.

Is there a reason `eval_logits` is implemented in such an inefficient way, when it essentially duplicates `scores`? This took far more debugging time than I care to admit!
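For anyone hitting the same wall, here is a minimal sketch of the difference. The `FakeLlama` class below is a stand-in I wrote for illustration, not the real `llama_cpp.Llama`; it only mimics the two attributes discussed here (`scores` as a `(n_ctx, n_vocab)` numpy buffer and `eval_logits` as a list-converting view), so verify the attribute names against your installed version:

```python
import numpy as np

class FakeLlama:
    """Hypothetical stand-in for llama_cpp.Llama, just enough to
    show the two access patterns; not the library's actual code."""
    def __init__(self, n_ctx=8, n_vocab=1000):
        # `scores` is a (n_ctx, n_vocab) float32 buffer of raw logits.
        self.scores = np.random.rand(n_ctx, n_vocab).astype(np.float32)
        self.n_tokens = 3  # pretend three tokens have been evaluated

    @property
    def eval_logits(self):
        # Mirrors the slow path: converts every evaluated row into a
        # plain Python list before you can index it.
        return [row.tolist() for row in self.scores[: self.n_tokens]]

def next_token_logits_fast(llm):
    # Fast path: a zero-copy numpy view of the last evaluated row.
    return llm.scores[llm.n_tokens - 1, :]

def next_token_logits_slow(llm):
    # Slow path: materializes Python lists for *all* rows, then indexes.
    return np.array(llm.eval_logits[-1], dtype=np.float32)

llm = FakeLlama()
fast = next_token_logits_fast(llm)
slow = next_token_logits_slow(llm)
assert np.allclose(fast, slow)  # same logits, very different cost
```

Both paths return identical logits; the slow one just pays a per-element list conversion on every decode step, which is where the 15x shows up at real vocabulary sizes.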