Thanks for this very interesting library, @BlackSamorez! I just tried running your Kaggle notebook locally on 2 × A6000s with the `meta-llama/Llama-2-7b-hf` model. The call to `model.generate` with `max_length=200` takes 12 seconds when using `tensor_parallel` across the 2 GPUs.
However, if I remove `tensor_parallel` and run on a single GPU instead, generation is more than twice as fast, taking 5 seconds.
Is this slowdown expected, or am I doing something wrong?
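For reference, a minimal sketch of how I timed the two runs. The `timed` helper below is mine; the commented-out setup assumes the `tp.tensor_parallel(model, devices)` call shown in the tensor_parallel README and is not copied from the notebook verbatim:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Benchmark sketch (requires GPUs and model weights, so left as comments):
#
#   import tensor_parallel as tp
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#
#   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
#   model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
#   model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])  # 2 x A6000
#
#   inputs = tokenizer("Hello,", return_tensors="pt").to("cuda:0")
#   _, seconds = timed(model.generate, **inputs, max_length=200)
#   print(f"generation took {seconds:.1f}s")
#
# For the single-GPU baseline, drop the tp.tensor_parallel call and
# move the model to one device with model.to("cuda:0").
```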