What should I do if I want to use tensor_parallel with a GPTQ-quantized model (Llama-2-7b-Chat-GPTQ, for example) for inference on 2 or more GPUs?
Currently, I am using AutoGPTQ to load the quantized model, and then calling tp.tensor_parallel to distribute the tensors across different devices. But I am getting the following error: TypeError: cannot pickle 'module' object
Do you have any suggestions on this? Thanks.
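For reference, here is a minimal sketch of the failing setup (the model repo name and device list are illustrative; I'm assuming the standard AutoGPTQ and tensor_parallel entry points):

```python
import tensor_parallel as tp
from auto_gptq import AutoGPTQForCausalLM

# Load the GPTQ-quantized model with AutoGPTQ
# (repo name is illustrative)
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7b-Chat-GPTQ",
    device="cuda:0",
)

# Shard the model across two GPUs -- this is the call
# that raises: TypeError: cannot pickle 'module' object
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```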