deep_speed initialization for models in the transformers library #85
Dear authors,

I found that Collie cannot initialize DeepSpeed when using models from the `transformers` library. For example, when I replace this line of the script with the `from_pretrained` interface of the `transformers` library, no config of type `CollieConfig` can be passed to it, and even the monitors cannot be registered correctly, since DeepSpeed is never initialized (`DeepSpeed backend not set, please initialize it using init_process_group()`). Is there any workaround for this issue, or can Collie only support training the internally reimplemented models?

Comments
Hi @DesperateExplorer, Collie can use models from `transformers` in the case of ZeRO parallelism, but you need to execute the following first:

```python
from collie import setup_distribution, CollieConfig
from transformers import AutoModelForCausalLM

model_name = "openlm-research/open_llama_7b_v2"
config = CollieConfig.from_pretrained(model_name)
setup_distribution(config)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
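For an actual multi-GPU ZeRO run you would also set the parallelism degrees and DeepSpeed options on the config before calling `setup_distribution`. A minimal sketch, assuming `CollieConfig` exposes `dp_size`, `train_micro_batch_size`, and a `ds_config` dict that is forwarded to DeepSpeed, as in CoLLiE's examples; the stage and batch sizes here are illustrative:

```python
from collie import setup_distribution, CollieConfig
from transformers import AutoModelForCausalLM

model_name = "openlm-research/open_llama_7b_v2"
config = CollieConfig.from_pretrained(model_name)

# Pure data parallelism over 8 GPUs; ZeRO stage 3 shards optimizer
# state, gradients, and parameters across the data-parallel ranks.
config.dp_size = 8
config.train_micro_batch_size = 1
config.ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}

setup_distribution(config)  # initialize the distributed backend first
model = AutoModelForCausalLM.from_pretrained(model_name)
```

Launched in the usual way, e.g. `torchrun --nproc_per_node=8 train.py`.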
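If `setup_distribution` cannot be called for some reason, the `DeepSpeed backend not set` error reported in the issue can also be avoided by initializing DeepSpeed's communication backend directly; `deepspeed.init_distributed()` is plain DeepSpeed API, not Collie-specific. A minimal sketch:

```python
import deepspeed
import torch.distributed as dist

# Wraps torch.distributed.init_process_group and sets up DeepSpeed's
# comm module, which components such as monitors query when created.
deepspeed.init_distributed(dist_backend="nccl")

if dist.get_rank() == 0:
    print("world size:", dist.get_world_size())
```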