Merge the model and store in Google Drive (Section) #15

Open
KabaTubare opened this issue Aug 27, 2023 · 3 comments

Comments

@KabaTubare

It always runs out of memory; please remedy this issue. This is the error I get constantly, and I am using Colab Pro with a V100, which I think should be enough for this project: 0/3 [02:11<?, ?it/s]

OutOfMemoryError                          Traceback (most recent call last)
in <cell line: 8>()
      6
      7 # Reload model in FP16 and merge it with LoRA weights
----> 8 base_model = AutoModelForCausalLM.from_pretrained(
      9     model_name,
     10     low_cpu_mem_usage=True,

4 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
    296             module._parameters[tensor_name] = param_cls(new_value, requires_grad=old_value.requires_grad)
    297         elif isinstance(value, torch.Tensor):
--> 298             new_value = value.to(device)
    299         else:
    300             new_value = torch.tensor(value, device=device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 15.77 GiB total capacity; 14.32 GiB already allocated; 2.12 MiB free; 14.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
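A workaround that usually avoids this OOM is to perform the merge entirely on the CPU, so the V100's ~16 GiB is never touched during the reload. A minimal sketch, assuming a PEFT LoRA adapter was saved to Drive by the training cells; the model name and paths below are placeholders, not taken from this thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_name = "NousResearch/Llama-2-7b-chat-hf"         # placeholder base model
adapter_path = "/content/drive/MyDrive/results/final"  # placeholder adapter dir

# device_map={"": "cpu"} keeps every weight in system RAM, so the merge
# step needs no GPU memory at all (a high-RAM runtime helps here).
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map={"": "cpu"},
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()  # folds the LoRA deltas into the base weights

merged_path = "/content/drive/MyDrive/llama-2-merged"  # placeholder output dir
model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(model_name).save_pretrained(merged_path)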

@hieuminh65

I have this problem too.

@hieuminh65

Hey, I use a V100 and it works. Have you turned on high RAM?

After training the model and saving it to the Google Drive folder, you should restart the runtime and run only the cells that perform inference. That saves the RAM and storage needed to load the model.
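Concretely, after the restart the inference cell only needs to load the already-merged model from Drive. A minimal sketch of such a cell, assuming the merged model was saved to the Drive path below (a placeholder):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# After a restart, remount Drive first:
# from google.colab import drive; drive.mount('/content/drive')
merged_path = "/content/drive/MyDrive/llama-2-merged"  # placeholder Drive path

# With the runtime restarted the GPU is empty, so the merged FP16 model
# fits on the V100 without re-running any training cells.
model = AutoModelForCausalLM.from_pretrained(
    merged_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(merged_path)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("What does this model do?", max_new_tokens=64)[0]["generated_text"])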

@KabaTubare
Author

This still does not work, but I figure the authors of this project, or someone else, will eventually get it right, as others are coming quickly to offer this kind of turnkey solution for LLM training. The authors do not seem to realize that tech firms use this thread not only to find issues but also to see how they are resolved.
