[REQUEST] Alternative to PyTorch environment variables on Windows for setting PyTorch memory management parameters #664
Comments
Setting environment variables on Windows can be done using both …
Hey Doc, I tried every possible syntax for PYTORCH_CUDA_ALLOC_CONF with expandable_segments:true, both system-wide and per-user. Always the same answer:
So I modified start.py in TabbyAPI to be sure the syntax was correct.
In the end, the model load crashes:
I searched for that error, and it seems to be fairly common with PyTorch recently. This comment is interesting: pytorch/pytorch#122057 (comment)

"@galv I am not explicitly setting TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK=true. Expandable segments simply stopped working in PyTorch 2.2 due to the refactor https://github.com/pytorch/pytorch/blob/main/c10/cuda/CUDAAllocatorConfig.h#L28. PyTorch 2.1.2 is the last version that works for me with expandable segments -- upgrading to 2.2+ gives this warning and expandable segments are not enabled (and I get OOMs)."
I’d have to try loading with the actual env var set to see whether there’s an issue with the syntax; I only tested setting the var and echoing it back.

Regarding the final error at the bottom: this is just a plain OOM error. Are you sure you aren’t simply running out of memory with your configuration? Expandable segments only save a small amount of VRAM anyway.

I notice that you are trying to manually specify a cache size of 10240; however, it is automatically overridden to match the max seq len of 131072, because a cache size smaller than max seq len is not a sane setting. Did you mean to load the model with only 10240 context to use less VRAM?
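To see why a 10240 cache size did not help, here is a rough sketch of the arithmetic. effective_cache_size is a hypothetical helper that mirrors the override described above (treating a missing cache_size as max_seq_len is an assumption), and kv_cache_bytes is the usual K/V cache estimate: one key and one value tensor per layer, FP16 at 2 bytes per element. The model dimensions are assumed, since the thread does not name the model, and the estimate ignores model weights, quantized cache modes and allocator overhead.

```python
from typing import Optional


def effective_cache_size(max_seq_len: int, cache_size: Optional[int]) -> int:
    """Hypothetical helper: a cache smaller than max_seq_len (or no
    cache_size at all, which is an assumption) is raised to max_seq_len."""
    if cache_size is None or cache_size < max_seq_len:
        return max_seq_len
    return cache_size


def kv_cache_bytes(tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_element: int = 2) -> int:
    """Rough K/V cache size: one K and one V tensor per layer, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element * tokens


# Assumed, hypothetical model dimensions (the thread does not name the model).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
GIB = 1024 ** 3

for max_seq_len, cache_size in [(131072, 10240), (10240, 10240)]:
    tokens = effective_cache_size(max_seq_len, cache_size)
    size_gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / GIB
    print(f"max_seq_len={max_seq_len} cache_size={cache_size} "
          f"-> {tokens} tokens, {size_gib:.2f} GiB")
```

With these assumed dimensions it prints 16.00 GiB for the first case and 1.25 GiB for the second, which illustrates why lowering max_seq_len, not just cache_size, is what actually frees VRAM.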
I got the model (3.9bpw) working several times in TabbyAPI yesterday; then it stopped working with the usual OOMs, even after a reboot, so I decided to dig into the problem. OH, LOL. My mistake: I just read about the base context. I deleted max seq len yesterday. Lololol. The expandable segments problem remains, though... :D
I thought the prompt cache alone dictated the context size and the corresponding cache, not max seq len. I deleted it, then forgot about it... I'm going to test right now. I have to reinstall Torch as well, because I messed up my install. I'll let you know as soon as it works. And it works again. x) P.S.: Thank you for your help!
Yeah, don’t use override_base_seq_len. It is a very old feature added for a very niche reason: it sets the model’s effective seq len used for automatic rope scaling (e.g. the oldest Mistral 7B has a max seq len of 32k but only really works up to around 8-9k before breaking down).
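A minimal sketch of the idea, assuming automatic rope scaling starts from the ratio between the requested max_seq_len and a base length, and that override_base_seq_len simply replaces the model's native base. rope_scaling_ratio is a hypothetical helper; how TabbyAPI turns that ratio into an actual rope alpha is not shown here. The value 8192 stands in for the "around 8-9k" from the Mistral example.

```python
from typing import Optional


def rope_scaling_ratio(max_seq_len: int, native_base_seq_len: int,
                       override_base_seq_len: Optional[int] = None) -> float:
    """Hypothetical helper: how far the requested context exceeds the base
    length that automatic rope scaling is computed from (1.0 = no scaling)."""
    if override_base_seq_len is not None:
        base = override_base_seq_len
    else:
        base = native_base_seq_len
    return max(1.0, max_seq_len / base)


# Early Mistral 7B example: config says 32k, but it is usable only to ~8k.
print(rope_scaling_ratio(32768, 32768))        # no override
print(rope_scaling_ratio(32768, 32768, 8192))  # override_base_seq_len=8192
```

The first call prints 1.0 (no scaling) and the second 4.0, so a leftover override_base_seq_len could trigger scaling the model does not actually need.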
Problem
The sign "=" is not supported in Windows environment variables.
Thus, PYTORCH_CUDA_ALLOC_CONF=expandable_segments cannot be used on that platform.
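One route that avoids the Windows environment entirely is a minimal sketch like the following, assuming it runs at the very top of the launcher (for TabbyAPI, start.py), before any import that pulls in torch. Note that the "=" in PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is only assignment syntax; the value itself is expandable_segments:True and contains no "=". PyTorch's documentation writes the flag with a capital T (True), so matching that spelling is the safe choice.

```python
import os

# The "=" in `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` is assignment
# syntax; the value stored in the environment is just "expandable_segments:True".
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Import torch only after this point, so the CUDA caching allocator
# sees the setting when it is initialised.
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```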
Solution
Could you please either point me to an alternative route I might have overlooked or, if possible, allow setting PyTorch memory parameters through the config.yaml of TabbyAPI?
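To make the config.yaml route concrete, here is a sketch of how a launcher could map a config section onto the environment variable. The pytorch section and its alloc_conf key are hypothetical; nothing in this thread shows TabbyAPI defining them. The sketch takes an already-parsed dict, so it needs no YAML library, and it must run before torch is imported.

```python
import os
from typing import Mapping, Optional

# Hypothetical config layout, as it might look after parsing config.yaml:
#   pytorch:
#     alloc_conf:
#       expandable_segments: "True"
EXAMPLE_CONFIG = {"pytorch": {"alloc_conf": {"expandable_segments": "True"}}}


def apply_alloc_conf(config: Mapping) -> Optional[str]:
    """Copy allocator options from a (hypothetical) config section into
    PYTORCH_CUDA_ALLOC_CONF. Must run before torch is imported."""
    options = config.get("pytorch", {}).get("alloc_conf")
    if not options:
        return None
    # PyTorch expects comma-separated "option:value" pairs.
    joined = ",".join(f"{name}:{val}" for name, val in options.items())
    # A variable already set in the environment takes precedence over the file.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", joined)
    return os.environ["PYTORCH_CUDA_ALLOC_CONF"]


if __name__ == "__main__":
    print(apply_alloc_conf(EXAMPLE_CONFIG))
```

Letting an explicitly set environment variable win keeps the usual override order (environment over file); the opposite precedence would be equally defensible.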
Alternatives
No response
Explanation
To improve compatibility with Windows, where PyTorch memory management is always a bit tricky.
Examples
Here's my current TabbyAPI log:
I'm sorry if I'm posting in the wrong place, but since this is both PyTorch- AND exllamav2-related, posting here seemed sensible.
Additional context
No response
Acknowledgements