
Saving the state of the model when exiting the program? #1240

Open
Vladonai opened this issue Nov 27, 2024 · 6 comments

Comments

@Vladonai

Would it be possible to add this feature to the program: when exiting (or pressing CTRL+C, as I usually do), if a -savestate flag was passed at startup, save the state of the model so that the next startup does not have to recalculate the KV cache but can simply load it from disk?

I have four Tesla P40s and run the 123B_4KM model with a 24k context. I really need this feature!
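
For what it's worth, koboldcpp is built on llama.cpp, which in recent versions exposes a state save/load API (llama_state_save_file / llama_state_load_file). A minimal sketch of what such a -savestate flag could do on CTRL+C, assuming that API; the flag name, file path, and all setup code are hypothetical:

```cpp
// Minimal sketch only, assuming llama.cpp's llama_state_save_file().
// The file name and the -savestate flag are hypothetical, all model/context
// setup is omitted, and the signal handler only sets a flag so the heavy
// save happens in the main loop rather than inside the handler.
#include <atomic>
#include <csignal>
#include <vector>
#include "llama.h"

static std::atomic<bool> g_should_exit{false};

static void on_sigint(int) { g_should_exit = true; }  // CTRL+C just requests exit

int main() {
    std::signal(SIGINT, on_sigint);

    llama_context *ctx = nullptr;                 // created during normal startup (omitted)
    std::vector<llama_token> tokens;              // every token evaluated into the KV cache so far
    const char *state_path = "kobold.savestate";  // hypothetical output file

    while (!g_should_exit) {
        // ... serve requests, appending evaluated tokens to `tokens` ...
    }

    // Dump the whole context state (KV cache + token history) to disk so the
    // next startup can restore it instead of re-processing the prompt.
    if (ctx) {
        llama_state_save_file(ctx, state_path, tokens.data(), tokens.size());
    }
    return 0;
}
```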

@jojorne

jojorne commented Nov 29, 2024

Every user that hits the same server ends up invalidating the cache, right? It gets recalculated from scratch for each request because the context changes from person to person, and keeping a cache for every user in memory would use a lot of memory. Still, on a computer with little memory I understand the desire to save the cache to disk. I can only imagine the time it takes with this model.

@LostRuins
Owner

Yes, the cache is rewound back to the first diverging token, but normally prompt processing is fast enough that you won't notice it. jojorne is right that saving a separate cache per user is not really feasible for a server application.
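
For readers unfamiliar with the rewind behaviour, the general idea (illustrative, not koboldcpp's actual code) is to compare the incoming prompt against the tokens already in the KV cache and only re-process everything after the first mismatch:

```cpp
// General idea of "rewind to the first diverging token" (illustrative sketch,
// not koboldcpp's actual implementation): tokens [0, n) of the cache can be
// kept, everything from n onward has to be evaluated again.
#include <cstddef>
#include <vector>
#include "llama.h"

size_t common_prefix_len(const std::vector<llama_token> &cached,
                         const std::vector<llama_token> &prompt) {
    size_t n = 0;
    while (n < cached.size() && n < prompt.size() && cached[n] == prompt[n]) {
        ++n;
    }
    return n;
}
```

With a chat frontend, the system prompt and earlier history usually match the cached tokens, so typically only the newest messages need to be re-processed.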

@Vladonai
Author

Kobold is used as the backend and SillyTavern as the frontend, with the 123B_Q4KM model and a 24k context. In the last session the context in SillyTavern was completely filled. Starting a new session and picking up where we left off:
[screenshot: prompt processing time for the full 24k context at startup]
If, instead of recalculating the entire context from scratch, the context cache were loaded from disk along with the model, this time would be reduced dramatically.
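
The loading half of the earlier sketch, again assuming llama.cpp's llama_state_load_file() and the same hypothetical file name: at startup, try to restore the saved state and only fall back to full prompt processing if there is nothing to load.

```cpp
// Startup-side counterpart to the save sketch above (assumption: llama.cpp's
// llama_state_load_file(); the file name passed in is hypothetical).
#include <cstddef>
#include <vector>
#include "llama.h"

bool try_restore_state(llama_context *ctx, std::vector<llama_token> &tokens,
                       const char *state_path, size_t n_ctx) {
    tokens.resize(n_ctx);                    // capacity for the saved token history
    size_t n_loaded = 0;
    if (!llama_state_load_file(ctx, state_path, tokens.data(), tokens.size(), &n_loaded)) {
        tokens.clear();
        return false;                        // nothing saved: do normal prompt processing
    }
    tokens.resize(n_loaded);                 // KV cache now matches these tokens
    return true;                             // resume generation immediately
}
```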

@LostRuins
Owner

The KV cache for 123B will be many gigabytes too, though.
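
For a rough sense of scale (illustrative numbers, not necessarily this exact model's architecture): the KV cache size is roughly 2 (K and V) × n_layers × n_kv_heads × head_dim × bytes_per_element × n_tokens. With, say, 88 layers, 8 KV heads of dimension 128, an fp16 cache, and 24k tokens, that is about 2 × 88 × 8 × 128 × 2 × 24576 ≈ 8.9 GB, so restoring it means a multi-gigabyte read from disk.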

@Vladonai
Author

Vladonai commented Dec 1, 2024

> The KV cache for 123B will be many gigabytes too, though.

I understand that, and I accept it.

@MrReplikant

Just making it known here: I too would like to see this implemented, if for no other reason than to avoid long startup times like the one pictured above.
