
Transition from Ollama to Hugging Face 🤗 #50

Closed
latekvo wants to merge 6 commits

Conversation

@latekvo latekvo commented Jun 23, 2024

🤗

latekvo commented Jun 26, 2024

Seems to be working, but it requires further testing, and we have to decide what to do with the ollama setup.
There must be a better way than keeping a separate compose file and containers altogether.
We could remove ollama support completely IF - and this is a blocking requirement - we recreate its auto-allocation algorithm.
Currently ollama takes a fixed memory budget from the config and uses it to split the provided model and launch it accordingly.
We'd have to calculate how many layers + how much context we can load at once ourselves, and apply that to the llama.cpp loader (see the sketch below).
Then we can confidently remove the inferior bridged solution ;)
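For reference, a rough sketch of what that calculation could look like with the llama-cpp-python bindings. The helper, memory figures, and model path are illustrative placeholders, not anything already in this repo:

```python
# Rough sketch of the auto-allocation idea, assuming the llama-cpp-python bindings.
# All sizes and the estimator below are illustrative, not ollama's actual heuristic.
from llama_cpp import Llama


def estimate_gpu_layers(model_size_bytes: int, n_layers: int,
                        vram_budget_bytes: int, ctx_reserve_bytes: int) -> int:
    """Guess how many transformer layers fit in the VRAM budget after
    reserving space for the context / KV cache."""
    per_layer = model_size_bytes / n_layers
    usable = max(vram_budget_bytes - ctx_reserve_bytes, 0)
    return min(n_layers, int(usable // per_layer))


# Hypothetical values for a ~4 GiB quantized model on an 8 GiB card.
n_gpu_layers = estimate_gpu_layers(
    model_size_bytes=4 * 1024**3,
    n_layers=32,
    vram_budget_bytes=8 * 1024**3,
    ctx_reserve_bytes=1 * 1024**3,  # keep ~1 GiB free for the KV cache
)

llm = Llama(
    model_path="models/model.gguf",  # placeholder path
    n_gpu_layers=n_gpu_layers,
    n_ctx=4096,
)
```

A static estimate like this would still need to account for the KV cache growing with `n_ctx`, which is where ollama's allocation logic does most of the work.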

latekvo marked this pull request as ready for review June 26, 2024 11:45
latekvo marked this pull request as draft June 30, 2024 18:04
latekvo commented Jun 30, 2024

This modification turned out to require way too much additional work on the user's side, and maintaining two separate systems on our side, since relying solely on bare-bones llama.cpp is not as consistent as ollama.

I'm closing this for now, but it's still up for consideration in future releases.

latekvo closed this Jun 30, 2024