
Ollama-rs integration #156

Merged 6 commits into Abraxas-365:main on May 25, 2024

Conversation

@erhant (Contributor) commented May 21, 2024

This PR integrates Ollama-rs and feature-gates it with the ollama feature.

  • Re-implements OllamaEmbedder using the new client; the embedding_ollama.rs example works.

  • Implements the language_models::llm::LLM trait for Ollama; the llm_ollama.rs example works.

  • Also allows OpenAI-compatible Ollama usage by implementing async_openai::config::Config for OllamaConfig; this part is not feature-gated since it does not require the Ollama-rs package.

  • Closes Pull missing model for Ollama LLM generation #148: we had initially talked about adding an option to auto-pull a model if it does not exist, but with Ollama-rs integrated we can simply leave that to the user. The pull is a one-liner with the Ollama-rs client, and our tools expect the client to be created in the outer scope and passed in via Arc, so anyone who needs to pull a model can do so before passing it in, with their own logic (e.g. retries, timeouts, cancellations); see the sketch after this list.

  • Opens the way for Add support for OllamaFunctions chat model from the official langchain library #149, since we will have access to function calls once pepperoni21/ollama-rs#51 (Ollama Function Calling) is merged!
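
For reference, a rough sketch of pulling a model with the Ollama-rs client before handing it to the wrapper; the model name, the tokio runtime, and the commented-out wrapper constructor are illustrative, not part of this PR:

use std::sync::Arc;
use ollama_rs::Ollama as OllamaClient; // aliased to avoid confusion with the langchain-rust wrapper

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create the Ollama-rs client once in the outer scope and share it via Arc.
    let client = Arc::new(OllamaClient::default());

    // Pull the model up front, with whatever retry/timeout/cancellation logic fits the app.
    // pull_model takes the model name and an allow_insecure flag.
    client
        .pull_model("llama3:latest".to_string(), false)
        .await
        .map_err(|e| format!("model pull failed: {e}"))?;

    // The shared client can then be passed into the langchain-rust Ollama wrapper
    // (the exact constructor is omitted here; see the crate's examples).
    // let ollama = Ollama::new(client.clone(), /* model, options */);

    Ok(())
}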

Note

@prabirshrestha I'm currently opening this as a draft because I couldn't implement the stream function for the LLM trait, and would like your opinion/help on that part! Ollama-rs has a streaming call itself, and I just have to map the items of that stream to the one the LLM trait wants, but for some reason I haven't managed it yet.

I will keep working on it in the meantime, but would love any input you have 🙏🏻

EDIT: Streaming is done.

@prabirshrestha (Collaborator) commented

Have you tried using async_stream with yield? There are some usage patterns in the codebase already, such as this and this. If you have issues with threading, you can use flume, which can be found here.

@erhant (Contributor, Author) commented May 22, 2024

Have you tried using async_stream with yield? There are some usage patterns in the codebase already, such as this and this. If you have issues with threading, you can use flume, which can be found here.

Looking into it, thanks!

@erhant (Contributor, Author) commented May 22, 2024

I mapped the stream items using map_ok and the errors with map_err, returned the resulting stream with Ok(Box::pin(stream)), and it works quite nicely!

Just one more question: what should the value of StreamData contain? I'm looking at the codebase but it's still not clear to me; should I perhaps return everything Ollama sends back other than the message content as a Value?

EDIT: Below is the stream function in its final form right now:

async fn stream(
    &self,
    messages: &[Message],
) -> Result<Pin<Box<dyn Stream<Item = Result<StreamData, LLMError>> + Send>>, LLMError> {
    // Build the Ollama chat request from the incoming messages.
    let request = self.generate_request(messages);
    let result = self.client.send_chat_messages_stream(request).await?;

    // Map each Ollama stream item into the StreamData / LLMError shape the LLM trait expects.
    let stream = result.map(|data| match data {
        Ok(data) => match data.message {
            // Serialize the whole message as the raw JSON value and use its text as the content.
            Some(message) => Ok(StreamData::new(
                serde_json::to_value(message.clone()).unwrap_or_default(),
                message.content,
            )),
            None => Err(LLMError::ContentNotFound(
                "No message in response".to_string(),
            )),
        },
        // Surface any transport/stream failure as an Ollama error wrapped in LLMError.
        Err(_) => Err(OllamaError::from("Stream error".to_string()).into()),
    });

    Ok(Box::pin(stream))
}
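
For context, a caller could drive this roughly as follows (a sketch, not code from this PR; the exact import paths and the public content field on StreamData are assumptions):

use futures::StreamExt;
use langchain_rust::language_models::llm::LLM;
use langchain_rust::schemas::Message;

// Sketch: stream a chat completion and print content chunks as they arrive.
async fn print_stream(
    llm: &impl LLM,
    messages: &[Message],
) -> Result<(), Box<dyn std::error::Error>> {
    let mut stream = llm.stream(messages).await?;
    while let Some(chunk) = stream.next().await {
        let data = chunk?;           // StreamData, or bail out on LLMError
        print!("{}", data.content);  // assumes `content` is a public field on StreamData
    }
    Ok(())
}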

@erhant erhant marked this pull request as ready for review May 22, 2024 12:44
@prabirshrestha (Collaborator) commented

Value is the raw JSON response from the server, in case the user wants to access other properties from the JSON. Most folks will only care about the content.

I'm wondering if the Item should be Result<Option<StreamData>, LLMError> instead of Result<StreamData, LLMError>. I had mentioned my concerns in #140. Otherwise, we need a different error per LLM provider. The other option is to use the same error for all LLMs, but since we are ending the call anyway I think Option is a lot easier to work with.
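
(For illustration only, not code from this PR: with the Option-based item, a consumer could skip empty chunks without a provider-specific error; handle is a hypothetical callback.)

// Hypothetical item type: Result<Option<StreamData>, LLMError>
while let Some(item) = stream.next().await {
    match item? {
        Some(data) => handle(data), // a normal chunk with content
        // `None` would cover the "nothing to emit" case (e.g. a chunk with no message)
        // without needing a bespoke error such as ContentNotFound.
        None => continue,
    }
}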

@erhant (Contributor, Author) commented May 22, 2024

EDITED:

Yeah, I think returning Option would be better DX than error handling. I suggest we keep this PR as is w.r.t. the returned item and tackle that issue in a separate PR?

I will update the code to return the entire response as Value 👍🏻

@prabirshrestha (Collaborator) commented

I'm OK with changing to Option in a different PR.

@erhant (Contributor, Author) commented May 23, 2024

Updated; I also added a TODO note for that Option.

@erhant (Contributor, Author) commented May 24, 2024

I have no idea why the build is failing btw :o

@Abraxas-365 (Owner) commented

I have no idea why the build is failing btw :o

Don't know why this happens; I have to delete the Actions cache once in a while and then it works again.

@@ -3,3 +3,6 @@ pub use openai::*;

pub mod claude;
pub use claude::*;

pub mod ollama;
@prabirshrestha (Collaborator) commented on the diff:

Should these just be mod ollama; instead of pub mod ollama;?

Since this pattern already exists, it can probably be fixed separately.

@prabirshrestha (Collaborator) left a review:

I have added nit comments, though most of these seem to have already existed, so I'm OK with fixing them separately. @Abraxas-365 feel free to review and merge if it looks good; otherwise I will merge it tomorrow.

impl Default for OllamaConfig {
    fn default() -> Self {
        Self {
            api_key: Secret::new("ollama".to_string()),
@prabirshrestha (Collaborator) commented on the diff:

Should the secret here also be a constant, similar to OLLAMA_API_BASE?
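
(A minimal sketch of that suggestion; the constant name OLLAMA_API_KEY is hypothetical, chosen to mirror the existing OLLAMA_API_BASE, and the struct body is truncated just like the diff above.)

// Hypothetical constant next to OLLAMA_API_BASE; the name is illustrative.
pub const OLLAMA_API_KEY: &str = "ollama";

impl Default for OllamaConfig {
    fn default() -> Self {
        Self {
            api_key: Secret::new(OLLAMA_API_KEY.to_string()),
            // ...remaining fields unchanged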

.with_api_key("ollama"),
)
.with_model("llama2");
let ollama = Ollama::default().with_model("llama3");

let response = ollama.invoke("hola").await.unwrap();
@prabirshrestha (Collaborator) commented on the diff:

Probably worth saying "Hello" instead of "hola".

@prabirshrestha merged commit a0a8078 into Abraxas-365:main on May 25, 2024
2 checks passed
@prabirshrestha (Collaborator) commented

Merged. Thanks!

@erhant (Contributor, Author) commented May 25, 2024

Thanks for the merge! I just saw the messages; I was on the road today.

I plan on handling the function-call PR when Ollama-rs is updated as well, and we may have to change a few lines of the token-count calculation implemented in this PR for Ollama; nothing big though 🙏🏻

Successfully merging this pull request may close these issues.

  • Pull missing model for Ollama LLM generation (#148)