
Ollama bug and improvements #686

Open
easydatawarehousing opened this issue Jul 1, 2024 · 3 comments

Labels
enhancement New feature or request

Comments

@easydatawarehousing

Temperature and seed parameters should be part of 'options'

According to the Ollama docs, temperature and seed should be passed inside options:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "options": {
    "seed": 101,
    "temperature": 0
  }
}'

In the current implementation, these are passed at the same level as parameters like 'model'.

Changing the code of Langchain::LLM::Ollama like this works, but it is probably not the best place to implement this.

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))

  # Ollama expects seed and temperature nested under "options", so move them there
  if parameters.key?(:seed) || parameters.key?(:temperature)
    parameters[:options] ||= {} # preserve any options the caller already supplied

    if parameters.key?(:seed)
      parameters[:options][:seed] = parameters.delete(:seed)
    end

    if parameters.key?(:temperature)
      parameters[:options][:temperature] = parameters.delete(:temperature)
    end
  end

  # ...
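
With that change, a call that passes seed and temperature as top-level keyword arguments produces the nested payload Ollama expects. A minimal usage sketch; `llm` is assumed to be an already-configured instance of the patched class:

llm.chat(
  messages: [{role: "user", content: "Hello!"}],
  model: "llama3",
  seed: 101,
  temperature: 0
)
# Request body sent to api/chat (roughly):
#   {
#     "model": "llama3",
#     "messages": [{"role": "user", "content": "Hello!"}],
#     "options": {"seed": 101, "temperature": 0}
#   }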

Non-streaming response chunks should be joined before parsing?

I am using Ollama 0.1.45. When requesting a non-streaming response (i.e. not passing a block to the chat method) and the response is large (more than ~4000 characters), Ollama will send multiple chunks of data.

In the current implementation each chunk is JSON.parse'd separately. For smaller responses that fit in a single chunk this is obviously not a problem. For larger responses split across multiple chunks, I need to join all chunks first and then JSON-parse the result.

Changing the code of Langchain::LLM::Ollama like this works for me.

def chat(messages:, model: nil, **params, &block)
  parameters = chat_parameters.to_params(params.merge(messages:, model:, stream: block.present?))
  responses_stream = []

  if parameters[:stream]
    # Existing code
    client.post("api/chat", parameters) do |req|
      req.options.on_data = json_responses_chunk_handler do |parsed_chunk|
        responses_stream << parsed_chunk

        block&.call(OllamaResponse.new(parsed_chunk, model: parameters[:model]))
      end
    end

    generate_final_chat_completion_response(responses_stream, parameters)
    # /Existing code
  else
    # Non-streaming: collect the raw chunks and parse the joined body once below
    client.post("api/chat", parameters) do |req|
      req.options.on_data = proc do |chunk, size, _env|
        puts "RECEIVED #{size} CHARS, LAST CHAR IS: '#{chunk[-1]}'" # DEBUG
        responses_stream << chunk
      end
    end

    OllamaResponse.new(
      {
        "message" => {
          "role"    => "assistant",
          "content" => JSON.parse(responses_stream.join).dig("message", "content")
        }
      },
      model: parameters[:model]
    )
  end
end
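
For context, whether the streaming or the buffering branch runs depends only on whether a block is passed. A usage sketch; `llm` is again assumed to be a configured Langchain::LLM::Ollama instance, and chat_completion is assumed to expose the message content:

# Streaming: a block is given, so parameters[:stream] is true and each parsed
# chunk is wrapped in an OllamaResponse and handed to the block.
llm.chat(messages: [{role: "user", content: "Write a long story"}]) do |chunk|
  print chunk.chat_completion
end

# Non-streaming: no block, so the raw chunks are collected, joined and parsed once.
response = llm.chat(messages: [{role: "user", content: "Write a long story"}])
puts response.chat_completion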

The Ollama docs say nothing about this behavior; it might be a bug in Ollama, or a feature.
It happens at least with the llama3-8b-q8 and phi3-14b-q5 models.
Should langchainrb code around this, e.g. by checking whether response chunks form complete JSON documents? One possible approach is sketched below.
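
If langchainrb did want to guard against this, one option is to buffer incoming data and only emit a document once the buffer parses as valid JSON. A minimal sketch, not the gem's actual code; the helper name is made up:

require "json"

# Accumulates raw chunks and yields a parsed document only once the buffer
# is complete, valid JSON; partial chunks simply stay in the buffer.
def buffered_json_handler
  buffer = +""

  proc do |chunk, _size, _env|
    buffer << chunk

    begin
      parsed = JSON.parse(buffer)
      buffer.clear
      yield parsed
    rescue JSON::ParserError
      # Document is incomplete so far; wait for the next chunk.
    end
  end
end

# The non-streaming branch could then use:
#   req.options.on_data = buffered_json_handler { |doc| responses_stream << doc }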

Inherit from Langchain::LLM::OpenAI?

Since Ollama is compatible with OpenAI's API, wouldn't it be easier to let Langchain::LLM::Ollama inherit from Langchain::LLM::OpenAI, overriding default values where needed?
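
A rough sketch of what that could look like; the constructor arguments and defaults below are assumptions based on langchainrb's OpenAI wrapper and Ollama's OpenAI-compatible /v1 endpoint, not code from the gem:

# Sketch only; assumes the OpenAI wrapper accepts api_key:, llm_options:
# (forwarded to the ruby-openai client) and default_options:, and that
# ruby-openai's uri_base option can point it at Ollama's /v1 endpoint.
module Langchain
  module LLM
    class Ollama < OpenAI
      def initialize(url: "http://localhost:11434", default_options: {})
        super(
          api_key: "ollama",                    # Ollama ignores the key, but the client requires one
          llm_options: {uri_base: "#{url}/v1"}, # Ollama's OpenAI-compatible API lives under /v1
          default_options: default_options      # override model names, temperature, etc. here
        )
      end
    end
  end
end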

@easydatawarehousing easydatawarehousing added the enhancement New feature or request label Jul 1, 2024
@Fodoj
Contributor

Fodoj commented Aug 29, 2024

I can confirm the bug with chunks: once too big an input is sent, I get `parse': 451: unexpected token at '' (JSON::ParserError)`, simply because the chunk ends in a way that makes the line invalid JSON.

@siberas

siberas commented Jan 16, 2025

We are encountering the same error and would like to suggest a possible improvement. Would it be feasible to handle the response in a single step to avoid this issue?

Additionally, we believe this problem warrants being classified as a bug for better visibility and prioritization.
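
For illustration, handling it "in a single step" could mean dropping on_data in the non-streaming branch and parsing the fully buffered body once. A sketch, assuming the Faraday client does not already attach JSON response middleware:

unless parameters[:stream]
  # Without an on_data handler Faraday buffers the entire response body,
  # so it can be parsed exactly once regardless of how Ollama chunked it.
  response = client.post("api/chat", parameters)
  body = response.body.is_a?(String) ? JSON.parse(response.body) : response.body

  return OllamaResponse.new(body, model: parameters[:model])
end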

@easydatawarehousing easydatawarehousing changed the title Ollama improvements Ollama bug and improvements Jan 17, 2025
@easydatawarehousing
Author

At the time of posting this I was not sure the chunked responses were a bug; it could have been just my version of Ollama. Since more people are encountering this, it probably is a bug in this gem. I have changed the title of this issue.

My way of 'fixing' this bug was to drop use of this gem altogether ...
