Non-streaming response chunks should be joined before parsing?
I am using Ollama 0.1.45. When requesting a non-streaming response (i.e. not passing a block to the `chat` method) and the response is large (more than ~4000 characters), Ollama sends the data in multiple chunks.
In the current implementation each chunk is `JSON.parse`'d separately. For smaller responses that fit in a single chunk this is obviously not a problem, but for multiple chunks I need to join all the chunks first and then JSON-parse the result.
Changing the code of Langchain::LLM::Ollama along the lines sketched below works for me.
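A minimal sketch of what I mean, assuming `client` is the gem's Faraday connection with JSON request middleware (method and variable names are illustrative, not the gem's exact internals): buffer every chunk and parse once at the end, instead of `JSON.parse`-ing each network chunk on its own.

```ruby
require "faraday"
require "json"

def chat(messages:, **params)
  buffer = +""

  client.post("api/chat") do |req|
    req.body = params.merge(messages: messages, stream: false)
    # Faraday hands the body over in arbitrary chunks; a large JSON
    # document can be split mid-token, so only accumulate here.
    req.options.on_data = proc { |chunk, _received_bytes| buffer << chunk }
  end

  # Parse the complete document once all chunks have arrived.
  JSON.parse(buffer)
end
```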
The Ollama docs say nothing about this behavior. It might be a bug in Ollama, or a feature.
This happens at least with llama3-8b-q8 and phi3-14b-q5 models.
Should langchainrb code around this, e.g. by checking whether the response chunks form a complete JSON document (see the sketch below)?
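If so, one way to do it (a sketch, not tested against the gem) is to keep appending chunks and attempt a parse on each one; an incomplete document raises `JSON::ParserError`, which here just means "wait for more data":

```ruby
require "json"

buffer = +""
parsed = nil

on_data = proc do |chunk, _received_bytes|
  buffer << chunk
  begin
    # Succeeds only once the buffered chunks form a complete document.
    parsed = JSON.parse(buffer)
  rescue JSON::ParserError
    # Not a complete JSON document yet; more chunks are coming.
  end
end
```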
Inherit from Langchain::LLM::OpenAI?
Since Ollama is compatible with OpenAI's API, wouldn't it be easier to let Langchain::LLM::Ollama inherit from Langchain::LLM::OpenAI, overriding default values where needed?
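A rough sketch of what that could look like; the initializer parameters (`api_key:`, `llm_options:`, `uri_base:`) are my assumptions about what `Langchain::LLM::OpenAI` accepts, not verified against the gem:

```ruby
module Langchain
  module LLM
    class Ollama < OpenAI
      DEFAULTS = {
        chat_model: "llama3",
        embedding_model: "llama3",
        temperature: 0.0
      }.freeze

      def initialize(url: "http://localhost:11434/v1", default_options: {})
        # Ollama's OpenAI-compatible endpoint ignores the API key,
        # but a value still has to be present.
        super(
          api_key: "ollama",
          llm_options: {uri_base: url},
          default_options: DEFAULTS.merge(default_options)
        )
      end
    end
  end
end
```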
I can confirm the bug with chunks: once too big an input is sent, I get ``parse': 451: unexpected token at '' (JSON::ParserError)``, simply because the chunk ends at a point where the line is not valid JSON.
We are encountering the same error and would like to suggest a possible improvement. Would it be feasible to handle the response in a single step to avoid this issue?
Additionally, we believe this problem warrants being classified as a bug for better visibility and prioritization.
At the time of posting this I was not sure the chunked responses were a bug; it could have been just my version of Ollama. Since more people are encountering this, it probably is a bug in this gem. I have changed the title of this issue.
My way of 'fixing' this bug was to drop use of this gem altogether ...
Temperature and seed parameters should be part of 'options'
According to the Ollama docs, `temperature` and `seed` should be passed inside the `options` object.
In the current implementation these are passed at the same level as parameters like `model`.
Changing the code of Langchain::LLM::Ollama along the lines sketched below works, but it is probably not the best place to implement this.
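For reference, this is the request shape the Ollama `/api/chat` docs describe (model name and values here are just examples): `temperature` and `seed` live under `options`, not next to `model`.

```ruby
parameters = {
  model: "llama3",
  messages: [{role: "user", content: "Why is the sky blue?"}],
  stream: false,
  options: {
    temperature: 0.7,
    seed: 42
  }
}
```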