
ChatLlamaCpp stream causes infinite loop #7618

Open
adri1336 opened this issue Jan 29, 2025 · 5 comments
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@adri1336

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const llamaPath = "C:\\Users\\adri2\\Desktop\\model.gguf";

const model = await ChatLlamaCpp.initialize({
  modelPath: llamaPath,
  maxTokens: 100,
  contextSize: 1024,
});

const stream = await model.stream("Hello, I'm Bob");
for await (const chunk of stream) {
  console.log(chunk.content);
}

Error Message and Stack Trace (if applicable)

PS C:\Users\adri2\Documents\projects\agedap-llamatron2> node .\electron\api\test.mjs
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64
[
INST
]
I
'm
a
computer
program
.
I
'm
here
to
help
you
with
your
questions
.
What
can
I
help
you
with
today
?
[/
INST
]
[
INST
]
You
can
ask
me
anything
,
from
general
knowledge
to
specific
topics
like
science
,
history
,
or
entertainment
.
I
'll
do
my
best
to
provide
you
with
accurate
and
helpful
information
.
[/
INST
]
[
INST
]
Go
ahead
and
ask
me
your
question
,
or
type
'
help
'
if
you
need
assistance
with
how
to
interact
with
me
.
[/
INST
]
[
INST
]
I
'm
ready
when
you
are
!
[/
INST
]

[
INST
]

Description

  • I'm trying to stream a model response with ChatLlamaCpp, but it causes an infinite loop and I have to kill the process.

System Info

Node v21
Windows 11
AMD 5700 XT


dosubot bot commented Jan 29, 2025

I found a similar discussion that might be relevant to your issue: Llama.cpp - confusing and infinitely long answers. This discussion addresses a similar problem with the LlamaCpp model generating infinitely long responses. The issue was related to the streaming functionality and how the stream method handles the end of a response.

To address this, you might consider implementing a manual timeout or abort mechanism using an AbortController. This can help you stop the stream after a certain period or if it exceeds a reasonable length, preventing it from continuing indefinitely. You can find example code for implementing a timeout or abort mechanism in the LangChain.js repository [1][2][3].
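
For reference, a minimal sketch of such a timeout-based abort (assuming your @langchain/core version supports the signal call option; the 30-second deadline and the chunk cap are illustrative, and whether the llama.cpp binding stops generation immediately on abort may vary):

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const model = await ChatLlamaCpp.initialize({
  modelPath: "C:\\Users\\adri2\\Desktop\\model.gguf",
  maxTokens: 100,
  contextSize: 1024,
});

// Abort the run if it takes longer than 30 seconds.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 30_000);

try {
  const stream = await model.stream("Hello, I'm Bob", {
    signal: controller.signal,
  });

  let chunkCount = 0;
  for await (const chunk of stream) {
    console.log(chunk.content);
    // Extra safety net: stop consuming after a fixed number of chunks.
    if (++chunkCount >= 200) break;
  }
} catch (err) {
  // An abort surfaces here as an error once the signal fires.
  console.error("Stream stopped:", err);
} finally {
  clearTimeout(timer);
}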


@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 29, 2025
@adri1336 (Author)

@dosu But I don't want to use an AbortController since I want a concrete and finalized response. There must be a way to make it work because with invoke, it responds correctly, but with stream, it seems to go crazy and never finishes.


dosubot bot commented Jan 29, 2025

The stream method in the ChatLlamaCpp class is designed to handle responses in a streaming manner, which can sometimes lead to issues with response termination if not properly configured. To ensure a concrete and finalized response, you can adjust the maxTokens parameter, which controls the maximum number of tokens the response should contain. This can help prevent the stream from continuing indefinitely [1].

Additionally, you can use the onToken function to implement custom logic for handling tokens as they are generated. This can be useful for detecting specific tokens that might indicate the end of a response and stopping the stream accordingly [1].

If the invoke method works correctly, it might be worth comparing the configurations used in both methods to ensure they are consistent. The invoke method handles responses as a complete object, which might be why it terminates correctly compared to the streaming approach [2].
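
As a concrete way to do that comparison with one and the same configuration, a minimal sketch (the model path and prompt are taken from the original report; maxTokens: 100 matches the cap already used there):

import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const model = await ChatLlamaCpp.initialize({
  modelPath: "C:\\Users\\adri2\\Desktop\\model.gguf",
  maxTokens: 100,     // same hard cap on generated tokens for both code paths
  contextSize: 1024,
});

const prompt = "Hello, I'm Bob";

// invoke: one complete AIMessage returned at the end of generation.
const full = await model.invoke(prompt);
console.log("invoke:", full.content);

// stream: the same prompt, consumed chunk by chunk and re-assembled.
let streamed = "";
for await (const chunk of await model.stream(prompt)) {
  streamed += chunk.content;
}
console.log("stream:", streamed);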


@adri1336 (Author)

How could I integrate onToken into a compiled workflow with StateGraph, @dosu?
This is my code:

import {
  StateGraph,
  END,
  START,
  MessagesAnnotation,
  MemorySaver,
} from "@langchain/langgraph";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

let model = null;
export let modelPath = null;
export const loadModel = async (path) => {
  modelPath = path;
  model = await ChatLlamaCpp.initialize({
    modelPath: modelPath,
    contextSize: 1024,
  });
};

const promptTemplate = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant. You can use markdown."],
  ["placeholder", "{messages}"],
]);

const callModel = async (state) => {
  try {
    if (model === null) {
      throw new Error("Model not loaded");
    }

    if (model._context.sequencesLeft === 0) {
      model._context = await model._model.createContext({ contextSize: 1024 });
    }

    const prompt = await promptTemplate.invoke(state);
    const response = await model.invoke(prompt);
    return { messages: [response] };
  } catch (error) {
    return { messages: [{ type: "system", text: error.toString() }] };
  }
};

const workflow = new StateGraph(MessagesAnnotation)
  .addNode("model", callModel)
  .addEdge(START, "model")
  .addEdge("model", END);

export const app = workflow.compile({ checkpointer: new MemorySaver() });


dosubot bot commented Jan 29, 2025

To integrate the onToken function into your compiled workflow with StateGraph, you can modify the callModel function to include the onToken callback. This will allow you to handle token-level events during the model's execution. Here's how you can do it:

  1. Modify the callModel function: Add the onToken function to the options when calling the model. This function will be invoked with each token as it is generated.

  2. Implement the onToken logic: Define what you want to do with each token. For example, you might want to log the tokens or check for specific tokens to determine when to stop processing.

Here's an example of how you can modify your code:

const callModel = async (state) => {
  try {
    if (model === null) {
      throw new Error("Model not loaded");
    }

    if (model._context.sequencesLeft === 0) {
      model._context = await model._model.createContext({ contextSize: 1024 });
    }

    const prompt = await promptTemplate.invoke(state);

    // Define the onToken function
    const onToken = (tokens) => {
      console.log("Received tokens:", tokens);
      // Add any custom logic for handling tokens here
    };

    // Use the stream method with onToken
    const stream = await model.stream(prompt, { onToken });
    const chunks = [];
    for await (const chunk of stream) {
      chunks.push(chunk.content);
    }

    return { messages: [{ type: "system", text: chunks.join("") }] };
  } catch (error) {
    return { messages: [{ type: "system", text: error.toString() }] };
  }
};

In this example, the onToken function logs the tokens as they are received. You can customize this function to implement any specific logic you need for handling tokens in real-time. This approach allows you to integrate token-level processing into your workflow using the StateGraph and ChatLlamaCpp model.
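
For completeness, a minimal usage sketch of the compiled workflow above (the module path, prompt text, and thread_id value are illustrative; a thread_id is required here only because the graph was compiled with a MemorySaver checkpointer):

import { HumanMessage } from "@langchain/core/messages";
import { app, loadModel } from "./workflow.mjs"; // hypothetical path to the module above

await loadModel("C:\\Users\\adri2\\Desktop\\model.gguf");

// Invoke the graph once; MemorySaver keys the conversation on thread_id.
const result = await app.invoke(
  { messages: [new HumanMessage("Hello, I'm Bob")] },
  { configurable: { thread_id: "demo-thread" } }
);

console.log(result.messages.at(-1).content);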

