KeyError: 'token_count' when calling chat API with RAG toolgroup #887
Comments
Hi! This is caused by the persistent DB saved by the old llama-stack, which did not have any metadata. Can you delete the previous DB and try again?
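(For reference, a minimal sketch of clearing a stale on-disk store before restarting the server; the path below is only a placeholder, since the real location depends on where your distribution persists its data, e.g. the host directory behind the container's volume mount.)

```python
import shutil
from pathlib import Path

# Placeholder path (an assumption): point this at the directory your
# distribution actually persists, e.g. the host folder behind the
# container's volume mount where the FAISS/kvstore files live.
persistent_store = Path.home() / ".llama" / "distributions" / "ollama"

if persistent_store.exists():
    # Removing the old store forces llama-stack to rebuild the vector DB,
    # so newly inserted chunks carry the metadata the current version expects.
    shutil.rmtree(persistent_store)
    print(f"Removed stale store at {persistent_store}")
else:
    print(f"No store found at {persistent_store}; nothing to clean up")
```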
Hi @wukaixingxp, I have tried using a fresh volume mount, but it is still not working. I'm getting the same error.
I have removed the volume mount option while running the container to verify the issue. Here is the updated code according to your documentation:

# %% [markdown]
# **Setting up Vector DBs**

# %%
import os
import uuid
from llama_stack_client import LlamaStackClient
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types import UserMessage
from llama_stack_client.lib.agents.event_logger import EventLogger

# %%
host = "localhost"
port = 6003
model_name = "meta-llama/Llama-3.2-1B-Instruct"

# %%
client = LlamaStackClient(base_url=f"http://{host}:{port}")

# %%
# Register a vector db
vector_db_id = "my_documents"
response = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="faiss",
)

# %%
# You can insert a pre-chunked document directly into the vector db
chunks = [
    {
        "document_id": "hrms.md",
        "content": "My Content...",
        "mime_type": "text/plain",
    },
]
client.vector_io.insert(vector_db_id=vector_db_id, chunks=chunks)

# You can then query for these chunks
chunks_response = client.vector_io.query(vector_db_id=vector_db_id, query="what is office hours?")
print(chunks_response)

# %%
# Configure agent with memory
agent_config = AgentConfig(
    model=model_name,
    enable_session_persistence=True,
    instructions="You are a helpful assistant",
    toolgroups=[
        {
            "name": "builtin::rag",
            "args": {
                "vector_db_ids": [vector_db_id],
            },
        }
    ],
)

# %%
agent = Agent(client, agent_config)
session_id = agent.create_session("rag_session")

# %%
# Query with RAG
response = agent.create_turn(
    messages=[{
        "role": "user",
        "content": "What are the key topics in the documents?"
    }],
    session_id=session_id,
)
for log in EventLogger().log(response):
    log.print()
Error Log:
2025-01-29 12:10:07 INFO: 172.17.0.1:47034 - "POST /v1/agents/ff9135fc-fa75-4e77-8674-f7aeb97e2203/session/8fd6d687-fb34-44f1-b8fa-98e612cfbd30/turn HTTP/1.1" 200 OK
2025-01-29 12:10:07 06:40:07.453 [START] /v1/agents/ff9135fc-fa75-4e77-8674-f7aeb97e2203/session/8fd6d687-fb34-44f1-b8fa-98e612cfbd30/turn
2025-01-29 12:10:07 06:40:07.481 [START] create_and_execute_turn
2025-01-29 12:10:07 06:40:07.498 [START] query_from_memory
2025-01-29 12:10:07
Batches: 0%| | 0/1 [00:00<?, ?it/s]
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 46.23it/s]
2025-01-29 12:10:07 Traceback (most recent call last):
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 157, in sse_generator
2025-01-29 12:10:07 async for item in event_gen:
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agents.py", line 168, in _create_agent_turn_streaming
2025-01-29 12:10:07 async for event in agent.create_and_execute_turn(request):
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 192, in create_and_execute_turn
2025-01-29 12:10:07 async for chunk in self.run(
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 266, in run
2025-01-29 12:10:07 async for res in self._run(
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 439, in _run
2025-01-29 12:10:07 result = await self.tool_runtime_api.rag_tool.query(
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
2025-01-29 12:10:07 result = await method(self, *args, **kwargs)
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 423, in query
2025-01-29 12:10:07 return await self.routing_table.get_provider_impl(
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
2025-01-29 12:10:07 result = await method(self, *args, **kwargs)
2025-01-29 12:10:07 File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/tool_runtime/rag/memory.py", line 131, in query
2025-01-29 12:10:07 tokens += metadata["token_count"]
2025-01-29 12:10:07 KeyError: 'token_count'
2025-01-29 12:10:07 06:40:07.626 [END] query_from_memory [StatusCode.OK] (128.13ms)
2025-01-29 12:10:07 06:40:07.638 [END] create_and_execute_turn [StatusCode.OK] (156.16ms)
2025-01-29 12:10:07 06:40:07.648 [END] /v1/agents/ff9135fc-fa75-4e77-8674-f7aeb97e2203/session/8fd6d687-fb34-44f1-b8fa-98e612cfbd30/turn [StatusCode.OK] (195.09ms)
Hi! I think this bug is because you are manually creating the chunks with `client.vector_io.insert`, which does not attach the `token_count` metadata that the RAG tool's query path expects.
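For reference, here is a hedged sketch of ingesting the document through the RAG tool instead, based on the 0.1-era client docs (the `Document` import path and the `chunk_size_in_tokens` parameter are assumptions that may differ slightly across llama_stack_client versions). Ingested this way, the server chunks the document itself and attaches the `token_count` metadata that the query path reads.

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import Document  # named RAGDocument in some later releases

client = LlamaStackClient(base_url="http://localhost:6003")

# Let the RAG tool chunk the document itself: it attaches token_count
# (and document_id) metadata to every chunk, which is what the query
# path later sums over.
document = Document(
    document_id="hrms.md",
    content="My Content...",  # plain text or a URL the server can fetch
    mime_type="text/plain",
    metadata={},
)

client.tool_runtime.rag_tool.insert(
    documents=[document],
    vector_db_id="my_documents",
    chunk_size_in_tokens=512,
)
```

With documents ingested this way, the builtin::rag toolgroup query above should no longer hit the KeyError.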
Thank you, @wukaixingxp 👍
I had to parse the document using the RAG tool's document insert API instead, and that resolved the error.
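Alternatively, if you need to keep inserting pre-chunked content directly through `vector_io.insert`, a possible workaround (an assumption based only on the traceback above, which sums `metadata["token_count"]` per chunk; the exact chunk schema may differ in your version) is to attach that metadata yourself:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:6003")

# Hypothetical workaround: attach token_count (and document_id) as chunk
# metadata so the RAG query path has a value to sum over. The count below
# is only a rough word-based estimate for illustration.
content = "My Content..."
chunks = [
    {
        "content": content,
        "metadata": {
            "document_id": "hrms.md",
            "token_count": max(1, len(content.split())),
        },
    },
]
client.vector_io.insert(vector_db_id="my_documents", chunks=chunks)
```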
@abhishek-syno @wukaixingxp I almost always seem to get "none" and wrong responses when using the AgentConfig RAG toolgroup.
Any idea what I could be doing wrong?
@zanetworker, can you share your Ollama logs?
Yes! Logs from Ollama:
@zanetworker, check your Ollama logs; you are most likely seeing the input prompt warning there. Here is your solution.
Thanks @abhishek-syno, that's a good lead to go on. Will give it a shot :)
@wukaixingxp @ashwinb, I wanted to bring to your attention that a Beginning of Sequence (BOS) token is added to the prompt as specified by the model, but the prompt itself also begins with a BOS token. As a result, the final prompt starts with two BOS tokens. Are you sure this is the intended outcome? This should be addressed in Llama Stack, since the prompt should not need to carry its own BOS token in this case.
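For anyone wanting to verify the duplicated BOS locally, here is a rough sketch using the Hugging Face tokenizer (assumptions: it requires access to the gated meta-llama repo, and the BOS literal shown is just an example of a prompt that already carries one):

```python
from transformers import AutoTokenizer

# Assumes access to the gated meta-llama repo; any Llama 3.x tokenizer with
# the same special tokens illustrates the point equally well.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# A prompt that already carries the BOS text, as described above.
prompt = "<|begin_of_text|>Hello"

# With add_special_tokens=True (the default) the tokenizer prepends BOS
# again, so the encoded prompt can end up starting with two BOS ids.
ids = tok(prompt).input_ids
print("first ids:", ids[:3], "bos id:", tok.bos_token_id)
# If the first two ids both equal bos_token_id, the prompt is double-BOS'd.
```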
Raised a new issue here: #913
System Info
Description
Encountered a KeyError when trying to use the chat API with the RAG toolgroup. The error occurs when the system tries to access the 'token_count' key in the chunk metadata.
Python version: 3.13.1
llama_stack: 0.1
llama_stack_client: 0.1
Information
🐛 Describe the bug
Steps to Reproduce
1. Created an agent with the RAG toolgroup
2. Initialized a session
3. Attempted to make a chat request
Error logs
Error message - Docker Ollama distribution (see the full traceback above).
Expected behavior
Through the client SDK code or an API call using Postman, it throws the error:
500: Internal server error: An unexpected error occurred.
The request should instead succeed.