llama_get_logits_ith: invalid logits id -1, reason: no logits #1812

Open
ba0gu0 opened this issue Oct 28, 2024 · 2 comments
ba0gu0 commented Oct 28, 2024

llama_get_logits_ith: invalid logits id -1 error when embedding=True

Expected Behavior

When using llama-cpp-python with a Qwen2 model, chat completion should work normally whether or not the embedding parameter is enabled.

Current Behavior

The model works fine when embedding=False, but chat completion fails with the error llama_get_logits_ith: invalid logits id -1, reason: no logits when embedding=True.

Working Code Example

from llama_cpp import Llama

# This works fine
llm = Llama(
    model_path="./models/qwen2-0_5b-instruct-q8_0.gguf", 
    chat_format="chatml", 
    verbose=False
)

messages = [
    {"role": "system", "content": "Summarize this text for me: You are an assistant who creates short stories."},
    {"role": "user", "content": "Long ago, in a peaceful village, a little girl named Leah loved watching the stars at night..."}
]

response = llm.create_chat_completion(messages=messages)

'''
{'id': 'chatcmpl-17ca45ef-d13b-425a-96be-7631e3b9a7f4',
 'object': 'chat.completion',
 'created': 1730125699,
 'model': './models/qwen2-0_5b-instruct-q8_0.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'This text is a short story about a little girl named Leah who loves watching the stars at night. One day, she noticed a particularly bright star that seemed to wink at her, and she made a wish to become friends with the star. This star spirit helped Leah take her on a magical adventure among the stars, and she visited countless constellations and stardust rivers.'},
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 145, 'completion_tokens': 76, 'total_tokens': 221}
}
'''

# Works successfully

Error Reproduction

from llama_cpp import Llama

# This causes an error
llm = Llama(
    model_path="./models/qwen2-0_5b-instruct-q8_0.gguf", 
    chat_format="chatml", 
    verbose=False, 
    embedding=True  # Only difference is enabling embedding
)

messages = [
    {"role": "system", "content": "Summarize this text for me: You are an assistant who creates short stories."},
    {"role": "user", "content": "Long ago, in a peaceful village, a little girl named Leah loved watching the stars at night..."}
]

llm.create_chat_completion(messages=messages)
# Error: llama_get_logits_ith: invalid logits id -1, reason: no logits

embeddings = llm.create_embedding("Hello, world!")
# Embedding creation works normally

'''
{'object': 'list',
 'data': [{'object': 'embedding',
   'embedding': [[0.9160200953483582,
     5.090432167053223,
     1.487088680267334, ......
'''

Environment Info

  • Python version: 3.10
  • llama-cpp-python version: latest
  • Model: Qwen2-0.5B-Chat (GGUF format)

Steps to Reproduce

  1. Install llama-cpp-python
  2. Download Qwen2-0.5B-Chat GGUF model
  3. Run the error reproduction code above with embedding=True

Additional Context

The error only occurs when:

  1. The embedding parameter is set to True
  2. Using the chat completion functionality

The model works fine for chat completion when embedding=False, suggesting this might be related to how the embedding functionality is implemented for this specific model.

@jayendren

Confirming the same issue (llama_get_logits_ith: invalid logits id -1, reason: no logits) when using https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF. Setting embedding=False works (my default configuration uses True).

Environment Info

Python version: 3.9.16
llama-cpp-python version: 0.3.1
Model: Hermes-3-Llama-3.1-8B (GGUF format)
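
For a setup where a shared default configuration enables embedding, one possible workaround is to derive two keyword-argument sets from it: one with embedding=False for chat completion and one with embedding=True for embeddings. The following is a minimal sketch under that assumption. The helper name split_llama_kwargs and the model path are hypothetical and only for illustration; the embedding parameter is the only name taken from the reports above.

```python
from typing import Any, Dict, Tuple


def split_llama_kwargs(base: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (chat_kwargs, embed_kwargs) derived from one shared config.

    The input dict is not modified; each result is a shallow copy with
    only the "embedding" key overridden.
    """
    # embedding=False is the setting reported to work for chat completion.
    chat_kwargs = {**base, "embedding": False}
    # embedding=True is the setting reported to work for create_embedding.
    embed_kwargs = {**base, "embedding": True}
    return chat_kwargs, embed_kwargs


# Example shared configuration; the model path is a placeholder.
base_config = {
    "model_path": "./models/model.gguf",
    "verbose": False,
    "embedding": True,
}
chat_kwargs, embed_kwargs = split_llama_kwargs(base_config)
# Llama(**chat_kwargs) would then serve create_chat_completion,
# and Llama(**embed_kwargs) would serve create_embedding.
```

Note that constructing two Llama instances from these dictionaries loads the model twice, so this trades memory for avoiding the failing combination.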

aimerib commented Dec 3, 2024

I was getting this same error with a Qwen2.5-14b finetune and spent a few hours searching for an answer. It became clear to me that this is a regression in the llama.cpp codebase, and it may have been addressed recently. I'm not sure whether llama-cpp-python has received the upstream patches yet, but this may be fixed in a future release.

ggerganov/llama.cpp#8076 (comment)

For now, I've resorted to using an embedding-specific model with SentenceTransformer, but ideally I'd love to get both embeddings and generations from the same model to save on memory.
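
Until the same instance can serve both calls, one memory-conscious option is to keep only one instance loaded at a time and reload it when the mode changes. The sketch below is an untested idea rather than a confirmed fix. ModeSwitchingModel is a hypothetical name, the factory argument is assumed to be something like llama_cpp.Llama, and calling close() on the old instance is an assumption guarded by getattr in case the installed version lacks it. Only create_chat_completion, create_embedding, and the embedding parameter come from the code earlier in this thread.

```python
from typing import Any, Callable, Optional


class ModeSwitchingModel:
    """Keeps at most one model instance loaded: chat mode or embedding mode."""

    def __init__(self, factory: Callable[..., Any], **kwargs: Any) -> None:
        self._factory = factory              # assumed: e.g. llama_cpp.Llama
        self._kwargs = dict(kwargs)
        self._kwargs.pop("embedding", None)  # chosen per call instead
        self._model: Optional[Any] = None
        self._embedding_mode: Optional[bool] = None

    def _release(self) -> None:
        """Drop the current instance, calling close() if it has one."""
        if self._model is not None:
            close = getattr(self._model, "close", None)
            if callable(close):
                close()
            self._model = None
            self._embedding_mode = None

    def _get(self, embedding: bool) -> Any:
        """Return an instance in the requested mode, reloading if needed."""
        if self._model is None or self._embedding_mode != embedding:
            self._release()
            self._model = self._factory(embedding=embedding, **self._kwargs)
            self._embedding_mode = embedding
        return self._model

    def chat(self, messages: Any, **kw: Any) -> Any:
        return self._get(embedding=False).create_chat_completion(messages=messages, **kw)

    def embed(self, text: Any, **kw: Any) -> Any:
        return self._get(embedding=True).create_embedding(text, **kw)


# Hypothetical usage (requires llama-cpp-python and a local GGUF file):
#   from llama_cpp import Llama
#   model = ModeSwitchingModel(Llama, model_path="./models/model.gguf", chat_format="chatml")
#   model.embed("Hello, world!")
#   model.chat([{"role": "user", "content": "Hi"}])
```

The trade-off is that every switch between chat and embedding calls reloads the model, so this only makes sense when calls of one kind are batched together.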
