llama_get_logits_ith: invalid logits id -1, reason: no logits #1855

devashishraj opened this issue Dec 7, 2024 · 0 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Get a response from the model.

Current Behavior

llama_get_logits_ith: invalid logits id -1, reason: no logits
[1] 58786 segmentation fault python ragPhiGguf.py --model_path Phi-3.5-mini-instruct-Q4_K_M.gguf --query
/opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/resource_tracker.py:276: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/loky-58786-dm8v3d6t'}
warnings.warn(

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:
    arm64 (Apple M2 Pro)

  • Operating System, e.g. for Linux:
    macOS

  • SDK version, e.g. for Linux:
    Python 3.13.0

GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

g++ --version
Apple clang version 16.0.0 (clang-1600.0.26.4)
Target: arm64-apple-darwin24.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Failure Information (for bugs)

Segmentation fault (see the log under Current Behavior above).

Steps to Reproduce

I was trying to build a RAG script using Phi-3.5-mini-instruct-Q4_K_M.gguf, using the default chat template from the GGUF metadata:

Available chat formats from metadata: chat_template.default
Using gguf chat template: {% for message in messages %}{% if message['role'] == 'system' and message['content'] %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
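
The model-loading code is not included above; a minimal sketch of how the Llama object is presumably constructed with llama-cpp-python (n_ctx and verbose are assumed values, not taken from the report):

from llama_cpp import Llama

# Sketch only: model_path matches the file named in the command line above;
# n_ctx and verbose are assumptions.
llm = Llama(
    model_path="Phi-3.5-mini-instruct-Q4_K_M.gguf",
    n_ctx=4096,
    verbose=True,
)

The generate_response method below is where the crash occurs (it is a method of the RAG class and needs "from typing import Dict, List" and "from jinja2 import Template" at module level):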
def generate_response(
        self, query: str, context_docs: List[Dict], max_tokens: int = 1000
    ) -> str:
        """
        Generate response using retrieved documents as context, formatted in GGUF chat template.
        Args:
            query (str): User query
            context_docs (List[Dict]): Retrieved context documents
            max_tokens (int): Maximum tokens to generate
        Returns:
            Generated response
        """
        # Construct context from documents
        context_texts = []
        for doc in context_docs:
            context_texts.append(
                f"Article ID: {doc.get('article_id', 'N/A')}\n{doc.get('text', '')}"
            )
        context = "\n\n".join(context_texts)
        # Ensure context fits within the context window
        context = self.truncate_text(context)

        # Construct GGUF chat template prompt
        prompt = (
            "{% for message in messages %}"
            "{% if message['role'] == 'system' and message['content'] %}"
            "{{'<|system|>\n' + message['content'] + '<|end|>\n'}}"
            "{% elif message['role'] == 'user' %}"
            "{{'<|user|>\n' + message['content'] + '<|end|>\n'}}"
            "{% elif message['role'] == 'assistant' %}"
            "{{'<|assistant|>\n' + message['content'] + '<|end|>\n'}}"
            "{% endif %}{% endfor %}"
            "{% if add_generation_prompt %}"
            "{{ '<|assistant|>\n' }}"
            "{% else %}{{ eos_token }}{% endif %}"
        )

        template = Template(prompt)

        # Replace placeholders with the actual chat history
        messages = [
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query},
        ]
        formatted_prompt = template.render(
            messages=messages, add_generation_prompt=True, eos_token="<|endoftext|>"
        )

        # Generate response
        try:
            response = self.llm(
                formatted_prompt,
                max_tokens=max_tokens,
                stop=["<|end|>"],
                echo=False,
            )
            return response["choices"][0]["text"].strip()
        except Exception as e:
            self.logger.error(f"Response generation error: {e}")
            return f"Error generating response: {str(e)}"