llama_get_logits_ith: invalid logits id -1, reason: no logits #1855

devashishraj opened this issue Dec 7, 2024 · 0 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Get a response from the model.

Current Behavior

llama_get_logits_ith: invalid logits id -1, reason: no logits
[1] 58786 segmentation fault python ragPhiGguf.py --model_path Phi-3.5-mini-instruct-Q4_K_M.gguf --query
/opt/homebrew/Cellar/[email protected]/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/resource_tracker.py:276: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/loky-58786-dm8v3d6t'}
warnings.warn(

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:
    arm64 (Apple M2 Pro)

  • Operating System, e.g. for Linux:
    macOS

  • SDK version, e.g. for Linux:
    Python 3.13.0

GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

g++ --version
Apple clang version 16.0.0 (clang-1600.0.26.4)
Target: arm64-apple-darwin24.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Failure Information (for bugs)

Segmentation fault (see the log under Current Behavior above).

Steps to Reproduce

I was trying to build a RAG script using Phi-3.5-mini-instruct-Q4_K_M.gguf, using the default chat template from the GGUF metadata:

Available chat formats from metadata: chat_template.default
Using gguf chat template: {% for message in messages %}{% if message['role'] == 'system' and message['content'] %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
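
The model-loading code is not included above; a minimal sketch of how the Llama object is presumably constructed with llama-cpp-python (n_ctx and verbose are assumed values, not taken from the report):

from llama_cpp import Llama

# Sketch only: model_path matches the file named in the command line above;
# n_ctx and verbose are assumptions.
llm = Llama(
    model_path="Phi-3.5-mini-instruct-Q4_K_M.gguf",
    n_ctx=4096,
    verbose=True,
)

The generate_response method below is where the crash occurs (it is a method of the RAG class and needs "from typing import Dict, List" and "from jinja2 import Template" at module level):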
def generate_response(
        self, query: str, context_docs: List[Dict], max_tokens: int = 1000
    ) -> str:
        """
        Generate response using retrieved documents as context, formatted in GGUF chat template.
        Args:
            query (str): User query
            context_docs (List[Dict]): Retrieved context documents
            max_tokens (int): Maximum tokens to generate
        Returns:
            Generated response
        """
        # Construct context from documents
        context_texts = []
        for doc in context_docs:
            context_texts.append(
                f"Article ID: {doc.get('article_id', 'N/A')}\n{doc.get('text', '')}"
            )
        context = "\n\n".join(context_texts)
        # Ensure context fits within the context window
        context = self.truncate_text(context)

        # Construct GGUF chat template prompt
        prompt = (
            "{% for message in messages %}"
            "{% if message['role'] == 'system' and message['content'] %}"
            "{{'<|system|>\n' + message['content'] + '<|end|>\n'}}"
            "{% elif message['role'] == 'user' %}"
            "{{'<|user|>\n' + message['content'] + '<|end|>\n'}}"
            "{% elif message['role'] == 'assistant' %}"
            "{{'<|assistant|>\n' + message['content'] + '<|end|>\n'}}"
            "{% endif %}{% endfor %}"
            "{% if add_generation_prompt %}"
            "{{ '<|assistant|>\n' }}"
            "{% else %}{{ eos_token }}{% endif %}"
        )

        template = Template(prompt)

        # Replace placeholders with the actual chat history
        messages = [
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query},
        ]
        formatted_prompt = template.render(
            messages=messages, add_generation_prompt=True, eos_token="<|endoftext|>"
        )

        # Generate response
        try:
            response = self.llm(
                formatted_prompt,
                max_tokens=max_tokens,
                stop=["<|end|>"],
                echo=False,
            )
            return response["choices"][0]["text"].strip()
        except Exception as e:
            self.logger.error(f"Response generation error: {e}")
            return f"Error generating response: {str(e)}"