The latest version kills python kernel with LlamaGrammar #1623

Closed · 4 tasks done
yamikumo-DSD opened this issue Jul 25, 2024 · 8 comments · Fixed by #1649
Labels
bug Something isn't working

Comments

@yamikumo-DSD commented Jul 25, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior and Current Behavior

The latest version of llama-cpp-python kills python kernel with LlamaGrammar.
I ran the following code:

from llama_cpp import Llama, LlamaGrammar
model = Llama(model_path="ggufs/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf", verbose=False) # Model doesn't matter.
grammar = LlamaGrammar.from_string('root ::= "a"+')
model("hello", max_tokens=10, grammar=grammar)

When I ran it, the Python kernel died immediately for no apparent reason. The kernel does not die when LlamaGrammar is not used.
Because this behavior did not occur until recently (it still worked a few days ago), I suspect my recent update of the llama-cpp-python module introduced the problem.

What I tried:

  1. Build the latest code myself -> kernel died
  2. Build the code from the latest release myself -> kernel died
  3. Re-install the wheel of the latest release -> no problem

My experiment suggests the problem may come from the llama.cpp backend rather than being llama-cpp-python's fault.
In any case, I'd like to know whether other people are experiencing this bug.

Environment

OS: macOS Sonoma
Processor: Apple M2 Max, 64 GB RAM
Python version: 3.11

@handshape

I'm experiencing something similar on x86_64 with CUDA since 0.2.84 - segfault when calling LlamaGrammar through the JSON-schema binding. No trouble without the grammar constraint. Rolling back to 0.2.83 fixes it.
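
(For anyone trying to reproduce that path: the JSON-schema binding ultimately builds a LlamaGrammar, so the crash is essentially the same as the OP's repro. A minimal sketch assuming LlamaGrammar.from_json_schema, with a placeholder model path and schema that are not from the original report:)

import json
from llama_cpp import Llama, LlamaGrammar

# Placeholder schema; any JSON schema routed through the grammar path should do.
schema = {"type": "object", "properties": {"answer": {"type": "string"}}}
grammar = LlamaGrammar.from_json_schema(json.dumps(schema))

model = Llama(model_path="model.gguf", verbose=False)  # placeholder path
# Segfaults on 0.2.84; runs fine after rolling back to 0.2.83.
model("hello", max_tokens=10, grammar=grammar)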

@yamikumo-DSD (Author) commented Jul 29, 2024

> I'm experiencing something similar on x86_64 with CUDA since 0.2.84 - segfault when calling LlamaGrammar through the JSON-schema binding. No trouble without the grammar constraint. Rolling back to 0.2.83 fixes it.

Which do you think is to blame, llama-cpp-python or native llama.cpp?
Oddly, I see no issue about this on llama.cpp's issue page, despite the growing demand for function calling. (So is it environment dependent? But we both have the problem, on CUDA and Metal respectively.)
The versions of these modules really matter, with new models popping up every week, and I don't want to be stuck on old versions.

@axel7083

> I ran the following code:

I can reproduce the same error: after running the code you provided, the program stops.

The scenario where the problem occurs is described in detail in #1636 (along with the hardware/software specifications).

Rolling back to 0.2.82 seems to fix the problem: the provided code executes without a crash on the previous version.

@yamikumo-DSD (Author)

> I can reproduce the same error: after running the code you provided, the program stops.
>
> The scenario where the problem occurs is described in detail in #1636 (along with the hardware/software specifications).
>
> Rolling back to 0.2.82 seems to fix the problem: the provided code executes without a crash on the previous version.

I found that recent llama.cpp split the main source file (llama.cpp) into several parts:

llama-grammar.cpp
llama-sampling.cpp
llama-vocab.cpp

These source files are present in the latest release but cannot be found in releases from a week ago.
Though I'm not familiar with the binding method between C++ and Python, I suspect some internal change in llama.cpp causes a mismatch between the C++-implemented binaries and the Python side.
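
(For context on why such a mismatch would kill the process outright rather than raise an exception: llama-cpp-python declares llama.cpp's C API through ctypes. An illustrative sketch, assuming a libllama shared library is present; the declaration shown is a simplified placeholder, not the project's actual binding code:)

import ctypes

# Load the compiled llama.cpp shared library (name is platform dependent).
lib = ctypes.CDLL("libllama.dylib")

# The bindings declare each C function's signature on the Python side,
# roughly like this (placeholder declaration for illustration):
lib.llama_n_ctx.restype = ctypes.c_uint32
lib.llama_n_ctx.argtypes = [ctypes.c_void_p]

# If an upstream refactor changes a signature or struct layout while the
# Python-side declaration stays the same, calls still go through with a
# mismatched ABI; with pointers involved that corrupts memory and kills
# the whole process (segfault) instead of raising a Python exception.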

@handshape

Looks like work is underway: #1637

@abetlen added the bug label on Aug 1, 2024
@ProfessorAO

Has anyone found a solution for this?

@ProfessorAO

Getting the same error, both in Colab and on my personal machine. In VS Code it shows up as an OSError or WinError.

@handshape

> Getting the same error, both in Colab and on my personal machine. In VS Code it shows up as an OSError or WinError.

Pretty sure this PR addresses it, pending the next release: #1649
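
(Until that release lands, a quick way to confirm which version you're actually running, and whether you're still on an affected build:)

import llama_cpp

# The crash was reported starting with 0.2.84; 0.2.83 and earlier are fine.
print(llama_cpp.__version__)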
