create_chat_completion is stuck in versions 0.2.84 and 0.2.85 for Mac Silicon #1648
Labels: bug (Something isn't working)

Comments
Hi,
thanks for checking.
When I recreate your test, it works.
The problem appears only when using a JSON schema with 0.2.84/0.2.85; it works with 0.2.83.
Please find the Jupyter notebook attached. It locks up on Mac.
Kind regards
Lukáš

On 3. 8. 2024, at 11:52, Shamit Verma <***@***.***> wrote:
Can't recreate this issue with: M1, Python 3.12, 0.2.85, Phi-3 and Llama 3.1.
[Screenshots: https://github.com/user-attachments/assets/5ae9ca01-5bd9-4881-912b-5c75fe1aa6c8 and https://github.com/user-attachments/assets/9fd7792c-bf66-4a5d-80f4-64e77fab2c57]
Don't see the attachment somehow.
No problem, here it is:
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

model_path = "/Users/macmacmac/Documents/CODING/models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf"

model = Llama(
    model_path=str(model_path),
    draft_model=LlamaPromptLookupDecoding(
        num_pred_tokens=13,
        max_ngram_size=9,
    ),
    n_ctx=8192,
    n_batch=128,
    last_n_tokens_size=128,
    n_gpu_layers=-1,  # offload all layers to Metal
    f16_kv=True,
    offload_kqv=True,
    flash_attn=True,
    n_threads=2,
    n_threads_batch=2,
    chat_format="chatml",
)
schema = """\
{
"type": "object",
"properties":
{
"response":
{
"type": "string"
}
},
"required": ["response"]
}"""
completion = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant."},
        {
            "role": "user",
            "content": "What is most popular street food in Paris? Answer in JSON. Put your answer in 'response' property. Use schema: {'response': '...'}"
        }
    ],
    max_tokens=-1,
    temperature=0.25,
    top_k=25,
    top_p=0.8,
    min_p=0.025,
    typical_p=0.8,
    tfs_z=0.6,
    mirostat_mode=2,
    mirostat_tau=2.2,
    mirostat_eta=0.025,
    response_format={
        "type": "json_object",
        "schema": schema,
    },
)
print(completion)
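
As a quick control (my own sketch, not part of the attached notebook), rerunning the same request without response_format should complete on 0.2.84/0.2.85 if the hang really is specific to the JSON-schema path:

# Control run: identical request minus response_format. If this completes
# on 0.2.84/0.2.85 while the schema-constrained call above hangs, the
# regression is in the JSON-schema / grammar-constrained sampling path.
completion_plain = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant."},
        {"role": "user", "content": "What is the most popular street food in Paris?"},
    ],
    max_tokens=256,
    temperature=0.25,
)
print(completion_plain)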
On 5. 8. 2024, at 12:56, Shamit Verma <***@***.***> wrote:
Don't see the attachment somehow.
Yup - willing to bet this is fixed in #1649 - there's a whole cluster of issues that will get cleared with this change.
Prerequisites
Version 0.2.84 or 0.2.85, using the create_chat_completion method.
Tried different GGUF models.
Expected Behavior
create_chat_completion returns a result as described in the documentation.
Current Behavior
Inference hangs (I let it run for 5 minutes).
After downgrading to version 0.2.83, everything runs without a single change to the code.
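
A quick way to confirm which build is active before rerunning the notebook (a sketch I'm adding for reference, not part of the original report):

import llama_cpp

# Confirm the installed version: 0.2.84/0.2.85 reproduce the hang,
# 0.2.83 (e.g. via `pip install llama-cpp-python==0.2.83`) does not.
print(llama_cpp.__version__)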
Environment and Context
Mac M1 Max, 32 GB RAM, macOS 14.5, Python 3.12, llama-cpp-python 0.2.84/0.2.85.