create_chat_completion is stuck in versions 0.2.84 and 0.2.85 for Mac Silicon #1648
Labels: bug (Something isn't working)

Comments
Hi,
thanks for checking.
When I recreate your test, it works.
The problem appears only when using a JSON schema with 0.2.84/0.2.85; it works with 0.2.83.
Please find the Jupyter notebook attached. It locks up on Mac.
Kind regards
Lukáš

On 3. 8. 2024, at 11:52, Shamit Verma <***@***.***> wrote:
Can't recreate this issue with: M1, Python 3.12, 0.2.85, Phi-3 and Llama 3.1.
[Screenshots: https://github.com/user-attachments/assets/5ae9ca01-5bd9-4881-912b-5c75fe1aa6c8 and https://github.com/user-attachments/assets/9fd7792c-bf66-4a5d-80f4-64e77fab2c57]
Don't see the attachment somehow.
No problem, here it is:
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

model_path = "/Users/macmacmac/Documents/CODING/models/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf"

model = Llama(
    model_path=str(model_path),
    draft_model=LlamaPromptLookupDecoding(
        num_pred_tokens=13,
        max_ngram_size=9,
    ),
    n_ctx=8192,
    n_batch=128,
    last_n_tokens_size=128,
    n_gpu_layers=-1,  # offload all layers to Metal
    f16_kv=True,
    offload_kqv=True,
    flash_attn=True,
    n_threads=2,
    n_threads_batch=2,
    chat_format="chatml",
)
schema = """\
{
"type": "object",
"properties":
{
"response":
{
"type": "string"
}
},
"required": ["response"]
}"""
completion = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant."},
        {
            "role": "user",
            "content": "What is most popular street food in Paris? Answer in JSON. Put your answer in 'response' property. Use schema: {'response': '...'}"
        }
    ],
    max_tokens=-1,
    temperature=0.25,
    top_k=25,
    top_p=0.8,
    min_p=0.025,
    typical_p=0.8,
    tfs_z=0.6,
    mirostat_mode=2,
    mirostat_tau=2.2,
    mirostat_eta=0.025,
    response_format={
        "type": "json_object",
        "schema": schema,
    },
)
print(completion)
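
As a quick control (my own sketch, not part of the attached notebook), rerunning the same request without response_format should complete on 0.2.84/0.2.85 if the hang really is specific to the JSON-schema path:

# Control run: identical request minus response_format. If this completes
# on 0.2.84/0.2.85 while the schema-constrained call above hangs, the
# regression is in the JSON-schema / grammar-constrained sampling path.
completion_plain = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant."},
        {"role": "user", "content": "What is the most popular street food in Paris?"},
    ],
    max_tokens=256,
    temperature=0.25,
)
print(completion_plain)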
On 5. 8. 2024, at 12:56, Shamit Verma <***@***.***> wrote:
Don't see the attachment somehow.
Yup - willing to bet this is fixed in #1649 - there's a whole cluster of issues that will get cleared with this change.
Prerequisites
Version 0.2.84 or 0.2.85, using the create_chat_completion method.
Tried different GGUF models.
Expected Behavior
create_chat_completion returns a result as described in the documentation.
Current Behavior
Inference hangs (I let it run for 5 minutes).
After downgrading to version 0.2.83, everything runs without a single change to the code.
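
A quick way to confirm which build is active before rerunning the notebook (a sketch I'm adding for reference, not part of the original report):

import llama_cpp

# Confirm the installed version: 0.2.84/0.2.85 reproduce the hang,
# 0.2.83 (e.g. via `pip install llama-cpp-python==0.2.83`) does not.
print(llama_cpp.__version__)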
Environment and Context
Mac M1 Max, 32 GB RAM, macOS 14.5, Python 3.12, llama-cpp-python 0.2.84/0.2.85.