Cannot return a value from models.llamacpp #1281

Open
erlebach opened this issue Nov 23, 2024 · 2 comments
Describe the issue as clearly as possible:

Consider the following code:

from outlines import models

# No error
models.llamacpp(
      repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
      filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
)

# No error. Output: <outlines.models.llamacpp.LlamaCpp object at 0x1052ebc10>
print("==>", models.llamacpp(
      repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
      filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
))

# Error 'NoneType' object is not callable
model = models.llamacpp(
      repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
      filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
)

An error is produced when the return value of models.llamacpp is assigned to the variable model (third case above). The error does not occur when the return value is discarded or printed. Is this expected behavior? If so, it is not documented. Furthermore, the test suite (test_integration_llamacpp.py) does not appear to cover this case. How should I proceed?

Suggestion: in the documented examples, could you state the version of Outlines each example is meant to work with? Breaking changes occur regularly, which can make the examples hard to reproduce. Thanks.

Steps/code to reproduce the bug:

from outlines import models

# Error 'NoneType' object is not callable
model = models.llamacpp(
      repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
      filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
)

Expected result:

The code should run normally and produce no output.

Error message:

Traceback (most recent call last):
  File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/llama.py", line 2201, in __del__
  File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/llama.py", line 2198, in close
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 584, in close
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 576, in __exit__
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 561, in __exit__
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 340, in __exit__
  File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py", line 69, in close
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 584, in close
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 576, in __exit__
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 561, in __exit__
  File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 449, in _exit_wrapper
  File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py", line 63, in free_model
TypeError: 'NoneType' object is not callable

Outlines/Python version information:

Version information

```
0.1.5
Python 3.10.10 (main, Mar 21 2023, 13:41:05) [Clang 14.0.6 ]
```

Context for the issue:

I cannot reproduce the documented examples or some of the test modules.

erlebach added the bug label Nov 23, 2024

@erlebach (Author)

Here is a solution to the problem, written up in the form of a PR; the pattern it uses is undocumented. I found it with the help of Sonnet 3.5. Could somebody please reply and explain why the provided examples in Outlines work for you? Thanks.

Title: Fix llama.cpp cleanup and generator usage pattern

Description

This PR addresses two issues:

  1. The cleanup error with llama.cpp models when assigning to variables
  2. The proper way to use generators with parameters in Outlines

Changes

Changed from the problematic pattern in ex5.py:

# DON'T DO THIS - Will cause cleanup issues
m = models.llamacpp(
    repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
    filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
)

To the proper cleanup pattern in ex6.py:

import os
from outlines import models, generate
from dotenv import load_dotenv
import torch

load_dotenv()
hf_token = os.getenv("HF_API_TOKEN")
if not hf_token:
    raise ValueError("Hugging Face API token not found. Please set HF_API_TOKEN in .env file.")

def create_model_and_generate(prompt: str):
    # Create model directly
    model = models.llamacpp(
        repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
        filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
    )

    try:
        # Create generator without parameters
        generator = generate.text(model)

        # Generate response with parameters at call time
        response = generator(
            prompt,
            max_tokens=100,
            temperature=0.7
        )
        return response

    finally:
        # Ensure model cleanup
        del model

# Use the function
prompt = "Write a short story about a cat."
result = create_model_and_generate(prompt)
print(result)

Key Improvements

  1. Proper resource management using try/finally
  2. Explicit cleanup with del model
  3. Generator parameters moved to call time instead of creation time
  4. Single function to handle model lifecycle
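
An equivalent way to package items 1 and 2 is a context manager. This is only a sketch, not part of the PR: it assumes the Outlines wrapper stores the underlying llama_cpp.Llama instance in a .model attribute, and that closing that instance early is what keeps the destructor from running during interpreter shutdown (which is what the traceback above points to).

import contextlib
from outlines import models, generate

@contextlib.contextmanager
def llamacpp_model(repo_id: str, filename: str):
    model = models.llamacpp(repo_id=repo_id, filename=filename)
    try:
        yield model
    finally:
        # Assumption about Outlines internals: the wrapper keeps the
        # llama_cpp.Llama instance in `model.model`. Closing it here frees
        # the native handle while llama_cpp's module globals still exist,
        # so the later __del__ at interpreter exit has nothing left to do.
        model.model.close()

with llamacpp_model(
    "M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
    "TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
) as model:
    generator = generate.text(model)
    print(generator("Write a short story about a cat.", max_tokens=100))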

Testing

The code has been tested and no longer produces the NoneType error during cleanup.
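
Since the issue notes that test_integration_llamacpp.py does not cover this case, a regression test along the following lines might be worth adding. The test name is hypothetical; it runs the minimal reproduction in a fresh interpreter because the failure only shows up at interpreter shutdown, where destructor errors are printed to stderr rather than raised.

import subprocess
import sys
import textwrap

def test_llamacpp_cleanup_at_interpreter_exit():
    # Minimal reproduction from the issue: assign the return value of
    # models.llamacpp to a module-level name and let the interpreter exit.
    code = textwrap.dedent(
        """
        from outlines import models

        model = models.llamacpp(
            repo_id="M4-ai/TinyMistral-248M-v2-Instruct-GGUF",
            filename="TinyMistral-248M-v2-Instruct.Q4_K_M.gguf",
        )
        """
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    # llama.cpp prints loading logs to stderr, so only check for the error.
    assert "TypeError: 'NoneType' object is not callable" not in result.stderr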

Documentation Updates

This pattern follows the Outlines documentation for structured generation, which shows that parameters should be passed during generation, not during generator creation.

Fixes #[issue_number]

@erlebach (Author)

My goal is to use the Outlines library on my local machine WITHOUT accessing the internet. So far, that is proving quite difficult. Could you please provide a working llama.cpp example, for instance one using generate.regex or generate.text? I can use Option 1 below, but Option 2 does not work, and it should.

# Option 1: use llama-cpp directly (recommended for local models)
response = model.create_completion(
    prompt,
    max_tokens=100,
    temperature=0.7,
    stream=False
)
return response['choices'][0]['text']

# Option 2: use the Outlines generator (alternative)
generator = generate.text(model)
response = generator(prompt)
return response

Here is the full code:

""
This script demonstrates how to use the Hugging Face API with a LLaMA model to generate text.

It loads the Hugging Face API token from a .env file, sets a random seed for reproducibility,
and defines a function to create a model and generate text based on a given prompt.

Usage:
  1. Ensure you have a .env file with your Hugging Face API token set as HF_API_TOKEN.
  2. Run the script to generate a text response based on the provided prompt.

Example:
  prompt = "Write a short story about a cat."
  result = create_model_and_generate(prompt)
  print(result)
"""

import os
from llama_cpp import Llama # E: Unable to import 'llama_cpp'
from outlines import models, generate
from dotenv import load_dotenv
import torch

# HF token is not required
load_dotenv()
hf_token = os.getenv("HF_API_TOKEN")
if not hf_token:
  raise ValueError("Hugging Face API token not found. Please set HF_API_TOKEN in your .env file.")

# Set random seed for reproducibility
torch.manual_seed(42)

def create_model_and_generate(prompt: str):
  # Create the llama model directly first
  llm = Llama(
      model_path="/Users/erlebach/data/llm_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
      n_gpu_layers=-1
  )
  
  # Then create the Outlines model wrapper
  # model = models.LlamaCppModel(llm)

  # Create an Outlines model wrapper
  model = models.llamacpp(
      model=llm,  # Pass the existing model instance
      repo_id="local",  # Required but not used for local models
  )

  try:
      # Option 1: Use llama-cpp directly (recommended for local models)
      """
      response = model.create_completion(
          prompt,
          max_tokens=100,
          temperature=0.7,
          stream=False
      )
      return response['choices'][0]['text']
      """

      # Option 2: If you want to use Outlines generator (alternative)
      generator = generate.text(model)
      response = generator(prompt)
      return response

  finally:
      pass

# Use the function
prompt = "Write a short story about a cat."
result = create_model_and_generate(prompt)
print(result)
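
For reference, a minimal offline sketch of what Option 2 is aiming at is below. It assumes the wrapper class is importable as outlines.models.LlamaCpp and accepts an existing llama_cpp.Llama instance (whether it is exposed this way depends on the Outlines version), and the model path is a placeholder for a local GGUF file.

from llama_cpp import Llama
from outlines import models, generate

# Load local GGUF weights; no network access is needed for this step.
llm = Llama(
    model_path="/path/to/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    n_gpu_layers=-1,
)

# Assumption: the wrapper can be constructed from a pre-built Llama instance.
model = models.LlamaCpp(llm)

# Constrained generation with a regex, e.g. a US-style phone number.
generator = generate.regex(model, r"\(\d{3}\) \d{3}-\d{4}")
print(generator("Give me a phone number: ", max_tokens=20))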
