Why does LlamaCpp only support batch_size=1 in text generation? #283

neozhang307 · 2024-10-08T01:13:32Z

Why does the LlamaCpp backend have this limitation?

    if self.config.task == "text-generation":
        if input_shapes["batch_size"] != 1:
            raise ValueError("Batch size must be 1 for LlamaCpp text generation")

The check is in '/optimum_benchmark/backends/llama_cpp/backend.py'.


IlyasMoutawwakil · 2024-10-08T08:06:31Z

@baptistecolle

baptistecolle · 2024-10-09T09:38:06Z

Hi,

Llama.cpp itself supports batch inference. However, we interact with it through a Python binding (llama-cpp-python), which does not currently support batch inference (see abetlen/llama-cpp-python#771).

Once this feature is added to the binding, we can remove the current restriction and start benchmarking with batches.
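
To make the restriction concrete, here is a minimal sketch of the guard as a standalone function, together with one possible way to process several prompts while the limit is in place. Both function names are hypothetical, and `generate_one` is an assumed stand-in for a single-prompt call into the binding; none of this is code from optimum-benchmark or llama-cpp-python.

```python
from typing import Callable, Dict, List


def check_llama_cpp_batch_size(task: str, input_shapes: Dict[str, int]) -> None:
    """Standalone version of the guard quoted in the question (hypothetical name)."""
    if task == "text-generation" and input_shapes["batch_size"] != 1:
        raise ValueError("Batch size must be 1 for LlamaCpp text generation")


def generate_sequentially(
    prompts: List[str], generate_one: Callable[[str], str]
) -> List[str]:
    """Run prompts one at a time, i.e. a series of batch_size=1 calls.

    `generate_one` is an assumed single-prompt generation callable;
    it is not part of any real library API.
    """
    return [generate_one(prompt) for prompt in prompts]
```

Note that a sequential loop like this only reproduces the outputs of a batch. It says nothing about batched throughput or latency, which is why it is not a substitute for real batch benchmarking.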

neozhang307 · 2024-10-10T05:33:00Z

OK

neozhang307 closed this as completed Oct 10, 2024
