Commit 0aac010 (1 parent: c45aecd)
Showing 21 changed files with 356 additions and 15 deletions.
@@ -0,0 +1,48 @@
name: CLI Llama.cpp Tests

on:
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - .github/workflows/test_cli_llama_cpp.yaml
      - "optimum_benchmark/**"
      - "docker/**"
      - "tests/**"
      - "setup.py"
  pull_request:
    branches:
      - main
    paths:
      - .github/workflows/test_cli_llama_cpp.yaml
      - "optimum_benchmark/**"
      - "docker/**"
      - "tests/**"
      - "setup.py"

concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

jobs:
  run_cli_llama_cpp_tests:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"

      - name: Install requirements
        run: |
          pip install --upgrade pip
          pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
          pip install -e .[testing,llama-cpp]
      - name: Run tests
        run: pytest -s -k "llama_cpp"
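The same selection can be run locally before pushing; a minimal sketch, assuming the editable install with the testing extras from the step above and using pytest's public Python entry point:

# Local equivalent of the "Run tests" step above (assumes `pip install -e .[testing,llama-cpp]` has been run).
import sys

import pytest

if __name__ == "__main__":
    # Same filter as in the workflow: pytest -s -k "llama_cpp"
    sys.exit(pytest.main(["-s", "-k", "llama_cpp"]))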
@@ -172,3 +172,7 @@ work-in-progress/
experiments/
amdsmi/
amd-*

# Mac specific
.DS_Store
outputs/
@@ -0,0 +1,26 @@
defaults:
  - benchmark
  - scenario: inference
  - launcher: inline
  - backend: llama_cpp
  - _base_
  - _self_

name: llama_cpp_llama

backend:
  device: mps
  model: nomic-ai/nomic-embed-text-v1.5-GGUF
  task: feature-extraction
  filename: nomic-embed-text-v1.5.Q4_0.gguf

scenario:
  input_shapes:
    batch_size: 1
    sequence_length: 256
    vocab_size: 30000
    type_vocab_size: 1
    max_position_embeddings: 512
  generate_kwargs:
    max_new_tokens: 100
    min_new_tokens: 100
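For reference, a minimal standalone sketch of the feature-extraction path this config exercises, using llama-cpp-python directly rather than the benchmark harness; the repo id and filename come from the config above, and the input string is arbitrary:

# Standalone sketch (illustrative, not part of the commit): embedding with llama-cpp-python.
from llama_cpp import Llama

llama = Llama.from_pretrained(
    repo_id="nomic-ai/nomic-embed-text-v1.5-GGUF",
    filename="nomic-embed-text-v1.5.Q4_0.gguf",
    embedding=True,  # feature-extraction -> embedding mode, as in the new backend
    verbose=False,
)

vector = llama.embed("hello world")  # arbitrary input string
print(len(vector))  # embedding dimension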
@@ -0,0 +1,25 @@
defaults:
  - benchmark
  - scenario: inference
  - launcher: inline
  - backend: llama_cpp
  - _base_
  - _self_

name: llama_cpp_llama

backend:
  device: mps
  model: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
  task: text-generation
  filename: tinyllama-1.1b-chat-v1.0.Q4_0.gguf

scenario:
  input_shapes:
    batch_size: 1
    sequence_length: 256
    vocab_size: 32000
  generate_kwargs:
    max_new_tokens: 100
    min_new_tokens: 100
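Likewise, a minimal standalone sketch of the text-generation path, mirroring the prefill/decode split used by the new backend; the repo id and filename come from the config above, while the prompt and the 100-token cap are illustrative:

# Standalone sketch (illustrative, not part of the commit): token-by-token generation.
from llama_cpp import Llama

llama = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
    verbose=False,
)

tokens = llama.tokenize(b"Hello, my name is")  # arbitrary prompt

# Prefill, measured as time-to-first-token: pull a single token from the generator.
first_token = next(llama.generate(tokens=tokens))

# Decode loop, capped like generate_kwargs.max_new_tokens above.
output = []
for token in llama.generate(tokens=tokens):
    output.append(token)
    if len(output) >= 100:
        break

print(llama.detokenize(output).decode("utf-8", errors="ignore"))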
@@ -0,0 +1,26 @@
defaults:
  - benchmark
  - scenario: inference
  - launcher: process # launcher: inline works,
  - backend: pytorch
  - _base_
  - _self_

name: pytorch_bert

# launcher:
#   start_method: spawn

scenario:
  latency: true
  memory: true
  input_shapes:
    batch_size: 1
    sequence_length: 128

backend:
  device: cpu
  no_weights: true
  model: bert-base-uncased
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
@@ -0,0 +1,92 @@
from tempfile import TemporaryDirectory
from typing import Any, Dict

from llama_cpp import Llama

from ..base import Backend
from .config import LlamaCppConfig


class LlamaCppBackend(Backend[LlamaCppConfig]):
    NAME: str = "llama_cpp"

    def __init__(self, config: LlamaCppConfig) -> None:
        super().__init__(config)

        if self.config.no_weights:
            self.logger.info("\t+ Loading no weights model")
            raise NotImplementedError("No weights model is not yet implemented")

    def load(self) -> None:
        self.logger.info("\t+ Creating backend temporary directory")
        self.tmpdir = TemporaryDirectory()
        self.logger.info("\t+ Loading pretrained model")
        self.load_model_from_pretrained()
        self.tmpdir.cleanup()

    def load_model_from_pretrained(self) -> None:
        """
        Load the pretrained model from the given model name (normally GGUF, GGML).
        """
        embedding = self.config.task == "feature-extraction"

        self.pretrained_model = Llama.from_pretrained(
            repo_id=self.config.model,  # type: ignore
            filename=self.config.filename,
            verbose=False,
            echo=False,
            embedding=embedding,
        )  # type: ignore

    def validate_task(self) -> None:
        if self.config.task not in ["text-generation"]:
            raise ValueError(f"Task {self.config.task} not supported by {self.NAME}")

    def prepare_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        if self.config.task == "text-generation":
            if inputs["input_ids"].shape[0] != 1:
                raise ValueError("Batch size must be 1 for Llama.cpp text generation")

            inputs = super().prepare_inputs(inputs)
            inputs["tokens"] = inputs["input_ids"].squeeze()

            return inputs
        elif self.config.task == "feature-extraction":
            detokenized_batch = list(map(self.pretrained_model.detokenize, inputs["input_ids"]))
            decoded_batch = [x.decode("utf-8") for x in detokenized_batch]

            inputs["input_str"] = decoded_batch
            return inputs

        raise ValueError(f"Task {self.config.task} not supported by {self.NAME}")

    def forward(self, inputs: Dict[str, Any], kwargs: Dict[str, Any]) -> Any:
        """
        Forward pass of the model:
        get the embeddings of the input tokens.
        """

        return self.pretrained_model.embed(inputs["input_str"])

    def prefill(self, inputs: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
        """
        Prefill the model with the input tokens.
        We consider prefill as the time to first token, so we measure the time it takes
        for the model to generate the first token.
        """

        next(self.pretrained_model.generate(tokens=inputs["tokens"]))
        return inputs

    def generate(self, inputs: Dict[str, Any], kwargs: Dict[str, Any]) -> list[int]:
        """
        Generate new tokens from the pretrained model.
        """

        output = []

        for token in self.pretrained_model.generate(tokens=inputs["tokens"]):
            output.append(token)
            if len(output) >= kwargs["max_new_tokens"]:
                break

        return output
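A minimal end-to-end sketch of how this backend is meant to be driven; the module paths follow the `_target_` declared in the accompanying config, the token ids are arbitrary, and this is not taken from the repository's tests:

# Hypothetical usage sketch of the new backend (illustrative only).
import numpy as np

from optimum_benchmark.backends.llama_cpp.backend import LlamaCppBackend
from optimum_benchmark.backends.llama_cpp.config import LlamaCppConfig

config = LlamaCppConfig(
    device="cpu",
    task="text-generation",
    model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
)

backend = LlamaCppBackend(config)
backend.load()

# Batch size must be 1 for text generation (enforced in prepare_inputs above).
inputs = backend.prepare_inputs({"input_ids": np.array([[1, 15043, 29892, 590]])})  # arbitrary token ids

backend.prefill(inputs, kwargs={})
new_tokens = backend.generate(inputs, kwargs={"max_new_tokens": 10})
print(new_tokens)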
@@ -0,0 +1,34 @@
from dataclasses import dataclass
from logging import getLogger
from typing import Optional

from ...import_utils import llama_cpp_version
from ..config import BackendConfig

LOGGER = getLogger("backend")


def llama_cpp_model_kwargs():
    return {"verbose": True}


@dataclass
class LlamaCppConfig(BackendConfig):
    name: str = "llama_cpp"
    version: Optional[str] = llama_cpp_version()
    _target_: str = "optimum_benchmark.backends.llama_cpp.backend.LlamaCppBackend"

    no_weights: bool = False
    library: str = "llama_cpp"
    filename: Optional[str] = None

    def __post_init__(self):
        super().__post_init__()

        self.device = self.device.lower()  # type: ignore
        self.library = "llama_cpp"

        if self.device not in ["cuda", "mps", "cpu"]:
            raise ValueError(f"Llama.cpp Backend only supports 'cpu', 'mps' and 'cuda' devices, got {self.device}")

        LOGGER.warning("Llama.cpp automatically selects the device, ignoring the device parameter in the config.")
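The `_target_` field is what ties this config to the backend class; a minimal sketch of that resolution step, assuming Hydra's `get_class` utility (the benchmark runner's exact wiring may differ):

# Illustrative sketch: resolving the backend class from the config's _target_ field.
from hydra.utils import get_class

from optimum_benchmark.backends.llama_cpp.config import LlamaCppConfig

config = LlamaCppConfig(
    device="cpu",
    task="text-generation",
    model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
)

backend_cls = get_class(config._target_)  # -> LlamaCppBackend
backend = backend_cls(config)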