
server: tests:
  * start the server at each scenario
  * split the features as each requires different server config
phymbert committed Feb 21, 2024
1 parent 68b8d4e commit 6406208
Showing 6 changed files with 197 additions and 173 deletions.
9 changes: 6 additions & 3 deletions examples/server/tests/README.md
@@ -7,10 +7,13 @@ Server tests scenario using [BDD](https://en.wikipedia.org/wiki/Behavior-driven_

### Run tests
1. Build the server
2. download a GGUF model: `./scripts/hf.sh --repo ggml-org/models --file tinyllamas/stories260K.gguf`
3. Start the test: `./tests.sh stories260K.gguf -ngl 23`
2. download required models:
1. `../../../scripts/hf.sh --repo ggml-org/models --file tinyllamas/stories260K.gguf`
3. Start the test: `./tests.sh`

To change the server path, use the `LLAMA_SERVER_BIN_PATH` environment variable.

### Skipped scenario

Scenario must be annotated with `@llama.cpp` to be included in the scope.
Feature or Scenario must be annotated with `@llama.cpp` to be included in the scope.
`@bug` annotation aims to link a scenario with a GitHub issue.
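
For context, a minimal sketch of how a test harness might honor `LLAMA_SERVER_BIN_PATH` when launching the server; the helper name and fallback path below are hypothetical, not taken from this commit:

```python
import os
import subprocess

def start_server(extra_args=None):
    # LLAMA_SERVER_BIN_PATH overrides the server binary location; the
    # fallback path is a placeholder, not the project's actual default.
    server_bin = os.environ.get("LLAMA_SERVER_BIN_PATH", "../../../build/bin/server")
    # Launch the server as a child process so the test can stop it afterwards.
    return subprocess.Popen([server_bin] + (extra_args or []))
```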
4 changes: 4 additions & 0 deletions examples/server/tests/features/environment.py
@@ -0,0 +1,4 @@

def after_scenario(context, scenario):
print("stopping server...")
context.server_process.kill()
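
The hook above assumes each scenario has stored its server handle on the behave context. A minimal sketch of the matching start-up side follows; the commit's real step definitions live in a file not shown in this excerpt, and `context.server_args` is an assumed attribute assembled by earlier steps:

```python
import subprocess
import time

import requests
from behave import step


@step("the server is starting")
def step_start_server(context):
    # One server per scenario; after_scenario() above kills it again.
    context.server_process = subprocess.Popen(context.server_args)


@step("the server is healthy")
def step_server_healthy(context):
    # Poll the /health endpoint until the server is ready to serve requests.
    for _ in range(60):
        try:
            if requests.get("http://localhost:8080/health").status_code == 200:
                return
        except requests.exceptions.ConnectionError:
            pass
        time.sleep(0.5)
    raise AssertionError("server did not become healthy in time")
```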
49 changes: 49 additions & 0 deletions examples/server/tests/features/security.feature
@@ -0,0 +1,49 @@
@llama.cpp
Feature: Security

Background: Server startup with an api key defined
Given a server listening on localhost:8080
And a model file stories260K.gguf
And a server api key llama.cpp
Then the server is starting

Scenario Outline: Completion with some user api key
Given a prompt test
And a user api key <api_key>
And 4 max tokens to predict
And a completion request with <api_error> api error

Examples: Prompts
| api_key | api_error |
| llama.cpp | no |
| llama.cpp | no |
| hackeme | raised |
| | raised |

Scenario Outline: OAI Compatibility
Given a system prompt test
And a user prompt test
And a model test
And 2 max tokens to predict
And streaming is disabled
And a user api key <api_key>
Given an OAI compatible chat completions request with <api_error> api error

Examples: Prompts
| api_key | api_error |
| llama.cpp | no |
| llama.cpp | no |
| hackme | raised |


Scenario Outline: CORS Options
When an OPTIONS request is sent from <origin>
Then CORS header <cors_header> is set to <cors_header_value>

Examples: Headers
| origin | cors_header | cors_header_value |
| localhost | Access-Control-Allow-Origin | localhost |
| web.mydomain.fr | Access-Control-Allow-Origin | web.mydomain.fr |
| origin | Access-Control-Allow-Credentials | true |
| web.mydomain.fr | Access-Control-Allow-Methods | POST |
| web.mydomain.fr | Access-Control-Allow-Headers | * |
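
The api-key scenarios above hinge on a step that issues a completion request with the scenario's user api key and records whether the server rejected it. A minimal sketch of that check, assuming a Bearer Authorization header and a 401 status on rejection (both assumptions, not confirmed by this diff):

```python
import requests

BASE_URL = "http://localhost:8080"  # matches "a server listening on localhost:8080"

def request_completion(prompt, n_predict, user_api_key=None):
    headers = {}
    if user_api_key:
        # Assumption: the key is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {user_api_key}"
    return requests.post(f"{BASE_URL}/completion",
                         json={"prompt": prompt, "n_predict": n_predict},
                         headers=headers)

def check_api_error(response, api_error):
    if api_error == "raised":
        # Assumption: an invalid or missing key is rejected with 401.
        assert response.status_code == 401
    else:  # "no"
        assert response.status_code == 200
```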
128 changes: 20 additions & 108 deletions examples/server/tests/features/server.feature
@@ -1,127 +1,53 @@
@llama.cpp
Feature: llama.cpp server

Background: Server startup
Given a server listening on localhost:8080 with 2 slots, 42 as seed and llama.cpp as api key
Given a server listening on localhost:8080
And a model file stories260K.gguf
And a model alias tinyllama-2
And 42 as server seed
And 32 KV cache size
And 1 slots
And 32 server max tokens to predict
Then the server is starting
Then the server is healthy

@llama.cpp
Scenario: Health
When the server is healthy
Then the server is ready
And all slots are idle

@llama.cpp
Scenario Outline: Completion
Given a prompt <prompt>
And a user api key <api_key>
And <n_predict> max tokens to predict
And a completion request
Then <n_predict> tokens are predicted
And a completion request with no api error
Then <n_predicted> tokens are predicted with content: <content>

Examples: Prompts
| prompt | n_predict | api_key |
| I believe the meaning of life is | 128 | llama.cpp |
| Write a joke about AI | 512 | llama.cpp |
| say goodbye | 0 | |
| prompt | n_predict | content | n_predicted |
| I believe the meaning of life is | 8 | <space>going to read. | 8 |
| Write a joke about AI | 64 | tion came to the park. And all his friends were very scared and did not | 32 |

@llama.cpp
Scenario Outline: OAI Compatibility
Given a system prompt <system_prompt>
Given a model <model>
And a system prompt <system_prompt>
And a user prompt <user_prompt>
And a model <model>
And <max_tokens> max tokens to predict
And streaming is <enable_streaming>
And a user api key <api_key>
Given an OAI compatible chat completions request with an api error <api_error>
Then <max_tokens> tokens are predicted
Given an OAI compatible chat completions request with no api error
Then <n_predicted> tokens are predicted with content: <content>

Examples: Prompts
| model | system_prompt | user_prompt | max_tokens | enable_streaming | api_key | api_error |
| llama-2 | You are ChatGPT. | Say hello. | 64 | false | llama.cpp | none |
| codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 512 | true | llama.cpp | none |
| John-Doe | You are an hacker. | Write segfault code in rust. | 0 | true | hackme | raised |
| model | system_prompt | user_prompt | max_tokens | content | n_predicted | enable_streaming |
| llama-2 | Book | What is the best book | 8 | "Mom, what' | 8 | disabled |
| codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 64 | "Hey," said the bird.<LF>The bird was very happy and thanked the bird for hel | 32 | enabled |

@llama.cpp
Scenario: Multi users
Given a prompt:
"""
Write a very long story about AI.
"""
And a prompt:
"""
Write another very long music lyrics.
"""
And 32 max tokens to predict
And a user api key llama.cpp
Given concurrent completion requests
Then the server is busy
And all slots are busy
Then the server is idle
And all slots are idle
Then all prompts are predicted

@llama.cpp
Scenario: Multi users OAI Compatibility
Given a system prompt "You are an AI assistant."
And a model tinyllama-2
Given a prompt:
"""
Write a very long story about AI.
"""
And a prompt:
"""
Write another very long music lyrics.
"""
And 32 max tokens to predict
And streaming is enabled
And a user api key llama.cpp
Given concurrent OAI completions requests
Then the server is busy
And all slots are busy
Then the server is idle
And all slots are idle
Then all prompts are predicted

# FIXME: #3969 infinite loop on the CI, not locally, if n_prompt * n_predict > kv_size
@llama.cpp
Scenario: Multi users with total number of tokens to predict exceeds the KV Cache size
Given a prompt:
"""
Write a very long story about AI.
"""
And a prompt:
"""
Write another very long music lyrics.
"""
And a prompt:
"""
Write a very long poem.
"""
And a prompt:
"""
Write a very long joke.
"""
And 512 max tokens to predict
And a user api key llama.cpp
Given concurrent completion requests
Then the server is busy
And all slots are busy
Then the server is idle
And all slots are idle
Then all prompts are predicted


@llama.cpp
Scenario: Embedding
When embeddings are computed for:
"""
What is the capital of Bulgaria ?
"""
Then embeddings are generated


@llama.cpp
Scenario: OAI Embeddings compatibility
Given a model tinyllama-2
When an OAI compatible embeddings computation request for:
@@ -131,23 +57,9 @@ Feature: llama.cpp server
Then embeddings are generated


@llama.cpp
Scenario: Tokenize / Detokenize
When tokenizing:
"""
What is the capital of France ?
"""
Then tokens can be detokenize

@llama.cpp
Scenario Outline: CORS Options
When an OPTIONS request is sent from <origin>
Then CORS header <cors_header> is set to <cors_header_value>

Examples: Headers
| origin | cors_header | cors_header_value |
| localhost | Access-Control-Allow-Origin | localhost |
| web.mydomain.fr | Access-Control-Allow-Origin | web.mydomain.fr |
| origin | Access-Control-Allow-Credentials | true |
| web.mydomain.fr | Access-Control-Allow-Methods | POST |
| web.mydomain.fr | Access-Control-Allow-Headers | * |
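
The multi-user scenarios rely on a "concurrent completion requests" step that fires one request per queued prompt and later verifies that all prompts are predicted. A minimal thread-based sketch of that idea; the endpoint payload and result handling are assumptions, not the commit's actual implementation:

```python
import threading

import requests

def run_concurrent_completions(prompts, n_predict, base_url="http://localhost:8080"):
    # One request per queued prompt, all in flight at the same time.
    results = [None] * len(prompts)

    def worker(i, prompt):
        r = requests.post(f"{base_url}/completion",
                          json={"prompt": prompt, "n_predict": n_predict})
        results[i] = r.json()

    threads = [threading.Thread(target=worker, args=(i, p)) for i, p in enumerate(prompts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # "all prompts are predicted" can then check each result for generated content.
    return results
```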
