server: tests: add Multi users with total number of tokens to predict exceeds the KV Cache size
phymbert committed Feb 22, 2024
1 parent f820e10 commit aa591ef
Showing 1 changed file with 30 additions and 0 deletions.
30 changes: 30 additions & 0 deletions examples/server/tests/features/parallel.feature
@@ -52,3 +52,33 @@ Feature: Parallel
| streaming | n_predict |
| disabled | 512 |
#| enabled | 512 | FIXME: phymbert: need to investigate why in aiohttp with streaming only one token is generated

  Scenario: Multi users with total number of tokens to predict exceeds the KV Cache size #3969
    Given a server listening on localhost:8080
    And a model file stories260K.gguf
    And 42 as server seed
    And 2 slots
    And 1024 KV cache size
    Then the server is starting
    Then the server is healthy
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    And 2048 max tokens to predict
    Given concurrent completion requests
    Then the server is busy
    Then the server is idle
    Then all prompts are predicted
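The key step is "concurrent completion requests": all four prompts are fired at once, so their combined token budget (4 × 2048 predicted tokens against a 1024-token KV cache split over 2 slots) deliberately exceeds the cache. A minimal sketch of that concurrency pattern in Python `asyncio`, with a stubbed `fake_completion` standing in for the real HTTP POST (the actual step definitions in the repo use aiohttp; all names below are illustrative, not from the repo):

```python
import asyncio

PROMPTS = [
    "Write a very long story about AI.",
    "Write another very long music lyrics.",
    "Write a very long poem.",
    "Write a very long joke.",
]

async def fake_completion(prompt: str, n_predict: int) -> dict:
    # Stand-in for an HTTP request to the server's completion endpoint.
    # A real step definition would await an aiohttp POST here.
    await asyncio.sleep(0)  # yield control so the tasks interleave
    return {"prompt": prompt, "content": "...", "n_predict": n_predict}

async def run_concurrent(prompts, n_predict=2048):
    # Launch every request before awaiting any of them, matching the
    # "Given concurrent completion requests" step.
    tasks = [asyncio.create_task(fake_completion(p, n_predict)) for p in prompts]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_concurrent(PROMPTS))
assert len(results) == len(PROMPTS)
```

With `gather`, the final "Then all prompts are predicted" step only passes once every request has completed, which is what exercises slot reuse when the cache is oversubscribed.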
