refactor(config): increase rate limit to 1200, set 6000 for llama-cpp… #3482

Merged (1 commit), Nov 28, 2024
27 changes: 14 additions & 13 deletions crates/http-api-bindings/src/rate_limit.rs
```diff
@@ -11,28 +11,23 @@
 use ratelimit::Ratelimiter;
 use tabby_inference::{ChatCompletionStream, CompletionOptions, CompletionStream, Embedding};

-fn new_rate_limiter(rpm: u64) -> anyhow::Result<Ratelimiter> {
+fn new_rate_limiter(rpm: u64) -> Ratelimiter {
     Ratelimiter::builder(rpm, Duration::from_secs(60))
         .max_tokens(rpm)
         .initial_available(rpm)
         .build()
-        .map_err(|e| {
-            anyhow::anyhow!(
-                "Failed to create ratelimiter, please check the rate limit configuration: {}",
-                e,
-            )
-        })
+        .expect("Failed to create RateLimiter, please check the HttpModelConfig.rate_limit configuration")
 }

 pub struct RateLimitedEmbedding {
     embedding: Box<dyn Embedding>,
     rate_limiter: Ratelimiter,
 }

-pub fn new_embedding(embedding: Box<dyn Embedding>, rpm: u64) -> impl Embedding {
+pub fn new_embedding(embedding: Box<dyn Embedding>, request_per_minute: u64) -> impl Embedding {
     RateLimitedEmbedding {
         embedding,
-        rate_limiter: new_rate_limiter(rpm).unwrap(),
+        rate_limiter: new_rate_limiter(request_per_minute),
     }
 }
@@ -57,10 +52,13 @@
     rate_limiter: Ratelimiter,
 }

-pub fn new_completion(completion: Box<dyn CompletionStream>, rpm: u64) -> impl CompletionStream {
+pub fn new_completion(
+    completion: Box<dyn CompletionStream>,
+    request_per_minute: u64,
+) -> impl CompletionStream {
     RateLimitedCompletion {
         completion,
-        rate_limiter: new_rate_limiter(rpm).unwrap(),
+        rate_limiter: new_rate_limiter(request_per_minute),
     }
 }
@@ -86,10 +84,13 @@
     rate_limiter: Ratelimiter,
 }

-pub fn new_chat(completion: Box<dyn ChatCompletionStream>, rpm: u64) -> impl ChatCompletionStream {
+pub fn new_chat(
+    completion: Box<dyn ChatCompletionStream>,
+    request_per_minute: u64,
+) -> impl ChatCompletionStream {
     RateLimitedChatStream {
         completion,
-        rate_limiter: new_rate_limiter(rpm).unwrap(),
+        rate_limiter: new_rate_limiter(request_per_minute),
     }
 }
```
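The change above drops the `anyhow::Result` return: a bad rate-limit configuration now panics at startup via `expect` instead of propagating an error, and callers no longer need `.unwrap()`. The semantics the `ratelimit` crate provides here can be sketched as a token bucket that starts full (so an initial burst of up to `rpm` requests succeeds) and refills once per 60-second window. This is an illustrative stand-in, not the crate's actual implementation:

```rust
use std::time::{Duration, Instant};

/// Minimal token-bucket sketch of the behavior configured in
/// `new_rate_limiter`: up to `rpm` requests per 60-second window.
struct TokenBucket {
    capacity: u64,
    available: u64,
    refill_interval: Duration,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, mirroring `.initial_available(rpm)` in the diff,
    /// and panics on a bad configuration, mirroring the new `expect`.
    fn new(rpm: u64) -> Self {
        assert!(rpm > 0, "rate limit must be positive");
        TokenBucket {
            capacity: rpm,
            available: rpm,
            refill_interval: Duration::from_secs(60),
            last_refill: Instant::now(),
        }
    }

    /// Take one token; `false` means the caller should back off.
    fn try_acquire(&mut self) -> bool {
        // Refill the whole bucket once the window has elapsed.
        if self.last_refill.elapsed() >= self.refill_interval {
            self.available = self.capacity;
            self.last_refill = Instant::now();
        }
        if self.available > 0 {
            self.available -= 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = TokenBucket::new(2);
    assert!(limiter.try_acquire());
    assert!(limiter.try_acquire());
    // Bucket is drained until the next 60-second window.
    assert!(!limiter.try_acquire());
    println!("ok");
}
```

The `ratelimit` crate additionally spreads token refills across the interval; the sketch refills in one step for brevity.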
12 changes: 11 additions & 1 deletion crates/llama-cpp-server/src/lib.rs
```diff
@@ -9,7 +9,7 @@
 use serde::Deserialize;
 use supervisor::LlamaCppSupervisor;
 use tabby_common::{
-    config::{HttpModelConfigBuilder, LocalModelConfig, ModelConfig},
+    config::{HttpModelConfigBuilder, LocalModelConfig, ModelConfig, RateLimit, RateLimitBuilder},
     registry::{parse_model_id, ModelRegistry, GGML_MODEL_PARTITIONED_PREFIX},
 };
 use tabby_inference::{ChatCompletionStream, CompletionOptions, CompletionStream, Embedding};
@@ -46,6 +46,7 @@
     let config = HttpModelConfigBuilder::default()
         .api_endpoint(Some(api_endpoint(server.port())))
+        .rate_limit(build_rate_limit_config())
         .kind("llama.cpp/embedding".to_string())
         .build()
         .expect("Failed to create HttpModelConfig");
@@ -95,6 +96,7 @@
     async fn new_with_supervisor(server: Arc<LlamaCppSupervisor>) -> Self {
         let config = HttpModelConfigBuilder::default()
             .api_endpoint(Some(api_endpoint(server.port())))
+            .rate_limit(build_rate_limit_config())
             .kind("llama.cpp/completion".to_string())
             .build()
             .expect("Failed to create HttpModelConfig");
@@ -142,6 +144,7 @@
     async fn new_with_supervisor(server: Arc<LlamaCppSupervisor>) -> Self {
         let config = HttpModelConfigBuilder::default()
             .api_endpoint(Some(api_endpoint(server.port())))
+            .rate_limit(build_rate_limit_config())
             .kind("openai/chat".to_string())
             .model_name(Some("local".into()))
             .build()
@@ -320,3 +323,10 @@
     }
 }
+
+fn build_rate_limit_config() -> RateLimit {
+    RateLimitBuilder::default()
+        .request_per_minute(6000)
+        .build()
+        .expect("Failed to create RateLimit")
+}
```
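Local llama.cpp servers run in-process, so the PR pins their embedding, completion, and chat endpoints to a generous fixed budget of 6000 requests per minute rather than the default that applies to remote HTTP backends. A hand-rolled sketch of the builder API assumed here (`RateLimitBuilder` in the real code is generated by a builder derive; this simplified stand-in only models the relevant parts):

```rust
/// Simplified stand-in for tabby's `RateLimit` config struct.
#[derive(Debug, PartialEq)]
struct RateLimit {
    request_per_minute: u64,
}

/// Sketch of the generated builder: the field is optional until
/// `build()`, which fails if it was never set.
#[derive(Default)]
struct RateLimitBuilder {
    request_per_minute: Option<u64>,
}

impl RateLimitBuilder {
    fn request_per_minute(mut self, rpm: u64) -> Self {
        self.request_per_minute = Some(rpm);
        self
    }

    fn build(self) -> Result<RateLimit, String> {
        Ok(RateLimit {
            request_per_minute: self
                .request_per_minute
                .ok_or("request_per_minute is required")?,
        })
    }
}

/// Mirrors `build_rate_limit_config` from the diff: every local
/// llama.cpp endpoint gets a fixed 6000 requests-per-minute budget.
fn build_rate_limit_config() -> RateLimit {
    RateLimitBuilder::default()
        .request_per_minute(6000)
        .build()
        .expect("Failed to create RateLimit")
}

fn main() {
    assert_eq!(build_rate_limit_config().request_per_minute, 6000);
    println!("ok");
}
```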
3 changes: 1 addition & 2 deletions crates/tabby-common/src/config.rs
```diff
@@ -289,7 +289,6 @@ pub struct HttpModelConfig {
     #[builder(default)]
     pub api_key: Option<String>,

-    #[builder(default)]
     #[serde(default)]
     pub rate_limit: RateLimit,

@@ -354,7 +353,7 @@ pub struct RateLimit {
 impl Default for RateLimit {
     fn default() -> Self {
         Self {
-            request_per_minute: 600,
+            request_per_minute: 1200,
        }
    }
}
```
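Two behaviors are separated in this last file: removing `#[builder(default)]` forces in-code users of `HttpModelConfigBuilder` to set `rate_limit` explicitly (which is why the llama-cpp-server changes above add it), while the retained `#[serde(default)]` still lets a config file omit the field and fall back to `RateLimit::default()`, now 1200 requests per minute instead of 600. A sketch of that serde fallback, simplified to plain Rust without the serde machinery:

```rust
#[derive(Debug, PartialEq)]
struct RateLimit {
    request_per_minute: u64,
}

/// Mirrors the `Default` impl from the diff: configs that omit
/// `rate_limit` now get 1200 requests per minute instead of 600.
impl Default for RateLimit {
    fn default() -> Self {
        Self {
            request_per_minute: 1200,
        }
    }
}

/// Stand-in for what `#[serde(default)]` does during deserialization:
/// a missing field is filled in from `Default::default()`.
fn rate_limit_from_config(parsed: Option<RateLimit>) -> RateLimit {
    parsed.unwrap_or_default()
}

fn main() {
    // Field absent from the config file: fall back to the new default.
    assert_eq!(rate_limit_from_config(None).request_per_minute, 1200);
    // Field present: the configured value wins.
    let explicit = RateLimit {
        request_per_minute: 30,
    };
    assert_eq!(rate_limit_from_config(Some(explicit)).request_per_minute, 30);
    println!("ok");
}
```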