Dusting the project off #39

Merged 2 commits on Jun 19, 2024

1 change: 1 addition & 0 deletions .typos.toml
@@ -1,4 +1,5 @@
[default]
extend-ignore-identifiers-re = [
"mmaped",
"arange",
]
7 changes: 0 additions & 7 deletions README.md
@@ -3,14 +3,9 @@
</p>

[![Continuous integration](https://github.com/EricLBuehler/candle-vllm/actions/workflows/ci.yml/badge.svg)](https://github.com/EricLBuehler/candle-vllm/actions/workflows/ci.yml)
[![Discord server](https://dcbadge.vercel.app/api/server/FAeJRRJ8)](https://discord.gg/FAeJRRJ8)

Efficient, easy-to-use platform for inference and serving local LLMs including an OpenAI compatible API server.

PPlease see [mistral.rs](https://github.com/EricLBuehler/mistral.rs), efficient inference platform for many models, including quantized support. Additionally, it implements X-LoRA, recently released method [here](https://github.com/EricLBuehler/xlora). X-LoRA introduces a MoE inspired method to densely gate LoRA adapters powered by a model self-reflection forward pass.

**candle-vllm is flux, in breaking development and as such is currently unstable.**

## Features
- OpenAI compatible API server provided for serving LLMs.
- Highly extensible trait-based system to allow rapid implementation of new module pipelines,
@@ -23,8 +18,6 @@ PPlease see [mistral.rs](https://github.com/EricLBuehler/mistral.rs), efficient
- 7b
- 13b
- 70b
- Mistral
- 7b

## Examples
See [this folder](examples/) for some examples.
30 changes: 15 additions & 15 deletions kernels/build.rs
@@ -1,8 +1,8 @@
use std::path::PathBuf;
use anyhow::Result;
use std::fs::read_to_string;
use std::fs::OpenOptions;
use std::io::prelude::*;
use anyhow::{Result};
use std::fs::read_to_string;
use std::path::PathBuf;

fn read_lines(filename: &str) -> Vec<String> {
let mut result = Vec::new();
@@ -29,27 +29,27 @@ fn main() -> Result<()> {
let kernel_dir = PathBuf::from("../kernels/");
let absolute_kernel_dir = std::fs::canonicalize(&kernel_dir).unwrap();

println!("cargo:rustc-link-search=native={}", absolute_kernel_dir.display());
println!("cargo:rustc-link-lib=pagedattention");
println!(
"cargo:rustc-link-search=native={}",
absolute_kernel_dir.display()
);
println!("cargo:rustc-link-lib=pagedattention");
println!("cargo:rustc-link-lib=dylib=cudart");

let contents = read_lines("src/lib.rs");
for line in contents {
if line == "pub mod ffi;" {
return Ok(())
return Ok(());
}
}
let mut file = OpenOptions::new()
.write(true)
.append(true)
.open("src/lib.rs")
.unwrap();
.write(true)
.append(true)
.open("src/lib.rs")
.unwrap();
//Expose paged attention interface to Rust
if let Err(e) = writeln!(file, "pub mod ffi;") {
anyhow::bail!(
"error while building dependencies: {:?}\n",
e,
)
anyhow::bail!("error while building dependencies: {:?}\n", e,)
} else {
Ok(())
}
2 changes: 1 addition & 1 deletion kernels/src/ffi.rs
@@ -67,4 +67,4 @@ extern "C" {

dtype: u32,
);
}
}
6 changes: 4 additions & 2 deletions kernels/src/lib.rs
@@ -1,4 +1,6 @@
pub const COPY_BLOCKS_KERNEL: &str = include_str!(concat!(env!("OUT_DIR"), "/copy_blocks_kernel.ptx"));
pub const COPY_BLOCKS_KERNEL: &str =
include_str!(concat!(env!("OUT_DIR"), "/copy_blocks_kernel.ptx"));
pub const PAGEDATTENTION: &str = include_str!(concat!(env!("OUT_DIR"), "/pagedattention.ptx"));
pub const RESHAPE_AND_CACHE_KERNEL: &str = include_str!(concat!(env!("OUT_DIR"), "/reshape_and_cache_kernel.ptx"));
pub const RESHAPE_AND_CACHE_KERNEL: &str =
include_str!(concat!(env!("OUT_DIR"), "/reshape_and_cache_kernel.ptx"));
pub mod ffi;