callm

About

callm enables you to run Generative AI models (such as Large Language Models) directly on your hardware, offline.
Under the hood, callm heavily relies on the candle crate and is written in pure Rust 🦀.

Supported models

Model	Safetensors	GGUF (quantized)
Llama	✅	✅
Mistral	✅	✅
Phi3	✅	❌
Qwen2	✅	❌

Thread safety

While pipelines are safe to send between threads, callm has not undergone extensive testing for thread-safety.
Caution is advised.

Portability

callm is known to run on Linux and macOS, and has been tested on these platforms. While Windows has not been extensively tested, it is expected to work out-of-the-box without issues.

callm is still in an early development stage and is not production-ready yet.

Installation

Add callm to your dependencies:

$ cargo add callm

Enabling GPU Support

callm uses features to selectively enable support for GPU acceleration.

NVIDIA (CUDA)

Enable the cuda feature to include support for CUDA devices.

$ cargo add callm --features cuda

Apple (Metal)

Enable the metal feature to include support for Metal devices.

$ cargo add callm --features metal

Usage

callm uses builder pattern to create inference pipelines.

use callm::pipelines::PipelineText;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build pipeline
    let mut pipeline = PipelineText::builder()
        .with_location("/path/to/model")
        .build()?;

    // Run inference
    let text_completion = pipeline.run("Tell me a joke about Rust borrow checker")?;
    println!("{text_completion}");

    Ok(())
}

Customizing sampling parameters

Override default sampling parameters during pipeline build or afterwards.

use callm::pipelines::PipelineText;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build pipeline with custom sampling parameters
    let mut pipeline = PipelineText::builder()
        .with_location("/path/to/model")
        .with_temperature(0.65)
        .with_top_k(25)
        .build()?;

    // Adjust sampling parameters later on
    pipeline.set_seed(42);
    pipeline.set_top_p(0.3);

    // Run inference
    let text_completion = pipeline.run("Write an article about Pentium F00F bug")?;
    println!("{text_completion}");

    Ok(())
}

Instruction-following and Chat models

If the model you are loading includes a chat template, you can use conversation-style inference via run_chat(). It accepts a slice of tuples in the form: (MessageRole, String).

use callm::pipelines::PipelineText;
use callm::templates::MessageRole;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build pipeline
    let mut pipeline = PipelineText::builder()
        .with_location("/path/to/model")
        .with_temperature(0.1)
        .build()?;

    // Prepare conversation messages
    let messages = vec![
        (
            MessageRole::System,
            "You are impersonating Linus Torvalds.".to_string(),
        ),
        (
            MessageRole::User,
            "What is your opinion on Rust for Linux kernel development?".to_string(),
        ),
    ];

    // Run chat-style inference
    let assistant_response = pipeline.run_chat(&messages)?;
    println!("{assistant_response}");

    Ok(())
}

Documentation

Consult the documentation for a full API reference.
Several examples and tools can be found in a separate callm-demos repo.

Contributing

Thank you for your interest in contributing to callm!

As this project is still in its early stages, your help is invaluable. Here are some ways you can get involved:

Report issues: If you encounter any bugs or unexpected behavior, please file an issue on GitHub. This will help us track and fix problems.
Submit a pull request: If you'd like to contribute code, please fork the repository, make your changes, and submit a pull request. We'll review and merge your changes as soon as possible.
Help with documentation: If you have expertise in a particular area, please help us improve our documentation.

Thank you for your contributions! 💪

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github/workflows		.github/workflows
src		src
tests		tests
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

callm

About

Supported models

Thread safety

Portability

Installation

Enabling GPU Support

NVIDIA (CUDA)

Apple (Metal)

Usage

Customizing sampling parameters

Instruction-following and Chat models

Documentation

Contributing

About

Releases

Packages

Languages

License

MistApproach/callm

Folders and files

Latest commit

History

Repository files navigation

callm

About

Supported models

Thread safety

Portability

Installation

Enabling GPU Support

NVIDIA (CUDA)

Apple (Metal)

Usage

Customizing sampling parameters

Instruction-following and Chat models

Documentation

Contributing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages