Cellm

Cellm is an Excel extension that lets you use Large Language Models (LLMs) like ChatGPT in cell formulas.

What is Cellm?

Similar to Excel's =SUM() function that outputs the sum of a range of numbers, Cellm's =PROMPT() function outputs the AI response to a range of text.

For example, you can write =PROMPT(A1:A10, "Extract all person names mentioned in the text.") in a cell's formula and drag the cell to apply the prompt to many rows. Cellm is useful when you want to use AI for repetitive tasks that would normally require copy-pasting data in and out of a chat window many times.

Key features

This extension does one thing and one thing well.

  • Calls LLMs in formulas and returns short answers suitable for cells.
  • Supports models from Anthropic, Mistral, OpenAI, and Google as well as locally hosted models via Llamafiles, Ollama, or vLLM.

Example

Say you're reviewing medical studies and need to quickly identify papers relevant to your research. Here's how Cellm can help with this task:

(Demo video: 720p.mp4)

In this example, we copy the papers' titles and abstracts into Excel and write this prompt:

"If the paper studies diabetic neuropathy and stroke, return "Include", otherwise, return "Exclude"."

We then use autofill to apply the prompt to many papers. Simple and powerful.

A single paper is misclassified because the original inclusion and exclusion criteria were summarized in one sentence. This is a good example, however, because it shows that these models rely entirely on your input and can make mistakes.

Getting Started

Cellm must be built from source and installed via Excel. Follow the steps below.

Requirements

Build

  1. Clone this repository:

    git clone https://github.com/getcellm/cellm.git
  2. In your terminal, go into the root of the project directory:

    cd cellm
  3. Add your Anthropic API key. Rename src/Cellm/appsettings.Anthropic.json to src/Cellm/appsettings.Local.json and insert your key. Example:

    {
      "AnthropicConfiguration": {
        "ApiKey": "YOUR_ANTHROPIC_APIKEY"
      }
    }

    Cellm uses Anthropic as the default model provider. You can also use models from OpenAI, Mistral, Google, or run models locally. See the appsettings.Local.*.json files for examples.

  4. Install dependencies:

    dotnet restore
  5. Build the project:

    dotnet build --configuration Release
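If the first call to the model fails after building, one thing worth checking is the settings file from step 3. The Python sketch below only inspects the JSON shape shown in that step; the helper name `anthropic_key_problem` is hypothetical and not part of Cellm, and it assumes the file is strict JSON without comments.

```python
import json
from typing import Optional

# Placeholder value taken from the example settings in step 3.
PLACEHOLDER = "YOUR_ANTHROPIC_APIKEY"


def anthropic_key_problem(settings_text: str) -> Optional[str]:
    """Return a short description of the problem, or None if the key looks set.

    Hypothetical helper: it checks only the structure shown in the README.
    """
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError as error:
        return f"not valid JSON: {error.msg}"

    section = settings.get("AnthropicConfiguration") if isinstance(settings, dict) else None
    if not isinstance(section, dict):
        return "AnthropicConfiguration section is missing"

    key = section.get("ApiKey")
    if not isinstance(key, str) or not key.strip() or key == PLACEHOLDER:
        return "ApiKey is missing or still the placeholder"
    return None


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    with open("src/Cellm/appsettings.Local.json", encoding="utf-8") as handle:
        print(anthropic_key_problem(handle.read()) or "settings look fine")
```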

Install

  1. In Excel, go to File > Options > Add-Ins.
  2. In the Manage drop-down menu, select Excel Add-ins and click Go....
  3. Click Browse... and select the Cellm-AddIn64.xll file in the bin/Release/net6.0-windows folder.
  4. Check the box next to Cellm and click OK.

Usage

Cellm provides the following functions:

PROMPT

PROMPT(cells: range, [instruction: range | instruction: string | temperature: double], [temperature: double]): string
  • cells (Required): A cell or a range of cells.
    • Context and (optionally) instructions. The model will use the cells as context and follow any instructions as long as they are present somewhere in the cells.
  • instruction (Optional): A cell, a range of cells, or a string.
    • The model will follow these instructions and ignore instructions in the cells of the first argument.
    • Default: Empty.
  • temperature (Optional): double.
    • A value between 0 and 1 that controls the balance between deterministic outputs and creative exploration. Lower values make the output more deterministic, higher values make it more random.
    • Default: 0. The model will almost always give you the same result.
  • Returns: string: The AI model's response.

Example usage:

  • =PROMPT(A1:D10, "Extract keywords") will use the selected range of cells as context and follow the instruction to extract keywords.
  • =PROMPT(A1:D10, "Extract keywords", 0.7) will use the selected range of cells as context, follow the instruction to extract keywords, and use a temperature of 0.7.
  • =PROMPT(A1:D10) will use the range of cells as context and follow instructions as long as they are present somewhere in the cells.
  • =PROMPT(A1:D10, 0.7) will use the selected range of cells as context, follow any instruction within the cells, and use a temperature of 0.7.
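
The four call forms above imply that the second argument is interpreted by type: a number is read as the temperature, anything else as the instruction. The Python sketch below illustrates only that dispatch. The function `resolve_prompt_args` and its return shape are hypothetical and are not taken from Cellm's source.

```python
from typing import Optional, Tuple, Union

Number = Union[int, float]


def resolve_prompt_args(
    second: Union[str, Number, None] = None,
    third: Optional[Number] = None,
    default_temperature: float = 0.0,
) -> Tuple[Optional[str], float]:
    """Return (instruction, temperature) for PROMPT's optional arguments.

    Hypothetical illustration of the overloads described in the README.
    """
    instruction: Optional[str] = None
    temperature = default_temperature

    if isinstance(second, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("second argument must be text or a number")
    if isinstance(second, (int, float)):
        # =PROMPT(A1:D10, 0.7): a numeric second argument is the temperature.
        if third is not None:
            raise TypeError("temperature given twice")
        temperature = float(second)
    elif second is not None:
        # =PROMPT(A1:D10, "Extract keywords") with an optional temperature.
        instruction = str(second)
        if third is not None:
            temperature = float(third)
    elif third is not None:
        temperature = float(third)

    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    return instruction, temperature
```

With no instruction, the first element is `None`, which corresponds to the case where the model follows whatever instructions it finds in the cells.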

PROMPTWITH

PROMPTWITH(providerAndModel: string or cell, cells: range, [instruction: range | instruction: string | temperature: double], [temperature: double]): string

Allows you to specify the model as the first argument.

  • providerAndModel (Required): A string of the form "provider/model".
    • Default: anthropic/claude-3-5-sonnet-20240620

Example usage:

  • =PROMPTWITH("openai/gpt-4o-mini", A1:D10, "Extract keywords") will extract keywords using OpenAI's GPT-4o mini model instead of the default model.
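
As a rough illustration of the "provider/model" form, the Python sketch below splits such a string on its first slash, so that a model name containing slashes stays intact. The function name and the choice to split on the first slash are assumptions; Cellm's actual parsing may differ.

```python
from typing import Tuple


def split_provider_and_model(value: str) -> Tuple[str, str]:
    """Split "provider/model" into its two parts (hypothetical helper)."""
    # partition() splits on the first "/" only, leaving later slashes in the model.
    provider, separator, model = value.strip().partition("/")
    if not separator or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {value!r}")
    return provider, model


if __name__ == "__main__":
    print(split_provider_and_model("openai/gpt-4o-mini"))
    print(split_provider_and_model("anthropic/claude-3-5-sonnet-20240620"))
```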

Use Cases

Cellm is useful for repetitive tasks on both structured and unstructured data. Here are some practical applications:

  1. Text Classification

    =PROMPT(B2, "Analyze the survey response. Categorize as 'Product', 'Service', 'Pricing', or 'Other'.")
    

    Use classification prompts to quickly categorize large volumes of text, e.g. open-ended survey responses.

  2. Model Comparison

    Make a sheet with user queries in the first column and provider/model pairs in the first row. Write this formula in cell B2:

    =PROMPTWITH(B$1,$A2,"Answer the question in column A")
    

    Drag the cell across the entire table to apply all models to all queries.
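
    The mixed references do the work here: B$1 locks the row, so every cell reads its model from row 1, and $A2 locks the column, so every cell reads its query from column A. The Python sketch below reproduces the formula Excel would generate for any cell in the table. The helper names are hypothetical and only illustrate the fill behavior.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def comparison_formula(row: int, column: int) -> str:
    """Formula that results from filling the B2 formula to (row, column)."""
    model_cell = f"{column_letter(column)}$1"  # row locked: always row 1
    query_cell = f"$A{row}"                    # column locked: always column A
    return f'=PROMPTWITH({model_cell},{query_cell},"Answer the question in column A")'


if __name__ == "__main__":
    # Print the formulas for a table of 3 queries (rows 2-4) and 2 models (columns B-C).
    for row in range(2, 5):
        for column in range(2, 4):
            print(f"{column_letter(column)}{row}: {comparison_formula(row, column)}")
```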

  3. Data Cleaning

    =PROMPT(E2, "Standardize the company name by removing any legal entity identifiers (e.g., Inc., LLC) and correcting common misspellings.")
    

    Useful for cleaning and standardizing messy datasets.

  4. Content Summarization

    =PROMPT(F2, "Provide a 2-sentence summary of the article in the context.")
    

    Great for quickly digesting large amounts of text data, such as news articles or research papers.

  5. Entity Extraction

    =PROMPT(G2, "Extract all person names mentioned in the text.")
    

    Useful for analyzing unstructured text data in fields like journalism, research, or customer relationship management.

  6. When Built-in Excel Functions Are Insufficient

    =PROMPT(A1, "Fix email formatting")
    

    Useful when an "auditor" inserts random spaces in a column with thousands of email addresses. Use a local model if you are worried about sending sensitive data to hosted models.

These use cases are starting points. Experiment with different instructions to find what works best for your data. Cellm works best when combined with human judgment and expertise in your specific domain.

Run Models Locally

Requirements

Local LLMs

Cellm can run LLM models locally on your computer via Llamafiles, Ollama, or vLLM. This ensures none of your data ever leaves your machine. And it's free.

Cellm uses the Gemma 2 2B model with 4-bit quantization by default. This clever little model runs fine on a CPU.

For Ollama and vLLM you will need Docker, and for models larger than 3B you will need a GPU.

Llamafile

Llamafile is a stand-alone executable that is very easy to set up. Cellm will automatically download a Llamafile model and start a Llamafile server the first time you call =PROMPT().

To get started:

  1. Rename appsettings.Llamafile.json to appsettings.Local.json.
  2. Build and install Cellm.
  3. Run e.g. =PROMPT(A1, "Extract keywords") in a formula.
  4. Wait 5-10 min depending on your internet connection. The model will reply once it is ready.

This will use the Llama 3.2 1B model. To use other models, edit the appsettings file and rebuild.

Use appsettings.Llamafile.GPU.json to offload Llamafile inference to your NVIDIA or AMD GPU.

Ollama and vLLM

Ollama and vLLM are LLM inference servers for running models locally. Ollama is designed for ease of use and vLLM is designed to run models efficiently with high throughput. Both Ollama and vLLM are packaged up with docker compose files in the docker/ folder.

To get started, we recommend using Ollama with the Gemma 2 2B model:

  1. Rename appsettings.Ollama.json to appsettings.Local.json.
  2. Build and install Cellm.
  3. Run the following commands in the docker/ directory:
    docker compose -f docker-compose.Ollama.yml up --detach
    docker compose -f docker-compose.Ollama.yml exec backend ollama pull gemma2:2b
    docker compose -f docker-compose.Ollama.yml down  # When you want to shut it down

If you want to speed up inference, you can use your GPU as well:

docker compose -f docker-compose.Ollama.yml -f docker-compose.Ollama.GPU.yml up --detach

A GPU is practically required if you want to use models larger than Gemma 2 2B.

If you want to speed up running many requests in parallel, you can use vLLM instead of Ollama. You must supply the docker compose file with a Hugging Face API key, either via an environment variable or by editing the docker compose file directly. Look at the vLLM docker compose file for details. If you don't know what a Hugging Face API key is, just use Ollama.

To start vLLM:

docker compose -f docker-compose.vLLM.GPU.yml up --detach

To use other Ollama models, pull another of the supported models. To use other vLLM models, change the "--model" argument to another Hugging Face model.

Open WebUI is included in both Ollama and vLLM docker compose files so you can test the local model outside of Cellm. It is available at http://localhost:3000.
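
Another way to test a local Ollama model outside of Cellm is to call Ollama's HTTP API directly. The Python sketch below uses Ollama's /api/generate endpoint with the standard library only. It assumes the API is reachable on Ollama's default port 11434. Whether the compose file in docker/ publishes that port to the host is not stated here, so check the compose file first. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default port, published to the host by the compose file.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(
    prompt: str, model: str = "gemma2:2b", temperature: float = 0.0
) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream of chunks
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Requires the Ollama container to be running and gemma2:2b to be pulled.
    print(ask("Extract keywords: Cellm lets you call LLMs from Excel formulas."))
```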

Dos and Don'ts

Do:

  • Experiment with different prompts to find the most effective instructions for your data.
  • Use cell references to dynamically change your prompts based on other data in your spreadsheet.
  • Use local models for sensitive and confidential data.
  • Refer to the cell data as "context" in your instructions.
  • Verify responses, especially for critical decisions or analyses. These models will make errors and rely entirely on your input, which may also contain errors.

Don't:

  • Don't use Cellm to compute sums, averages, and other numerical calculations. The current generation of LLMs is not designed for mathematical operations. Use Excel's existing functions instead.
  • Don't use cloud model providers to process sensitive or confidential data.
  • Don't use extremely long prompts or give Cellm complex tasks. A normal chat UI lets you have a back-and-forth conversation, which is better for exploring complex topics.
  • Don't use Cellm for tasks that require up-to-date information beyond the AI model's knowledge cutoff date unless you provide the information as context.

Why did you make Cellm?

My girlfriend was writing a systematic review paper. She had to compare 7,500 papers against inclusion and exclusion criteria. I told her this was a great use case for LLMs but quickly realized that individually copying 7,500 papers in and out of chat windows was a total pain. This sparked the idea to make an AI tool to automate repetitive tasks for people like her who would rather avoid programming.

I think Cellm is really cool because it enables everyone to automate repetitive tasks with AI to a level that was previously available only to programmers.

License

Fair Core License, Version 1.0, Apache 2.0 Future License