The Artificial Intelligence Controller Interface (AICI) is a technology that enables developers to create custom controllers for Large Language Models (LLMs). These controllers, written in languages that compile to WebAssembly (Wasm), can constrain and direct the output of LLMs in real time. By providing a flexible and secure way to incorporate custom logic into the token generation process, AICI opens up a world of possibilities for tailoring LLM output to specific use cases.
In this tutorial, we'll dive deep into AICI, exploring its architecture, key features, and practical applications. We'll walk through setting up a development environment, building and deploying controllers, and using AICI to control LLM output. By the end of this tutorial, you'll have a solid understanding of how to leverage AICI to create powerful, customized LLM applications.
Before we dive into the practical aspects of using AICI, let's take a moment to understand its architecture. AICI acts as an abstraction layer between the LLM inference engine and the controller, allowing them to communicate seamlessly while remaining independent.
```mermaid
graph LR
    A[LLM Inference Engine] -- AICI Protocol --> B[AICI Runtime]
    B -- Wasm --> C[Controller]
    C -- Constraints --> B
    B -- Tokens --> A
```
The LLM inference engine, such as rLLM or vLLM, communicates with the AICI runtime using the AICI protocol. The AICI runtime, in turn, hosts the controller, which is a Wasm module. The controller can impose constraints on the token generation process, and the AICI runtime relays the generated tokens back to the LLM inference engine.
This architecture offers several key benefits:
- Security: Controllers run in a sandboxed Wasm environment, preventing unauthorized access to system resources.
- Performance: Controllers execute on the CPU in parallel with the GPU-based LLM inference, minimizing overhead.
- Flexibility: Controllers can be written in any language that compiles to Wasm, allowing developers to leverage their existing skills.
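To make the controller side of this architecture concrete, here is a minimal sketch using the `pyaici.server` API that the examples later in this tutorial rely on. The controller forces some tokens verbatim (a constraint relayed to the runtime) and then lets the model generate freely up to a stop condition:

```python
import pyaici.server as aici

async def main():
    # A constraint: the runtime forces these exact tokens into the output.
    await aici.FixedTokens("The capital of France is")
    # Free generation: the engine produces tokens and the controller
    # stops the stream at the first period.
    await aici.gen_text(stop_at=".")

aici.start(main())
```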
To get started with AICI, you'll need to set up your development environment. This involves installing the necessary tools and dependencies for building and running AICI components.
- Rust: AICI is primarily written in Rust, so you'll need to install Rust and Cargo. The rustup installation command is shown below.
- Python 3.11+: Some of the example controllers, like `pyctrl`, use Python. Make sure you have Python 3.11 or later installed.
- Git: You'll need Git to clone the AICI repository.
Once you have the prerequisites in place, you can install the required system dependencies for your platform:

**Linux**
```bash
sudo apt-get install --assume-yes --no-install-recommends \
  build-essential cmake ccache pkg-config libssl-dev libclang-dev clang llvm-dev git-lfs
```

**macOS**
```bash
brew install git cmake ccache
```
Install Rust:
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

Add the `wasm32-wasi` Rust target:
```bash
rustup target add wasm32-wasi
```

Update Rust:
```bash
rustup update
```

Finally, install the required Python packages:
```bash
pip install pytest pytest-forked ujson posix_ipc numpy requests
```
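If you'd like to confirm that the Python packages resolved correctly, a quick sanity check (a convenience snippet, not part of the official AICI setup):

```python
# Verify the Python dependencies installed above can be imported.
import ujson
import numpy
import posix_ipc
import requests

print("Python dependencies OK")
```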
With the development environment set up, you're ready to build and run AICI components.
AICI consists of several components, including the rLLM server, the AICI runtime, and controllers. Let's walk through building and running these components.
Clone the project:
```bash
git clone https://github.com/microsoft/aici
```
The rLLM server is an LLM inference engine that supports AICI. It comes in two flavors: `rllm-cuda`, which uses libtorch and CUDA, and `rllm-llamacpp`, which uses llama.cpp. For this tutorial, we'll focus on `rllm-llamacpp`.

To build and run the `rllm-llamacpp` server, navigate to the `rllm/rllm-llamacpp` directory and run:

```bash
./server.sh phi2
```

This command builds and starts the server with the `phi2` model. You can pass other model names or paths to the script.
Once the server is running, you can verify its status by opening `http://127.0.0.1:4242/v1/models` in your browser. You should see a JSON response confirming that the selected model is loaded.
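You can also check the endpoint programmatically with the `requests` package installed earlier. A small sketch (the exact shape of the JSON response may differ):

```python
import requests

# Query the rLLM server's model listing (default port used by server.sh).
resp = requests.get("http://127.0.0.1:4242/v1/models")
resp.raise_for_status()
print(resp.json())  # should list the loaded model, e.g. phi2
```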
AICI controllers are Wasm modules that contain custom logic for controlling LLM output. The AICI repository includes several example controllers, such as `jsctrl` (JavaScript) and `pyctrl` (Python).

In most cases, you won't need to build the controllers yourself. The rLLM server can automatically download the latest release of `pyctrl` from GitHub. However, if you want to build a controller manually, you can do so using Cargo:

```bash
cd controllers/pyctrl
cargo build --release --target wasm32-wasi
```

This command builds the `pyctrl` controller and generates a Wasm module in the `target/wasm32-wasi/release` directory.
Now that you have the rLLM server running and the controllers ready, let's explore how to use AICI to control LLM output.
Suppose we want the LLM to generate a list of the five most popular types of vehicles. Traditionally, this would involve crafting a carefully worded prompt with specific instructions. However, with AICI, we can simplify the prompt and use code to enforce the desired format and constraints.
Here's an example Python script (`list-of-five.py`) that demonstrates this:
```python
import pyaici.server as aici

async def main():
    # The prompt is a plain question; the formatting is enforced in code below.
    prompt = "What are the most popular types of vehicles?\n"
    await aici.FixedTokens(prompt)
    # Mark the current position so we can capture everything generated after it.
    marker = aici.Label()
    for i in range(1, 6):
        # Force the list number, then let the model fill in the item.
        await aici.FixedTokens(f"{i}.")
        await aici.gen_text(stop_at="\n")
    await aici.FixedTokens("\n")
    # Store everything generated since the marker in the "result" variable.
    aici.set_var("result", marker.text_since())

aici.start(main())
```
Let's break down what this script does:

- We define the prompt as a simple question, without specifying the number of items or formatting.
- We use `aici.FixedTokens()` to send the prompt to the LLM.
- We create a `marker` using `aici.Label()` to keep track of the generated tokens.
- We start a loop that iterates five times:
  - We send the list number (e.g., "1.") using `aici.FixedTokens()`.
  - We use `aici.gen_text()` to generate the vehicle name, stopping at a newline character.
- We send a final newline character to separate the list from any subsequent text.
- We store the generated list in a `result` variable using `aici.set_var()` and `marker.text_since()`.
To run this script, use the following command:
```bash
./aici.sh run list-of-five.py
```
The output will look something like this:
```
[Response] What are the most popular types of vehicles?
1. Cars
2. Motorcycles
3. Bicycles
4. Trucks
5. Boats
```
As you can see, the LLM generated a well-formatted list of five vehicle types, exactly as we specified in the script.
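Because the constraints live in code, they are easy to adjust. As a sketch of one variation, here is the same script with each item's length additionally capped via `max_tokens` (a parameter the grammar example below also uses):

```python
import pyaici.server as aici

async def main():
    await aici.FixedTokens("What are the most popular types of vehicles?\n")
    marker = aici.Label()
    for i in range(1, 6):
        await aici.FixedTokens(f"{i}.")
        # Cap each entry at a few tokens so the names stay short.
        await aici.gen_text(stop_at="\n", max_tokens=8)
    await aici.FixedTokens("\n")
    aici.set_var("result", marker.text_since())

aici.start(main())
```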
Another powerful feature of AICI is the ability to constrain LLM output using formal grammars. This is particularly useful for generating structured text, such as code.
Here's an example script that uses a context-free grammar for the C programming language to generate a function that calculates the Fibonacci sequence:
```python
import pyaici.server as aici

async def main():
    # Load the C grammar used to constrain generation.
    c_grammar = open("aici_abi/grammars/c.y", "r").read()
    prompt = "Please write a C function that recursively calculates the Fibonacci sequence up to n.\n"
    await aici.FixedTokens(prompt)
    await aici.gen_text(max_tokens=200, yacc=c_grammar, set_var="code", stop_at="\n\n")

aici.start(main())
```
In this script:

- We load the C grammar from a file using `open()` and `read()`.
- We define the prompt asking for a C function that calculates the Fibonacci sequence.
- We use `aici.gen_text()` to generate the function, specifying the following parameters:
  - `max_tokens`: The maximum number of tokens to generate.
  - `yacc`: The context-free grammar to use for constraining the output.
  - `set_var`: The variable to store the generated code in.
  - `stop_at`: The string to stop generation at (in this case, two newline characters).
Running this script will generate a C function that looks something like this:
```c
int fib(int n) {
    if (n <= 1) {
        return n;
    } else {
        return fib(n-1) + fib(n-2);
    }
}
```
The generated code adheres to the structure and syntax of the C programming language, as defined by the provided grammar.
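Grammars are not the only constraint mechanism: `pyctrl` also supports regular-expression constraints on generated text. The sketch below assumes `aici.gen_text()` accepts a `regex` keyword analogous to the `yacc` parameter above; check the `pyctrl` reference for the exact name before relying on it:

```python
import pyaici.server as aici

async def main():
    await aici.FixedTokens("How many planets are in the Solar System? Answer with a number: ")
    # Assumption: a `regex` keyword constrains the generated text to match
    # the pattern, similar to how `yacc` applies a grammar.
    await aici.gen_text(regex=r"\d+", max_tokens=5, set_var="answer")

aici.start(main())
```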
AICI is a powerful tool for controlling LLM output and creating customized LLM applications. It gives developers a flexible, secure, and performant way to incorporate custom logic directly into the token generation process.
In this tutorial, we've explored the architecture of AICI, set up a development environment, built and deployed controllers, and used AICI to control LLM output in practical examples. With this knowledge, you're well-equipped to start building your own AICI-powered applications.
Remember, AICI is still a relatively new technology, and there's a lot of room for exploration and innovation. Don't be afraid to experiment with different controllers, grammars, and constraints to see what you can create.
Happy coding!