Add ramalama rag command #501

Merged 1 commit on Feb 12, 2025
8 changes: 0 additions & 8 deletions Makefile
@@ -24,10 +24,6 @@ help:
	@echo
	@echo " - make build-rag IMAGE=quay.io/ramalama/ramalama GPU=ramalama"
	@echo
-	@echo "Build Docling Container Image"
-	@echo
-	@echo " - make build-docling IMAGE=quay.io/ramalama/ramalama GPU=ramalama"
-	@echo
	@echo "Build docs"
	@echo
	@echo " - make docs"
@@ -100,10 +96,6 @@ build_multi_arch:
build-rag:
	podman build --build-arg IMAGE=${IMAGE} --build-arg GPU=${GPU} -t ${IMAGE}-rag container-images/pragmatic

-.PHONY: build-docling
-build-docling:
-	podman build --build-arg IMAGE=${IMAGE} --build-arg CONTENT=docling --build-arg GPU=${GPU} -t ${IMAGE}-docling container-images/pragmatic
-
.PHONY: install-docs
install-docs: docs
	make -C docs install
1 change: 1 addition & 0 deletions README.md
@@ -123,6 +123,7 @@ curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh |
| [ramalama-perplexity(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-perplexity.1.md)| calculate perplexity for specified AI Model |
| [ramalama-pull(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-pull.1.md) | pull AI Model from Model registry to local storage |
| [ramalama-push(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-push.1.md) | push AI Model from local storage to remote registry |
| [ramalama-rag(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-rag.1.md) | generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image |
| [ramalama-rm(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-rm.1.md) | remove AI Model from local storage |
| [ramalama-run(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-run.1.md) | run specified AI Model as a chatbot |
| [ramalama-serve(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-serve.1.md) | serve REST API on specified AI Model |
41 changes: 41 additions & 0 deletions docs/ramalama-rag.1.md
@@ -0,0 +1,41 @@
% ramalama-rag 1

## NAME
ramalama\-rag - generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image

## SYNOPSIS
**ramalama rag** [options] [path ...] image

## DESCRIPTION
Generate RAG data from the provided documents and convert it into an OCI image. This command uses a container image containing the docling
tool to convert the specified content into a RAG vector database. If the image does not exist locally, RamaLama pulls it
down and launches a container to process the data.

NOTE: this command does not work without a container engine.

positional arguments:
  path    files or directories containing PDF, DOCX, PPTX, XLSX, HTML, AsciiDoc & Markdown formatted files to be processed; can be specified multiple times
  image   OCI image name to contain the processed RAG data

## OPTIONS

#### **--help**, **-h**
Print usage message

#### **--network-mode**=*none*
set the network mode for the container

## EXAMPLES

```
$ ramalama rag https://arxiv.org/pdf/2408.09869 /tmp/pdf quay.io/rhatdan/myrag
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 68509.50it/s]
Neither CUDA nor MPS are available - defaulting to CPU. Note: This module is much faster with a GPU.
2024-12-04 13:49:07.372 ( 70.927s) [ 75AB6740] doc_normalisation.h:448 WARN| found new `other` type: checkbox-unselected
```

## SEE ALSO
**[ramalama(1)](ramalama.1.md)**

## HISTORY
Dec 2024, Originally compiled by Dan Walsh <[email protected]>
1 change: 1 addition & 0 deletions docs/ramalama.1.md
@@ -163,6 +163,7 @@ show RamaLama version
| [ramalama-perplexity(1)](ramalama-perplexity.1.md)| calculate the perplexity value of an AI Model |
| [ramalama-pull(1)](ramalama-pull.1.md) | pull AI Models from Model registries to local storage |
| [ramalama-push(1)](ramalama-push.1.md) | push AI Models from local storage to remote registries |
| [ramalama-rag(1)](ramalama-rag.1.md) | generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image |
| [ramalama-rm(1)](ramalama-rm.1.md) | remove AI Models from local storage |
| [ramalama-run(1)](ramalama-run.1.md) | run specified AI Model as a chatbot |
| [ramalama-serve(1)](ramalama-serve.1.md) | serve REST API on specified AI Model |
2 changes: 1 addition & 1 deletion install.sh
@@ -130,7 +130,7 @@ setup_ramalama() {
syspath="$syspath/ramalama"
$sudo install -m755 -d "$syspath"
$sudo install -m755 "$to_file" "$ramalama_bin"
-local python_files=("cli.py" "gguf_parser.py" "huggingface.py" "model.py" \
+local python_files=("cli.py" "rag.py" "gguf_parser.py" "huggingface.py" "model.py" \
"model_inspect.py" "ollama.py" "common.py" "__init__.py" \
"quadlet.py" "kube.py" "oci.py" "version.py" "shortnames.py" \
"toml_parser.py" "file.py" "http_client.py" "url.py" \
29 changes: 29 additions & 0 deletions ramalama/cli.py
@@ -7,6 +7,7 @@
import platform
import time
import ramalama.oci
import ramalama.rag

from ramalama.huggingface import Huggingface
from ramalama.common import (
@@ -250,6 +251,7 @@ def configure_subcommands(parser):
    perplexity_parser(subparsers)
    pull_parser(subparsers)
    push_parser(subparsers)
    rag_parser(subparsers)
    rm_parser(subparsers)
    run_parser(subparsers)
    serve_parser(subparsers)
@@ -918,6 +920,33 @@ def version_parser(subparsers):
    parser.set_defaults(func=print_version)


def rag_parser(subparsers):
    parser = subparsers.add_parser(
        "rag",
        help="generate and convert retrieval augmented generation (RAG) data from provided documents into an OCI Image",
    )
    parser.add_argument(
        "--network-mode",
        type=str,
        default="none",
        help="set the network mode for the container",
    )
    parser.add_argument(
        "PATH",
        nargs="*",
        help="""\
Files/Directory containing PDF, DOCX, PPTX, XLSX, HTML, AsciiDoc & Markdown
formatted files to be processed""",
    )
    parser.add_argument("IMAGE", help="OCI Image name to contain processed rag data")
    parser.set_defaults(func=rag_cli)


def rag_cli(args):
    rag = ramalama.rag.Rag(args.IMAGE)
    rag.generate(args)

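As a sanity check on the parser wiring above, a minimal standalone sketch (help strings trimmed, `set_defaults` omitted, names otherwise matching) shows how argparse resolves the trailing positionals: the final argument binds to `IMAGE` and everything before it is collected into `PATH`:

```python
import argparse

def rag_parser(subparsers):
    # Mirrors the subcommand wiring above: zero-or-more PATH positionals,
    # a required trailing IMAGE, and --network-mode defaulting to "none".
    parser = subparsers.add_parser("rag", help="generate RAG data")
    parser.add_argument("--network-mode", type=str, default="none")
    parser.add_argument("PATH", nargs="*")
    parser.add_argument("IMAGE")

root = argparse.ArgumentParser(prog="ramalama")
subparsers = root.add_subparsers(dest="subcommand")
rag_parser(subparsers)

# argparse assigns the last positional to IMAGE and the rest to PATH.
args = root.parse_args(["rag", "a.pdf", "docs/", "quay.io/user/myrag"])
print(args.PATH, args.IMAGE, args.network_mode)
# → ['a.pdf', 'docs/'] quay.io/user/myrag none
```

The image name `quay.io/user/myrag` is only an illustration; any OCI image name works here.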

def rm_parser(subparsers):
    parser = subparsers.add_parser("rm", help="remove AI Model from local storage")
    parser.add_argument("--container", default=False, action="store_false", help=argparse.SUPPRESS)
78 changes: 78 additions & 0 deletions ramalama/rag.py
@@ -0,0 +1,78 @@
import os
import subprocess
import tempfile

from ramalama.common import run_cmd


class Rag:
    model = ""
    target = ""

    def __init__(self, target):
        self.target = target

    def build(self, source, target, args):
        print(f"Building {target}...")
        src = os.path.realpath(source)
        base = os.path.basename(source)
        contextdir = os.path.dirname(src)
        cfile = f"""\
FROM scratch
COPY {base} /vector.db
"""
        containerfile = tempfile.NamedTemporaryFile(prefix='RamaLama_Containerfile_', delete=True)
        # Write the generated Containerfile into the temporary file.
        with open(containerfile.name, 'w') as c:
            c.write(cfile)
        imageid = (
            run_cmd(
                [
                    args.engine,
                    "build",
                    "--no-cache",
                    f"--network={args.network_mode}",
                    "-q",
                    "-t",
                    target,
                    "-f",
                    containerfile.name,
                    contextdir,
                ],
                debug=args.debug,
            )
            .stdout.decode("utf-8")
            .strip()
        )
        return imageid
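Neither the Containerfile nor the build argv above depends on a running engine, so both can be sketched in isolation. The helper names below are hypothetical; they reproduce what `build()` writes and what it passes to `run_cmd()`:

```python
import os

def containerfile_for(source):
    # The same two-line Containerfile build() writes: the vector database
    # becomes the only layer of a scratch image, stored at /vector.db.
    base = os.path.basename(source)
    return f"FROM scratch\nCOPY {base} /vector.db\n"

def build_argv(engine, target, containerfile, contextdir, network_mode="none"):
    # Mirrors the run_cmd() argument list above; -q makes the engine
    # print only the resulting image ID on stdout.
    return [
        engine, "build", "--no-cache", f"--network={network_mode}",
        "-q", "-t", target, "-f", containerfile, contextdir,
    ]

cf = containerfile_for("/tmp/RamaLama_rag_abc123")
argv = build_argv("podman", "quay.io/user/myrag", "/tmp/Containerfile", "/tmp")
```

Because `-q` limits stdout to the image ID, `build()` can return `run_cmd(...).stdout` stripped of whitespace directly.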

    def generate(self, args):
        if not args.container:
            raise ValueError("rag command requires a container; it cannot be run with the --nocontainer option.")
        if not args.engine:
            raise ValueError("rag command requires a container; it cannot be run without a container engine.")

        # The default image with "-rag" appended is used for building the RAG data.
        s = args.image.split(":")
        s[0] = s[0] + "-rag"
        rag_image = ":".join(s)

        exec_args = [args.engine, "run", "--rm"]
        if args.network_mode != "":
            exec_args += ["--network", args.network_mode]
        for path in args.PATH:
            if os.path.exists(path):
                fpath = os.path.realpath(path)
                rpath = os.path.relpath(path)
                exec_args += ["-v", f"{fpath}:/docs/{rpath}:ro,z"]
        vectordb = tempfile.NamedTemporaryFile(dir="", prefix='RamaLama_rag_', delete=True)
        exec_args += ["-v", f"{vectordb.name}:{vectordb.name}:z"]
        exec_args += [rag_image]
        exec_args += ["pragmatic", "--indexing", "--path /docs/", f"milvus_file_path={vectordb.name}"]
        try:
            run_cmd(exec_args, debug=args.debug)
        except subprocess.CalledProcessError as e:
            raise e

        print(self.build(vectordb.name, self.target, args))
        os.remove(vectordb.name)
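Two pieces of `generate()` are worth sketching in isolation: the rag image name is derived by appending "-rag" to the part before the first ":", and each existing document path becomes a read-only bind mount under `/docs/`. A minimal re-creation, with hypothetical function names:

```python
import os

def rag_image_name(image):
    # Append "-rag" to the repository part, preserving any tag, exactly as
    # generate() does by splitting on ":". Note this mirrors a quirk of the
    # code above: a registry port (e.g. localhost:5000/img) would also split.
    s = image.split(":")
    s[0] = s[0] + "-rag"
    return ":".join(s)

def mount_args(paths):
    # Each existing path is mounted read-only at /docs/<relative path>,
    # matching the -v flags built in the loop above.
    args = []
    for path in paths:
        if os.path.exists(path):
            fpath = os.path.realpath(path)
            rpath = os.path.relpath(path)
            args += ["-v", f"{fpath}:/docs/{rpath}:ro,z"]
    return args

print(rag_image_name("quay.io/ramalama/ramalama:latest"))
# → quay.io/ramalama/ramalama-rag:latest
```

The `:ro,z` suffix keeps the documents read-only inside the container while the `z` flag relabels them for SELinux-confined engines.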