Showing 40 changed files with 4,719 additions and 229 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -136,3 +136,4 @@ secrets/* | |
|
||
# Mac OS | ||
.DS_Store | ||
data/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
repos: | ||
- repo: "https://github.com/pre-commit/pre-commit-hooks" | ||
rev: v4.4.0 | ||
hooks: | ||
- id: trailing-whitespace | ||
- id: end-of-file-fixer | ||
- id: check-toml | ||
- id: check-yaml | ||
- id: check-json | ||
- id: check-added-large-files | ||
- repo: local | ||
hooks: | ||
- id: black | ||
name: Formatting (black) | ||
entry: black | ||
types: [python] | ||
language: system | ||
- id: isort | ||
name: Sorting imports (isort) | ||
entry: isort | ||
types: [python] | ||
language: system | ||
- id: ruff | ||
name: Linting (ruff) | ||
entry: ruff | ||
types: [python] | ||
language: system | ||
- id: nbstripout | ||
name: Strip Jupyter notebook output (nbstripout) | ||
entry: nbstripout | ||
types: [file] | ||
files: \.ipynb$ | ||
language: system | ||
- id: python-bandit-vulnerability-check | ||
name: Security check (bandit) | ||
entry: bandit | ||
types: [python] | ||
args: ["-c", "pyproject.toml"] | ||
language: system | ||
exclude: ^(\.svn|CVS|\.bzr|\.hg|\.git|__pycache__|\.tox|\.ipynb_checkpoints|assets|tests/assets/|venv/|\.venv/) |
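pre-commit treats `files` and `exclude` as Python regular expressions matched with `re.search`, so whether each `.` is escaped changes which paths a hook skips. The following is a minimal sketch of that difference, not part of the commit; the names `UNESCAPED`, `ESCAPED`, and `is_excluded` are hypothetical, and the patterns are trimmed to three alternatives for readability.

```python
import re

# Patterns modeled on an `exclude` value, shortened for illustration.
# Only the escaping of "." differs between the two.
UNESCAPED = re.compile(r"^(.git|.tox|venv/)")
ESCAPED = re.compile(r"^(\.git|\.tox|venv/)")


def is_excluded(pattern: re.Pattern, path: str) -> bool:
    """Return True when the pattern matches the path via re.search."""
    return pattern.search(path) is not None


# A real dot-directory is excluded by both variants.
print(is_excluded(UNESCAPED, ".git/hooks/pre-commit"))
print(is_excluded(ESCAPED, ".git/hooks/pre-commit"))

# An unescaped "." matches any character, so a path that merely has
# "git" as its second to fourth characters is excluded as well.
print(is_excluded(UNESCAPED, "agit_tools/scan.py"))
print(is_excluded(ESCAPED, "agit_tools/scan.py"))
```

Escaping the dots keeps the exclusion limited to the literal dot-directories, so ordinary source paths are still checked by the hook.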
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
[theme] | ||
base="light" | ||
primaryColor="#dcd5bf" | ||
secondaryBackgroundColor="#ededed" | ||
textColor="#000000" | ||
|
||
[server] | ||
maxMessageSize = 10000 | ||
maxUploadSize = 5 | ||
|
||
[ui] | ||
hideTopBar = true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
USE_CONDA ?= true | ||
INSTALL_SCRIPT = install_with_conda.sh | ||
ifeq (false,$(USE_CONDA)) | ||
INSTALL_SCRIPT = install_with_venv.sh | ||
endif | ||
|
||
# help: help - Display this makefile's help information | ||
.PHONY: help | ||
help: | ||
@grep "^# help\:" Makefile | grep -v grep | sed 's/\# help\: //' | sed 's/\# help\://' | ||
|
||
# help: install - Create a virtual environment and install dependencies | ||
.PHONY: install | ||
install: | ||
@bash bin/$(INSTALL_SCRIPT) | ||
|
||
# help: install_precommit - Install pre-commit hooks | ||
.PHONY: install_precommit | ||
install_precommit: | ||
@pre-commit install -t pre-commit | ||
@pre-commit install -t pre-push | ||
|
||
# help: format_code - Run pre-commit on all files | ||
.PHONY: format_code | ||
format_code: | ||
@pre-commit run -a |
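The `help` target builds its output from the `# help:` comments by piping `grep` into `sed`. As a rough illustration of what that pipeline extracts, here is a Python sketch of the same filtering; `extract_help`, `HELP_PREFIX`, and `SAMPLE` are hypothetical names, and the sample text is a shortened copy of the Makefile above.

```python
import re
from typing import List

# Matches the comment prefix with or without the trailing space,
# like the two chained sed substitutions in the Makefile.
HELP_PREFIX = re.compile(r"^# help: ?")


def extract_help(makefile_text: str) -> List[str]:
    """Return the text after '# help:' for every help comment line."""
    return [
        HELP_PREFIX.sub("", line)
        for line in makefile_text.splitlines()
        if line.startswith("# help:")
    ]


SAMPLE = """\
# help: help - Display this makefile's help information
.PHONY: help
help:
# help: install - Create a virtual environment and install dependencies
.PHONY: install
"""

for entry in extract_help(SAMPLE):
    print(entry)
```

Because only lines that start with `# help:` are kept, adding a documented target means adding one such comment above it; nothing else in the Makefile has to change.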
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,75 +2,8 @@ | |
|
||
# skaff-rag-accelerator | ||
|
||
[![CI status](https://github.com/artefactory/skaff-rag-accelerator/actions/workflows/ci.yaml/badge.svg)](https://github.com/artefactory/skaff-rag-accelerator/actions/workflows/ci.yaml?query=branch%3Amain) | ||
[![Python Version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue.svg)]() | ||
|
||
[![Linting, formatting, imports sorting: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) | ||
[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit) | ||
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-informational?logo=pre-commit&logoColor=white)](https://github.com/artefactory/skaff-rag-accelerator/blob/main/.pre-commit-config.yaml) | ||
</div> | ||
|
||
TODO: if not done already, check out the [Skaff documentation](https://artefact.roadie.so/catalog/default/component/repo-builder-ds/docs/) for more information about the generated repository. | ||
|
||
Deploy RAGs quickly | ||
|
||
## Table of Contents | ||
|
||
- [skaff-rag-accelerator](#skaff-rag-accelerator) | ||
- [Table of Contents](#table-of-contents) | ||
- [Installation](#installation) | ||
- [Usage](#usage) | ||
- [Documentation](#documentation) | ||
- [Repository Structure](#repository-structure) | ||
|
||
## Installation | ||
|
||
To install the required packages in a virtual environment, run the following command: | ||
|
||
```bash | ||
make install | ||
``` | ||
|
||
TODO: Choose between conda and venv if necessary, or leave the Makefile as is and copy/paste the [MORE INFO installation section](MORE_INFO.md#eased-installation) to explain how to choose between conda and venv. | ||
|
||
A complete list of available commands can be found using the following command: | ||
|
||
```bash | ||
make help | ||
export PYTHONPATH="/Users/sarah.lauzeral/Library/CloudStorage/[email protected]/Mon Drive/internal_projects/skaff-rag-accelerator/" | ||
``` | ||
|
||
## Usage | ||
|
||
TODO: Add usage instructions here | ||
|
||
## Documentation | ||
|
||
TODO: Github pages is not enabled by default, you need to enable it in the repository settings: Settings > Pages > Source: "Deploy from a branch" / Branch: "gh-pages" / Folder: "/(root)" | ||
|
||
Detailed documentation for this project is available [here](https://artefactory.github.io/skaff-rag-accelerator/). | ||
|
||
To serve the documentation locally, run the following command: | ||
|
||
```bash | ||
mkdocs serve | ||
``` | ||
|
||
To build it and deploy it to GitHub pages, run the following command: | ||
|
||
```bash | ||
make deploy_docs | ||
``` | ||
|
||
## Repository Structure | ||
|
||
``` | ||
. | ||
├── .github <- GitHub Actions workflows and PR template | ||
├── bin <- Bash files | ||
├── config <- Configuration files | ||
├── docs <- Documentation files (mkdocs) | ||
├── lib <- Python modules | ||
├── notebooks <- Jupyter notebooks | ||
├── secrets <- Secret files (ignored by git) | ||
└── tests <- Unit tests | ||
``` | ||
</div> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
import os | ||
from typing import Any, Dict, List, Sequence | ||
|
||
import streamlit as st | ||
from langchain.callbacks.base import BaseCallbackHandler | ||
from langchain.schema.document import Document | ||
|
||
|
||
class StreamHandler(BaseCallbackHandler): | ||
"""StreamHandler is a class that handles the streaming of text. | ||
It is a callback handler for a language model. \ | ||
It displays the generated text in a Streamlit container \ | ||
and handles the start of the language model and the generation of new tokens. | ||
""" | ||
|
||
def __init__( | ||
self, container: st.delta_generator.DeltaGenerator, initial_text: str = "" | ||
) -> None: | ||
"""Initialize the StreamHandler.""" | ||
self.container = container | ||
self.text = initial_text | ||
self.run_id_ignore_token = None | ||
|
||
def on_llm_start( | ||
self, serialized: dict, prompts: List[str], **kwargs: Any # noqa: ARG002 | ||
) -> None: | ||
"""Handle the start of the language model.""" | ||
if "Question reformulée :" in prompts[0]: | ||
self.run_id_ignore_token = kwargs.get("run_id") | ||
|
||
def on_llm_new_token(self, token: str, **kwargs: Any) -> None: | ||
"""Handle the generation of a new token by the language model.""" | ||
if self.run_id_ignore_token == kwargs.get("run_id", False): | ||
return | ||
self.text += token | ||
self.container.markdown(self.text) | ||
|
||
|
||
class PrintRetrievalHandler(BaseCallbackHandler): | ||
"""PrintRetrievalHandler is a class that handles the retrieval of documents. | ||
It is a callback handler for a document retriever. \ | ||
It displays the status and content of the retrieved documents in a Streamlit container. | ||
""" | ||
|
||
def __init__(self, container: st.delta_generator.DeltaGenerator) -> None: | ||
"""Initialize the PrintRetrievalHandler.""" | ||
self.status = container.status("**Context Retrieval**") | ||
|
||
def on_retriever_start( | ||
self, serialized: Dict[str, Any], query: str, **kwargs: Any # noqa: ARG002 | ||
) -> None: | ||
"""Handle the start of the document retrieval.""" | ||
self.status.write(f"**Question:** {query}") | ||
self.status.update(label=f"**Context Retrieval:** {query}") | ||
|
||
def on_retriever_end( | ||
self, documents: Sequence[Document], **kwargs: Any # noqa: ARG002 | ||
) -> None: | ||
"""Handle the end of the document retrieval.""" | ||
for idx, doc in enumerate(documents): | ||
source = os.path.basename(doc.metadata["source"]) # noqa: PTH119 | ||
self.status.write(f"**Document {idx} from {source}**") | ||
self.status.markdown(doc.page_content) | ||
self.status.update(state="complete") |
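`StreamHandler` hides the tokens of the question-rewriting call by remembering that call's `run_id` and dropping every token that carries it. The sketch below mirrors only that bookkeeping so it can run without LangChain or Streamlit; `FakeContainer` and `TokenFilter` are hypothetical stand-ins, and the run ids are plain strings chosen for illustration.

```python
from typing import Any, List, Optional


class FakeContainer:
    """Hypothetical stand-in for a Streamlit container; records rendered text."""

    def __init__(self) -> None:
        self.rendered: List[str] = []

    def markdown(self, text: str) -> None:
        self.rendered.append(text)


class TokenFilter:
    """Mirrors the run-id bookkeeping of StreamHandler, nothing more."""

    MARKER = "Question reformulée :"

    def __init__(self, container: FakeContainer, initial_text: str = "") -> None:
        self.container = container
        self.text = initial_text
        self.run_id_ignore_token: Optional[str] = None

    def on_llm_start(self, prompts: List[str], **kwargs: Any) -> None:
        # Remember the run whose prompt is the question-rewriting step.
        if self.MARKER in prompts[0]:
            self.run_id_ignore_token = kwargs.get("run_id")

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Tokens from the remembered run are dropped; all others are shown.
        if self.run_id_ignore_token == kwargs.get("run_id", False):
            return
        self.text += token
        self.container.markdown(self.text)


container = FakeContainer()
handler = TokenFilter(container)

handler.on_llm_start(["Question reformulée : ..."], run_id="rephrase")
handler.on_llm_new_token("ignored", run_id="rephrase")

handler.on_llm_start(["Answer using the context"], run_id="answer")
handler.on_llm_new_token("Hello", run_id="answer")
handler.on_llm_new_token(" world", run_id="answer")

print(container.rendered)  # ['Hello', 'Hello world']
```

Each displayed update is the full accumulated text rather than the latest token, which is why the container is re-rendered with a growing string.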
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import os | ||
from datetime import datetime, timedelta | ||
from typing import Optional | ||
|
||
from jose import jwt | ||
from pydantic import BaseModel | ||
|
||
from database.database import Database | ||
|
||
SECRET_KEY = os.environ.get("SECRET_KEY", "default_unsecure_key") | ||
ALGORITHM = "HS256" | ||
|
||
|
||
class User(BaseModel): | ||
"""Represents a user with an email and password.""" | ||
|
||
email: Optional[str] = None | ||
password: Optional[str] = None | ||
|
||
|
||
def create_user(user: User) -> None: | ||
"""Create a new user in the database.""" | ||
with Database() as connection: | ||
connection.query( | ||
"INSERT INTO user (email, password) VALUES (?, ?)", (user.email, user.password) | ||
) | ||
|
||
|
||
def get_user(email: str) -> User: | ||
"""Retrieve a user from the database by email.""" | ||
with Database() as connection: | ||
user_row = connection.query("SELECT * FROM user WHERE email = ?", (email,))[0] | ||
for row in user_row: | ||
return User(**row) | ||
raise Exception("User not found") | ||
|
||
|
||
def authenticate_user(username: str, password: str) -> Optional[User]: | ||
"""Authenticate a user by their username and password.""" | ||
user = get_user(username) | ||
if not user or password != user.password: | ||
return None | ||
return user | ||
|
||
|
||
def create_access_token(*, data: dict, expires_delta: Optional[timedelta] = None) -> str: | ||
"""Create a JWT access token with optional expiry.""" | ||
to_encode = data.copy() | ||
if expires_delta: | ||
expire = datetime.utcnow() + expires_delta | ||
else: | ||
expire = datetime.utcnow() + timedelta(minutes=15) | ||
to_encode.update({"exp": expire}) | ||
return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM) |
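`create_user` stores the password as given and `authenticate_user` compares it with `==`. One common alternative, shown here only as a sketch and not as what this commit does, is to store a salted hash and compare digests in constant time using the standard library. `hash_password`, `verify_password`, and the `ITERATIONS` value are assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # Assumed work factor; tune it for your hardware.


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage instead of the raw password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))
print(verify_password("wrong", salt, digest))
```

With this approach the `user` table would hold the salt and digest rather than the password, and `authenticate_user` would call something like `verify_password` instead of comparing strings directly.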