Commit

Merge branch 'main' into av/db
AlexisVLRT committed Dec 19, 2023
2 parents 72d8f88 + e895a98 commit c8b758e
Showing 40 changed files with 4,719 additions and 229 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/ci.yaml
@@ -8,7 +8,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.8', '3.9', '3.10']
+        python-version: ['3.11']

     steps:
       - uses: actions/checkout@v2
@@ -20,6 +20,6 @@ jobs:
       - name: Install requirements
         run: |
           python -m pip install --upgrade pip
-          pip install -r requirements-developer.txt
+          pip install -r requirements.txt
       - name: Run Pre commit hook (formatting, linting & tests)
         run: pre-commit run --all-files --hook-stage pre-push --show-diff-on-failure
2 changes: 1 addition & 1 deletion .github/workflows/deploy_docs.yaml
@@ -19,7 +19,7 @@ jobs:
       - name: Install requirements
         run: |
           python -m pip install --upgrade pip
-          pip install -r requirements-developer.txt
+          pip install -r requirements.txt
       - name: Deploying MkDocs documentation
         run: |
           mkdocs build
1 change: 1 addition & 0 deletions .gitignore
@@ -136,3 +136,4 @@ secrets/*

 # Mac OS
 .DS_Store
+data/
40 changes: 40 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,40 @@
repos:
  - repo: "https://github.com/pre-commit/pre-commit-hooks"
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-toml
      - id: check-yaml
      - id: check-json
      - id: check-added-large-files
  - repo: local
    hooks:
      - id: black
        name: Formatting (black)
        entry: black
        types: [python]
        language: system
      - id: isort
        name: Sorting imports (isort)
        entry: isort
        types: [python]
        language: system
      - id: ruff
        name: Linting (ruff)
        entry: ruff
        types: [python]
        language: system
      - id: nbstripout
        name: Strip Jupyter notebook output (nbstripout)
        entry: nbstripout
        types: [file]
        files: (.ipynb)$
        language: system
      - id: python-bandit-vulnerability-check
        name: Security check (bandit)
        entry: bandit
        types: [python]
        args: ["-c", "pyproject.toml"]
        language: system
exclude: ^(.svn|CVS|.bzr|.hg|.git|__pycache__|.tox|.ipynb_checkpoints|assets|tests/assets/|venv/|.venv/)
12 changes: 12 additions & 0 deletions .streamlit/config.toml
@@ -0,0 +1,12 @@
[theme]
base="light"
primaryColor="#dcd5bf"
secondaryBackgroundColor="#ededed"
textColor="#000000"

[server]
maxMessageSize = 10000
maxUploadSize = 5

[ui]
hideTopBar = true
26 changes: 26 additions & 0 deletions Makefile
@@ -0,0 +1,26 @@
USE_CONDA ?= 1
INSTALL_SCRIPT = install_with_conda.sh
ifeq (false,$(USE_CONDA))
    INSTALL_SCRIPT = install_with_venv.sh
endif

# help: help - Display this makefile's help information
.PHONY: help
help:
	@grep "^# help\:" Makefile | grep -v grep | sed 's/\# help\: //' | sed 's/\# help\://'

# help: install - Create a virtual environment and install dependencies
.PHONY: install
install:
	@bash bin/$(INSTALL_SCRIPT)

# help: install_precommit - Install pre-commit hooks
.PHONY: install_precommit
install_precommit:
	@pre-commit install -t pre-commit
	@pre-commit install -t pre-push

# help: format_code - Run pre-commit on all files
.PHONY: format_code
format_code:
	@pre-commit run -a
71 changes: 2 additions & 69 deletions README.md
@@ -2,75 +2,8 @@

# skaff-rag-accelerator

[![CI status](https://github.com/artefactory/skaff-rag-accelerator/actions/workflows/ci.yaml/badge.svg)](https://github.com/artefactory/skaff-rag-accelerator/actions/workflows/ci.yaml?query=branch%3Amain)
[![Python Version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue.svg)]()

[![Linting , formatting, imports sorting: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-informational?logo=pre-commit&logoColor=white)](https://github.com/artefactory/skaff-rag-accelerator/blob/main/.pre-commit-config.yaml)
</div>

TODO: if not done already, check out the [Skaff documentation](https://artefact.roadie.so/catalog/default/component/repo-builder-ds/docs/) for more information about the generated repository.

Deploy RAGs quickly

## Table of Contents

- [skaff-rag-accelerator](#skaff-rag-accelerator)
- [Table of Contents](#table-of-contents)
- [Installation](#installation)
- [Usage](#usage)
- [Documentation](#documentation)
- [Repository Structure](#repository-structure)

## Installation

To install the required packages in a virtual environment, run the following command:

```bash
make install
```

TODO: Choose between conda and venv if necessary or let the Makefile as is and copy/paste the [MORE INFO installation section](MORE_INFO.md#eased-installation) to explain how to choose between conda and venv.

A complete list of available commands can be found using the following command:

```bash
make help
export PYTHONPATH="/Users/sarah.lauzeral/Library/CloudStorage/[email protected]/Mon Drive/internal_projects/skaff-rag-accelerator/"
```

## Usage

TODO: Add usage instructions here

## Documentation

TODO: Github pages is not enabled by default, you need to enable it in the repository settings: Settings > Pages > Source: "Deploy from a branch" / Branch: "gh-pages" / Folder: "/(root)"

A detailed documentation of this project is available [here](https://artefactory.github.io/skaff-rag-accelerator/)

To serve the documentation locally, run the following command:

```bash
mkdocs serve
```

To build it and deploy it to GitHub pages, run the following command:

```bash
make deploy_docs
```

## Repository Structure

```
.
├── .github <- GitHub Actions workflows and PR template
├── bin <- Bash files
├── config <- Configuration files
├── docs <- Documentation files (mkdocs)
├── lib <- Python modules
├── notebooks <- Jupyter notebooks
├── secrets <- Secret files (ignored by git)
└── tests <- Unit tests
```
</div>
66 changes: 66 additions & 0 deletions backend/_logs.py
@@ -0,0 +1,66 @@
import os
from typing import Any, Dict, List, Sequence

import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema.document import Document


class StreamHandler(BaseCallbackHandler):
    """StreamHandler is a class that handles the streaming of text.
    It is a callback handler for a language model. \
    It displays the generated text in a Streamlit container \
    and handles the start of the language model and the generation of new tokens.
    """

    def __init__(
        self, container: st.delta_generator.DeltaGenerator, initial_text: str = ""
    ) -> None:
        """Initialize the StreamHandler."""
        self.container = container
        self.text = initial_text
        self.run_id_ignore_token = None

    def on_llm_start(
        self, serialized: dict, prompts: List[str], **kwargs: Dict[str, Any]  # noqa: ARG002
    ) -> None:
        """Handle the start of the language model."""
        if "Question reformulée :" in prompts[0]:
            self.run_id_ignore_token = kwargs.get("run_id")

    def on_llm_new_token(self, token: str, **kwargs: Dict[str, Any]) -> None:
        """Handle the generation of a new token by the language model."""
        if self.run_id_ignore_token == kwargs.get("run_id", False):
            return
        self.text += token
        self.container.markdown(self.text)


class PrintRetrievalHandler(BaseCallbackHandler):
    """PrintRetrievalHandler is a class that handles the retrieval of documents.
    It is a callback handler for a document retriever. \
    It displays the status and content of the retrieved documents in a Streamlit container.
    """

    def __init__(self, container: st.delta_generator.DeltaGenerator) -> None:
        """Initialize the PrintRetrievalHandler."""
        self.status = container.status("**Context Retrieval**")

    def on_retriever_start(
        self, serialized: Dict[str, Any], query: str, **kwargs: Dict[str, Any]  # noqa: ARG002
    ) -> None:
        """Handle the start of the document retrieval."""
        self.status.write(f"**Question:** {query}")
        self.status.update(label=f"**Context Retrieval:** {query}")

    def on_retriever_end(
        self, documents: Sequence[Document], **kwargs: Dict[str, Any]  # noqa: ARG002
    ) -> None:
        """Handle the end of the document retrieval."""
        for idx, doc in enumerate(documents):
            source = os.path.basename(doc.metadata["source"])  # noqa: PTH119
            self.status.write(f"**Document {idx} from {source}**")
            self.status.markdown(doc.page_content)
        self.status.update(state="complete")
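
The token-filtering behaviour of `StreamHandler` can be exercised without Streamlit or LangChain. The following is a minimal sketch and not code from this commit: `FakeContainer` and `SketchStreamHandler` are hypothetical stand-ins, and the sketch assumes only that the container exposes a `markdown(text)` method and that callbacks receive a `run_id` keyword argument.

```python
from typing import Any, List, Optional


class FakeContainer:
    """Hypothetical stand-in for a Streamlit container; records each render."""

    def __init__(self) -> None:
        self.renders: List[str] = []

    def markdown(self, text: str) -> None:
        self.renders.append(text)


class SketchStreamHandler:
    """Accumulates tokens and re-renders them, skipping one ignored run."""

    def __init__(self, container: FakeContainer, initial_text: str = "") -> None:
        self.container = container
        self.text = initial_text
        self.run_id_ignore_token: Optional[Any] = None

    def on_llm_start(self, prompts: List[str], **kwargs: Any) -> None:
        # A run whose prompt contains the reformulation marker is not displayed.
        if "Question reformulée :" in prompts[0]:
            self.run_id_ignore_token = kwargs.get("run_id")

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Drop tokens that belong to the ignored run; stream everything else.
        if self.run_id_ignore_token == kwargs.get("run_id", False):
            return
        self.text += token
        self.container.markdown(self.text)


container = FakeContainer()
handler = SketchStreamHandler(container)

# Run "a" stands for the question-reformulation call: its tokens are dropped.
handler.on_llm_start(["... Question reformulée :"], run_id="a")
handler.on_llm_new_token("ignored", run_id="a")

# Run "b" stands for the answering call: its tokens reach the container.
handler.on_llm_start(["Answer the question"], run_id="b")
for tok in ["Hello", ", ", "world"]:
    handler.on_llm_new_token(tok, run_id="b")
```

Each new token triggers a full re-render of the accumulated text, so the container always shows the answer so far rather than isolated fragments.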
54 changes: 54 additions & 0 deletions backend/authentication.py
@@ -0,0 +1,54 @@
import os
from datetime import datetime, timedelta
from typing import Optional

from jose import jwt
from pydantic import BaseModel

from database.database import Database

SECRET_KEY = os.environ.get("SECRET_KEY", "default_unsecure_key")
ALGORITHM = "HS256"


class User(BaseModel):
    """Represents a user with an email and password."""

    email: str = None
    password: str = None


def create_user(user: User) -> None:
    """Create a new user in the database."""
    with Database() as connection:
        connection.query(
            "INSERT INTO user (email, password) VALUES (?, ?)", (user.email, user.password)
        )


def get_user(email: str) -> User:
    """Retrieve a user from the database by email."""
    with Database() as connection:
        user_row = connection.query("SELECT * FROM user WHERE email = ?", (email,))[0]
        for row in user_row:
            return User(**row)
        raise Exception("User not found")


def authenticate_user(username: str, password: str) -> Optional[User]:
    """Authenticate a user by their username and password."""
    user = get_user(username)
    if not user or not password == user.password:
        return False
    return user


def create_access_token(*, data: dict, expires_delta: Optional[timedelta] = None) -> str:
    """Create a JWT access token with optional expiry."""
    to_encode = data.copy()
    if expires_delta:
        expire = datetime.utcnow() + expires_delta
    else:
        expire = datetime.utcnow() + timedelta(minutes=15)
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
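
To make concrete what an HS256-signed token with an `exp` claim contains, here is a minimal standard-library sketch. It is not the python-jose implementation and not code from this commit: `sketch_encode`, `sketch_decode`, the helper names, and the example secret are all hypothetical, and the sketch assumes the 15-minute default expiry used in `create_access_token` above.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over 'header.payload', which is what HS256 means."""
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def sketch_encode(claims: dict, secret: str, expires_delta: Optional[timedelta] = None) -> str:
    """Build header.payload.signature with an 'exp' claim (hypothetical helper)."""
    payload = dict(claims)
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=15))
    payload["exp"] = int(expire.timestamp())  # seconds since the epoch
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def sketch_decode(token: str, secret: str) -> dict:
    """Check the signature and expiry, then return the claims (hypothetical helper)."""
    signing_input, _, signature = token.rpartition(".")
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(_sign(signing_input, secret), _b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] < datetime.now(timezone.utc).timestamp():
        raise ValueError("token expired")
    return payload


token = sketch_encode({"sub": "user@example.com"}, "not-a-real-secret", timedelta(minutes=5))
claims = sketch_decode(token, "not-a-real-secret")
```

The payload is only encoded, not encrypted: anyone holding the token can read its claims, and the secret protects only against tampering.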