Merge branch 'main' into av/db

artefactory-skaff · Dec 19, 2023 · c8b758e
2 parents 72d8f88 + e895a98
commit c8b758e
Showing 40 changed files with 4,719 additions and 229 deletions.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -8,7 +8,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.8', '3.9', '3.10']
+        python-version: ['3.11']
 
     steps:
       - uses: actions/checkout@v2
@@ -20,6 +20,6 @@ jobs:
       - name: Install requirements
         run: |
           python -m pip install --upgrade pip
-          pip install -r requirements-developer.txt
+          pip install -r requirements.txt
       - name: Run Pre commit hook (formatting, linting & tests)
         run: pre-commit run --all-files --hook-stage pre-push --show-diff-on-failure
diff --git a/.github/workflows/deploy_docs.yaml b/.github/workflows/deploy_docs.yaml
@@ -19,7 +19,7 @@ jobs:
       - name: Install requirements
         run: |
           python -m pip install --upgrade pip
-          pip install -r requirements-developer.txt
+          pip install -r requirements.txt
       - name: Deploying MkDocs documentation
         run: |
           mkdocs build

diff --git a/.gitignore b/.gitignore
@@ -136,3 +136,4 @@ secrets/*
 
 # Mac OS
 .DS_Store
+data/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,40 @@
+repos:
+  - repo: "https://github.com/pre-commit/pre-commit-hooks"
+    rev: v4.4.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-toml
+      - id: check-yaml
+      - id: check-json
+      - id: check-added-large-files
+  - repo: local
+    hooks:
+      - id: black
+        name: Formatting (black)
+        entry: black
+        types: [python]
+        language: system
+      - id: isort
+        name: Sorting imports (isort)
+        entry: isort
+        types: [python]
+        language: system
+      - id: ruff
+        name: Linting (ruff)
+        entry: ruff
+        types: [python]
+        language: system
+      - id: nbstripout
+        name: Strip Jupyter notebook output (nbstripout)
+        entry: nbstripout
+        types: [file]
+        files: (.ipynb)$
+        language: system
+      - id: python-bandit-vulnerability-check
+        name: Security check (bandit)
+        entry: bandit
+        types: [python]
+        args: ["-c", "pyproject.toml"]
+        language: system
+exclude: ^(.svn|CVS|.bzr|.hg|.git|__pycache__|.tox|.ipynb_checkpoints|assets|tests/assets/|venv/|.venv/)
diff --git a/.streamlit/config.toml b/.streamlit/config.toml
@@ -0,0 +1,12 @@
+[theme]
+base="light"
+primaryColor="#dcd5bf"
+secondaryBackgroundColor="#ededed"
+textColor="#000000"
+
+[server]
+maxMessageSize = 10000
+maxUploadSize = 5
+
+[ui]
+hideTopBar = true
diff --git a/Makefile b/Makefile
@@ -0,0 +1,26 @@
+USE_CONDA ?= 1
+INSTALL_SCRIPT = install_with_conda.sh
+ifeq (false,$(USE_CONDA))
+	INSTALL_SCRIPT = install_with_venv.sh
+endif
+
+# help: help					- Display this makefile's help information
+.PHONY: help
+help:
+	@grep "^# help\:" Makefile | grep -v grep | sed 's/\# help\: //' | sed 's/\# help\://'
+
+# help: install					- Create a virtual environment and install dependencies
+.PHONY: install
+install:
+	@bash bin/$(INSTALL_SCRIPT)
+
+# help: install_precommit			- Install pre-commit hooks
+.PHONY: install_precommit
+install_precommit:
+	@pre-commit install -t pre-commit
+	@pre-commit install -t pre-push
+
+# help: format_code				- Run pre-commit on all files
+.PHONY: format_code
+format_code:
+	@pre-commit run -a
diff --git a/README.md b/README.md
@@ -2,75 +2,8 @@
 
 # skaff-rag-accelerator
 
-[![CI status](https://github.com/artefactory/skaff-rag-accelerator/actions/workflows/ci.yaml/badge.svg)](https://github.com/artefactory/skaff-rag-accelerator/actions/workflows/ci.yaml?query=branch%3Amain)
-[![Python Version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue.svg)]()
-
-[![Linting , formatting, imports sorting: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
-[![security: bandit](https://img.shields.io/badge/security-bandit-yellow.svg)](https://github.com/PyCQA/bandit)
-[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-informational?logo=pre-commit&logoColor=white)](https://github.com/artefactory/skaff-rag-accelerator/blob/main/.pre-commit-config.yaml)
-</div>
-
-TODO: if not done already, check out the [Skaff documentation](https://artefact.roadie.so/catalog/default/component/repo-builder-ds/docs/) for more information about the generated repository.
-
-Deploy RAGs quickly
-
-## Table of Contents
-
-- [skaff-rag-accelerator](#skaff-rag-accelerator)
-  - [Table of Contents](#table-of-contents)
-  - [Installation](#installation)
-  - [Usage](#usage)
-  - [Documentation](#documentation)
-  - [Repository Structure](#repository-structure)
-
-## Installation
-
-To install the required packages in a virtual environment, run the following command:
-
-```bash
-make install
-```
-
-TODO: Choose between conda and venv if necessary or let the Makefile as is and copy/paste the [MORE INFO installation section](MORE_INFO.md#eased-installation) to explain how to choose between conda and venv.
-
-A complete list of available commands can be found using the following command:
-
 ```bash
-make help
+export PYTHONPATH="/Users/sarah.lauzeral/Library/CloudStorage/[email protected]/Mon Drive/internal_projects/skaff-rag-accelerator/"
 ```
 
-## Usage
-
-TODO: Add usage instructions here
-
-## Documentation
-
-TODO: Github pages is not enabled by default, you need to enable it in the repository settings: Settings > Pages > Source: "Deploy from a branch" / Branch: "gh-pages" / Folder: "/(root)"
-
-A detailed documentation of this project is available [here](https://artefactory.github.io/skaff-rag-accelerator/)
-
-To serve the documentation locally, run the following command:
-
-```bash
-mkdocs serve
-```
-
-To build it and deploy it to GitHub pages, run the following command:
-
-```bash
-make deploy_docs
-```
-
-## Repository Structure
-
-```
-.
-├── .github    <- GitHub Actions workflows and PR template
-├── bin        <- Bash files
-├── config     <- Configuration files
-├── docs       <- Documentation files (mkdocs)
-├── lib        <- Python modules
-├── notebooks  <- Jupyter notebooks
-├── secrets    <- Secret files (ignored by git)
-└── tests      <- Unit tests
-```
+</div>
diff --git a/backend/_logs.py b/backend/_logs.py
@@ -0,0 +1,66 @@
+import os
+from typing import Any, Dict, List, Sequence
+
+import streamlit as st
+from langchain.callbacks.base import BaseCallbackHandler
+from langchain.schema.document import Document
+
+
+class StreamHandler(BaseCallbackHandler):
+    """StreamHandler is a class that handles the streaming of text.
+
+    It is a callback handler for a language model. \
+    It displays the generated text in a Streamlit container \
+    and handles the start of the language model and the generation of new tokens.
+    """
+
+    def __init__(
+        self, container: st.delta_generator.DeltaGenerator, initial_text: str = ""
+    ) -> None:
+        """Initialize the StreamHandler."""
+        self.container = container
+        self.text = initial_text
+        self.run_id_ignore_token = None
+
+    def on_llm_start(
+        self, serialized: dict, prompts: List[str], **kwargs: Dict[str, Any]  # noqa: ARG002
+    ) -> None:
+        """Handle the start of the language model."""
+        if "Question reformulée :" in prompts[0]:
+            self.run_id_ignore_token = kwargs.get("run_id")
+
+    def on_llm_new_token(self, token: str, **kwargs: Dict[str, Any]) -> None:
+        """Handle the generation of a new token by the language model."""
+        if self.run_id_ignore_token == kwargs.get("run_id", False):
+            return
+        self.text += token
+        self.container.markdown(self.text)
+
+
+class PrintRetrievalHandler(BaseCallbackHandler):
+    """PrintRetrievalHandler is a class that handles the retrieval of documents.
+
+    It is a callback handler for a document retriever. \
+    It displays the status and content of the retrieved documents in a Streamlit container.
+    """
+
+    def __init__(self, container: st.delta_generator.DeltaGenerator) -> None:
+        """Initialize the PrintRetrievalHandler."""
+        self.status = container.status("**Context Retrieval**")
+
+    def on_retriever_start(
+        self, serialized: Dict[str, Any], query: str, **kwargs: Dict[str, Any]  # noqa: ARG002
+    ) -> None:
+        """Handle the start of the document retrieval."""
+        self.status.write(f"**Question:** {query}")
+        self.status.update(label=f"**Context Retrieval:** {query}")
+
+    def on_retriever_end(
+        self, documents: Sequence[Document], **kwargs: Dict[str, Any]  # noqa: ARG002
+    ) -> None:
+        """Handle the end of the document retrieval."""
+        for idx, doc in enumerate(documents):
+            source = os.path.basename(doc.metadata["source"])  # noqa: PTH119
+            self.status.write(f"**Document {idx} from {source}**")
+            self.status.markdown(doc.page_content)
+        self.status.update(state="complete")
diff --git a/backend/authentication.py b/backend/authentication.py
@@ -0,0 +1,54 @@
+import os
+from datetime import datetime, timedelta
+from typing import Optional
+
+from jose import jwt
+from pydantic import BaseModel
+
+from database.database import Database
+
+SECRET_KEY = os.environ.get("SECRET_KEY", "default_unsecure_key")
+ALGORITHM = "HS256"
+
+
+class User(BaseModel):
+    """Represents a user with an email and password."""
+
+    email: str = None
+    password: str = None
+
+
+def create_user(user: User) -> None:
+    """Create a new user in the database."""
+    with Database() as connection:
+        connection.query(
+            "INSERT INTO user (email, password) VALUES (?, ?)", (user.email, user.password)
+        )
+
+
+def get_user(email: str) -> User:
+    """Retrieve a user from the database by email."""
+    with Database() as connection:
+        user_row = connection.query("SELECT * FROM user WHERE email = ?", (email,))[0]
+        for row in user_row:
+            return User(**row)
+        raise Exception("User not found")
+
+
+def authenticate_user(username: str, password: str) -> Optional[User]:
+    """Authenticate a user by their username and password."""
+    user = get_user(username)
+    if not user or password != user.password:
+        return None
+    return user
+
+
+def create_access_token(*, data: dict, expires_delta: Optional[timedelta] = None) -> str:
+    """Create a JWT access token with optional expiry."""
+    to_encode = data.copy()
+    if expires_delta:
+        expire = datetime.utcnow() + expires_delta
+    else:
+        expire = datetime.utcnow() + timedelta(minutes=15)
+    to_encode.update({"exp": expire})
+    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
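For readers unfamiliar with what `create_access_token` produces, the following is a minimal standard-library sketch of an HS256-signed token and how a consumer might verify it. It is not the code from this commit, which delegates to `jose.jwt.encode`; the helper names `encode_hs256` and `decode_hs256`, the `now` parameter, and the integer `exp` timestamp are assumptions made for illustration only.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then base64url-decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, key: str) -> bytes:
    """HMAC-SHA256 over 'header.payload' using the shared secret."""
    return hmac.new(key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(claims: dict, key: str) -> str:
    """Hypothetical helper: build 'header.payload.signature'."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, key))}"


def decode_hs256(token: str, key: str, now: Optional[float] = None) -> dict:
    """Hypothetical helper: verify algorithm, signature and expiry, then return the claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", key)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A token issued with `encode_hs256({"sub": email, "exp": int(time.time()) + 900}, key)` mirrors the 15-minute default above. The sketch also shows why the `"default_unsecure_key"` fallback for `SECRET_KEY` matters: anyone who knows the key can mint tokens that pass verification.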