doc: add document loading notebook example
1 parent
0ca93d0
commit a233d1a
Showing 9 changed files with 151 additions and 88 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This is an interactive example that will walk you through the initialization of a RAG and the basic embedding of a few documents." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from pathlib import Path\n", | ||
"import sys\n", | ||
"import os\n", | ||
"repo_root = Path(os.getcwd()).parent\n", | ||
"sys.path.append(str(repo_root))\n", | ||
"\n", | ||
"from backend.config import RagConfig\n", | ||
"from backend.rag_components.rag import RAG\n", | ||
"\n", | ||
"rag_config = RagConfig.from_yaml(repo_root / \"backend\" / \"config.yaml\")\n", | ||
"rag_config.database.database_url = f\"sqlite:////{repo_root}/database/rag.sqlite3\"\n", | ||
"\n", | ||
"rag = RAG(config=rag_config)\n", | ||
"\n", | ||
"print(\"LLM:\", rag.llm.__class__.__name__)\n", | ||
"print(\"Embedding model:\", rag.embeddings.__class__.__name__)\n", | ||
"print(\"Vector store:\", rag.vector_store.__class__.__name__)\n", | ||
"print(\"Retriever:\", rag.retriever.__class__.__name__)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here we transform our CSV into standalone embeddable documents that we will be able to feed the vector store.\n", | ||
"\n", | ||
"We generate one document for each line, and each document will contain header:value pairs for all the columns.\n", | ||
"\n", | ||
"This is a very simplistic example, but vector store data models can get more advanced to support more [powerful retreival methods.](https://python.langchain.com/docs/modules/data_connection/retrievers/)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_community.document_loaders.csv_loader import CSVLoader\n", | ||
"from langchain.vectorstores.utils import filter_complex_metadata\n", | ||
"\n", | ||
"\n", | ||
"data_sample_path = repo_root / \"examples\" / \"billionaires.csv\"\n", | ||
"\n", | ||
"loader = CSVLoader(\n", | ||
" file_path=str(data_sample_path),\n", | ||
" csv_args={\"delimiter\": \",\", \"quotechar\": '\"', \"escapechar\": \"\\\\\"},\n", | ||
" encoding=\"utf-8-sig\",\n", | ||
")\n", | ||
"\n", | ||
"raw_documents = loader.load()\n", | ||
"documents = filter_complex_metadata(raw_documents)\n", | ||
"documents[:5]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To load the docs in the vector store, we recommend using the `load_document` as it [indexes previously embedded docs](https://python.langchain.com/docs/modules/data_connection/indexing), making the process idempotent." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"rag.load_documents(documents)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
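
For readers who want to see the two ideas in the notebook without the repository installed, here is a minimal standard-library sketch. It approximates how each CSV row becomes a document of `header: value` lines, and how loading can be made idempotent by keying on a content hash. Plain dicts stand in for LangChain `Document` objects, and `rows_to_documents`, `load_once`, the in-memory `store`, and the sample data are all hypothetical names made up for illustration; this is not the repository's `load_documents` implementation nor LangChain's indexing API.

```python
import csv
import hashlib
import io


def rows_to_documents(csv_text: str) -> list[dict]:
    """Turn each CSV row into one document made of 'header: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{header}: {value}" for header, value in row.items())
        # Hypothetical document shape: a dict standing in for a Document object.
        documents.append({"page_content": content, "metadata": {"row": row_number}})
    return documents


def load_once(documents: list[dict], store: dict) -> int:
    """Add only documents whose content hash is not already in `store`."""
    added = 0
    for doc in documents:
        key = hashlib.sha256(doc["page_content"].encode("utf-8")).hexdigest()
        if key not in store:
            store[key] = doc
            added += 1
    return added


# Illustrative data only; the notebook itself reads examples/billionaires.csv.
sample = "name,country\nAda,UK\nLinus,Finland\n"
docs = rows_to_documents(sample)

store: dict = {}
first = load_once(docs, store)   # every document is new on the first pass
second = load_once(docs, store)  # nothing is added on the second pass

print(docs[0]["page_content"])
print(first, second)
```

Keying on a hash of the content means re-running the load step does not duplicate entries, which is the property the notebook relies on when it describes the loading process as idempotent.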