Demo: `DeepStack_Video_demo.mp4`
- LangChain: Provides a robust framework for LLM integration, prompt chaining, and workflow management.
- Python: Core implementation language, chosen for its flexibility and integration with ML frameworks.
- Model: `mixtral-8x7b` (served via the Groq API). Used for generating summaries, identifying relationships, and classifying character types.
- ChromaDB: Stores and retrieves text embeddings for efficient context-based search.
- Embeddings: Generated with Hugging Face's `all-MiniLM-L6-v2` model for high-quality semantic representations.
- RecursiveCharacterTextSplitter: Splits large text into manageable chunks with overlaps to ensure continuity of context.
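To illustrate the chunk-overlap idea behind the splitter, here is a minimal stdlib-only sketch. It is a simplified stand-in, not LangChain's actual `RecursiveCharacterTextSplitter` (which also splits on separators like paragraphs and sentences); the `chunk_size` and `chunk_overlap` values are illustrative:

```python
def split_with_overlap(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> list:
    """Split text into fixed-size character chunks whose last `chunk_overlap`
    characters repeat at the head of the next chunk, so context carries
    across chunk boundaries."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_with_overlap("The quick brown fox jumps over the lazy dog", 20, 5)
# Each chunk shares its last 5 characters with the start of the next one.
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from either chunk.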
- Clone the repository:
  ```bash
  git clone https://github.com/VivekShinde7/Character_Insight_Extractor_Using_LLM.git
  cd Character_Insight_Extractor_Using_LLM
  ```
- Set up a Python environment:
  ```bash
  conda create --prefix ./env python=3.9 -y
  conda activate ./env
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:
  - Create a `.env` file in the project root:
    ```
    HF_TOKEN=your_huggingface_token
    GROQ_API_KEY=your_groq_api_key
    ```
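As a sketch of how such a `.env` file is consumed at runtime (the project likely relies on a library such as `python-dotenv`; this stand-in parses simple `KEY=VALUE` lines with the standard library only):

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.
    A stdlib-only stand-in for python-dotenv's load_dotenv();
    existing environment variables are not overwritten."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

After loading, the keys are available via `os.environ["HF_TOKEN"]` and `os.environ["GROQ_API_KEY"]`.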
- Use the `compute_embeddings.py` script to process your text and store embeddings in the vector database:
  ```bash
  python src/compute_embeddings.py data/
  ```
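Conceptually, this step embeds each text chunk and stores the vectors so that later queries can retrieve the most relevant context. A stdlib-only sketch of that retrieval idea (the real pipeline uses `all-MiniLM-L6-v2` dense embeddings and ChromaDB; the toy bag-of-words `embed` function here is purely illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real code would call a
    sentence-transformer model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, docs: list, k: int = 1) -> list:
    """Return the k stored chunks most similar to the query --
    the role a vector store like ChromaDB plays in this project."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = ["Eliza argued with her brother", "The storm lasted all night"]
```

A query mentioning a character name then surfaces the chunks about that character, which become the LLM's context.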
- Run the `get_character_info.py` script to extract character details:
  ```bash
  python src/get_character_info.py "<character_name>"
  ```
- Example:
  ```bash
  python src/get_character_info.py "Eliza"
  ```
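Internally, a script like this combines the character name with retrieved context and asks the LLM for the summary, role, and relationships described above. The template below is a hypothetical sketch; `build_character_prompt` is not part of the repository, and the actual prompt in `get_character_info.py` may differ:

```python
def build_character_prompt(name: str, context_chunks: list) -> str:
    """Assemble a grounded prompt asking the LLM for a character's
    summary, role, and relationships (illustrative template only)."""
    context = "\n\n".join(context_chunks)
    return (
        f'Using only the context below, describe the character "{name}".\n'
        "Return: a short summary, the character's role "
        "(protagonist / antagonist / side character), and their "
        "relationships to other characters.\n\n"
        f"Context:\n{context}"
    )

prompt = build_character_prompt("Eliza", ["Eliza argued with her brother."])
```

The prompt is then sent to `mixtral-8x7b` through the Groq API, and the response is printed to the terminal.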
- Output: the script prints the character's summary, role classification, and relationships.
- Integration with Neo4j: Enable graph-based relationship visualization to provide a clearer and more interactive representation of character connections.
- Fine-Tuning the LLM: Improve role classification and relationship detection accuracy by fine-tuning the LLM on a domain-specific dataset.