Character Insight Extraction Using LLM (DeepStack Assignment)

Demo Video

DeepStack_Video_demo.mp4

Technologies Used

1. LangChain and Python

  • LangChain:
    • Provides a robust framework for LLM integration, chaining prompts, and managing workflows.
  • Python:
    • Core language used for implementing the project, ensuring flexibility and integration with ML frameworks.

2. Large Language Models (LLMs)

  • Model: mixtral-8x7b, accessed through the Groq API for inference.
  • Used for generating summaries, identifying relationships, and classifying character types.
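  • A minimal sketch of how the model could be wired up through LangChain's Groq integration (the exact model identifier, prompt wording, and settings below are illustrative assumptions, not taken from the repository's code):

    from langchain_groq import ChatGroq
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser

    # Assumes GROQ_API_KEY is already set in the environment (see Installation, step 4).
    llm = ChatGroq(model_name="mixtral-8x7b-32768", temperature=0)

    # Hypothetical prompt; the repository's actual prompt may differ.
    prompt = ChatPromptTemplate.from_template(
        "Using the following excerpts:\n{context}\n\n"
        "Summarize the character {character}, list their key relationships, "
        "and classify them as Protagonist, Antagonist, or Side character."
    )

    chain = prompt | llm | StrOutputParser()
    # answer = chain.invoke({"context": retrieved_text, "character": "Eliza"})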

3. Vector Databases

  • ChromaDB: Stores and retrieves text embeddings for efficient context-based search.
  • Embeddings: Generated using HuggingFace's all-MiniLM-L6-v2 model for high-quality semantic understanding.
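  • A minimal sketch of how the embeddings and vector store could be combined through LangChain (the persist directory and retrieval settings are assumptions, not the repository's actual configuration):

    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma

    # all-MiniLM-L6-v2 produces compact 384-dimensional sentence embeddings.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # "chroma_db" is a placeholder path; the repository may persist elsewhere.
    vectordb = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
    retriever = vectordb.as_retriever(search_kwargs={"k": 4})
    # docs = retriever.get_relevant_documents("Eliza")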

4. Text Splitting

  • RecursiveCharacterTextSplitter: Splits large text into manageable chunks with overlaps to ensure continuity of context.
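  • For example (the chunk size, overlap, and input file name here are illustrative, not the repository's settings):

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    with open("data/story.txt", encoding="utf-8") as f:  # hypothetical input file
        chunks = splitter.split_text(f.read())
    # Overlapping chunks preserve context that would otherwise be lost at chunk boundaries.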

File Structure

(See the file-structure screenshot in the repository.)

Installation

  1. Clone the repository:

    git clone https://github.com/VivekShinde7/Character_Insight_Extractor_Using_LLM.git
    cd Character_Insight_Extractor_Using_LLM
  2. Set Up a Python Environment:

    conda create --prefix ./env python=3.9 -y
    conda activate ./env
  3. Install Dependencies:

     pip install -r requirements.txt
  4. Set Up Environment Variables:

    • Create a .env file in the project root:
      HF_TOKEN=your_huggingface_token
      GROQ_API_KEY=your_groq_api_key
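    • Assuming the scripts load these variables with python-dotenv (an assumption based on the .env convention, not confirmed by the repository), the loading step looks like this:

      import os
      from dotenv import load_dotenv

      load_dotenv()  # reads key=value pairs from .env in the project root
      hf_token = os.getenv("HF_TOKEN")
      groq_api_key = os.getenv("GROQ_API_KEY")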

Usage

1. Preprocess the Text

  • Use the compute_embeddings.py script to process your text and store embeddings in the vector database:
    python src/compute_embeddings.py data/
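  • The script itself is not reproduced here, but the typical flow for this step, sketched under the assumptions above (not the repository's actual code), is: read the text files, split them into overlapping chunks, embed the chunks, and persist them to ChromaDB:

    import glob
    import sys
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma

    data_dir = sys.argv[1]  # e.g. "data/"
    texts = [open(p, encoding="utf-8").read() for p in glob.glob(f"{data_dir}/*.txt")]

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = [c for t in texts for c in splitter.split_text(t)]

    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    Chroma.from_texts(chunks, embeddings, persist_directory="chroma_db")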

2. Analyze a Character

  • Run the get_character_info.py script to extract character details:
    python src/get_character_info.py "<character_name>"
  • Example:
    python src/get_character_info.py "Eliza"
  • Output: a sample of the generated output is shown in the screenshot in the repository.
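  • Under the same assumptions, this step presumably ties the earlier pieces together: retrieve the chunks that mention the character, then ask the LLM for a summary, relationships, and role. A condensed sketch (not the actual script):

    import sys
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_groq import ChatGroq

    character = sys.argv[1]

    # Reuse the persisted vector store to pull context relevant to the character.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectordb = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
    docs = vectordb.as_retriever(search_kwargs={"k": 4}).get_relevant_documents(character)
    context = "\n\n".join(d.page_content for d in docs)

    # Hypothetical prompt wording.
    prompt = ChatPromptTemplate.from_template(
        "Context:\n{context}\n\n"
        "Describe {character}: give a short summary, their key relationships, "
        "and their role (Protagonist, Antagonist, or Side character)."
    )
    chain = prompt | ChatGroq(model_name="mixtral-8x7b-32768") | StrOutputParser()
    print(chain.invoke({"context": context, "character": character}))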

Future Enhancements

  • Integration with Neo4j: Enable graph-based relationship visualization to provide a clearer and more interactive representation of character connections.
  • Fine-Tuning the LLM: Improve role classification and relationship detection accuracy by fine-tuning the LLM on a domain-specific dataset.

About

This project utilizes advanced Large Language Models (LLMs) and vector database technologies to extract structured information about characters from literary texts. It is designed to analyze a given text, identify key characters, and determine their summaries, relationships, and roles (e.g., Protagonist, Antagonist, or Side character).
