FileWise: Empowering Insights, Effortlessly!

This repository contains code for a chatbot that can answer user questions based on the content of a file. The chatbot supports PDF, plain text, and DOCX file formats.

Approach

FileWise follows a retrieval-augmented generation (RAG) design with two main phases: retrieval and generation. The retrieval phase pulls the passages most relevant to the user's question from the knowledge document, and the generation phase passes those passages to a language model, which writes an answer tailored to the question. The goal is a chatbot that answers user questions accurately from the provided knowledge document while keeping hallucination to a minimum.
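As a rough illustration of how the generation phase can be kept grounded in the retrieved context, the sketch below assembles a prompt from retrieved chunks and the user's question. The function name `build_prompt` and the prompt wording are assumptions made for this example; the prompt actually used by the app's question-answering chain may differ.

```python
def build_prompt(context_chunks, question):
    """Assemble a grounded prompt from retrieved chunks and the user's question.

    Hypothetical helper for illustration; not taken from app.py.
    """
    # Number each chunk so it is easy to see which passages were retrieved.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    # Telling the model to rely only on the context, and to admit when the
    # answer is absent, is one common way to discourage hallucination.
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


prompt = build_prompt(
    ["FileWise supports PDF, plain text, and DOCX files."],
    "Which file formats are supported?",
)
print(prompt)
```

The resulting string would then be sent to the language model in place of the bare question.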

Features

Upload a file and ask questions about its content.
Process PDF files using the PyPDF2 library.
Extract text from plain text and DOCX files using the textract library.
Split text into smaller chunks for efficient processing using CharacterTextSplitter from the langchain library.
Generate embeddings for the text chunks using OpenAIEmbeddings from the langchain library.
Build a knowledge base of text chunks using the FAISS vector store from the langchain library.
Perform similarity search to find the chunks most relevant to a user query.
Generate answers with a question-answering chain created by load_qa_chain from the langchain library.
Display the generated answer to the user using Streamlit.
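The chunking and similarity-search steps listed above can be sketched with the standard library alone. This is a simplified stand-in, not the app's code: `split_text` cuts fixed-size overlapping character windows (CharacterTextSplitter splits on a separator first), `embed` builds a bag-of-words count vector in place of OpenAIEmbeddings, and `similarity_search` ranks chunks by cosine similarity in place of a FAISS index. All function names and parameter values here are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, chunk_overlap=40):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def embed(text):
    """Stand-in embedding: lowercase word counts (NOT OpenAIEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]


document = (
    "Invoices are due within thirty days of receipt. "
    "Late payments incur a two percent monthly fee. "
    "Refunds are issued to the original payment method."
)
chunks = split_text(document, chunk_size=60, chunk_overlap=10)
top = similarity_search(chunks, "When are invoices due?", k=1)
print(top[0])
```

The overlap between neighbouring chunks reduces the chance that a sentence cut at a chunk boundary is lost to retrieval. Learned embeddings such as those from OpenAIEmbeddings capture meaning beyond exact word matches, which this word-count stand-in does not.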

Working

Installation

Clone the repository:

git clone https://github.com/tknishh/FileWise.git

Navigate to the project directory:

cd FileWise

Install the dependencies:

pip install -r requirements.txt

Note: Make sure to set your OpenAI API key in the .env file before running the application.
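For orientation, here is a minimal sketch of what reading the key from a .env file involves. It assumes the key is stored under the name OPENAI_API_KEY, the environment variable that langchain's OpenAI integrations read by default. The helpers `parse_env` and `require_api_key` are hypothetical and written for this example; the app itself may load the file differently (for instance with a dotenv package).

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require_api_key(values, name="OPENAI_API_KEY"):
    """Return the key, or fail early with a clear message if it is missing."""
    key = values.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is missing; add it to the .env file")
    return key


# Placeholder content only; never commit a real key.
sample = "# local secrets\nOPENAI_API_KEY=your-key-here\n"
settings = parse_env(sample)
print(require_api_key(settings))
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, during the first embedding request.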

Run the application:

streamlit run app.py

Usage

Open the application in your browser by visiting http://localhost:8501 (or the address provided by Streamlit).
Click on the "Choose File" button to upload a file.
Once the file is uploaded, enter your question in the text input field.
The chatbot will process the file, search it for passages relevant to your question, and generate an answer.
The answer will be displayed below the text input field.

Acknowledgements

This project utilizes the following libraries and frameworks:

PyPDF2
textract
Streamlit
langchain

Assumptions

The knowledge document contains sufficient information to answer user questions.
The user questions are within the scope of the knowledge document.
The chatbot will be a text-based interface.
The chatbot will handle one user question at a time.

Future Scope

Improve retrieval performance by using more advanced models such as Dense Passage Retrieval (DPR) combined with passage re-ranking.
Explore different generation techniques, such as controlled text generation or leveraging pretraining on domain-specific data.
Enhance the chatbot's conversational abilities by incorporating dialogue management techniques and context tracking.
Deploy the chatbot as a web application or integrate it into existing chat platforms.
Incorporate feedback loops to continuously improve the chatbot's performance and address user queries.
Expand the knowledge base and keep it up to date with the latest information.

Author

@tknishh

Contact

For any inquiries, please email [email protected].

