UofT Course Recommendation and Planning Assistant using RAG

Welcome to the UofT Course Recommendation and Planning Assistant, a project aimed at helping University of Toronto students receive accurate course suggestions by leveraging a Retrieval-Augmented Generation (RAG) pipeline. This system combines the power of large language models (LLMs) with text retrieval techniques, utilizing domain-specific course data collected by our scraper and stored in MongoDB to enhance the relevance and precision of responses.

Inspiration

Large language models play an essential role in our daily lives, assisting in various fields of work. However, most generative AI models are trained on datasets that do not cover all the information available online. As a result, when users ask for information outside the AI's knowledge, it often responds with inaccurate or irrelevant content, frustrating users.

This project addresses this problem by providing a Retrieval-Augmented Generation (RAG) solution specifically for UofT students, helping them get accurate course suggestions. By integrating an LLM enhanced with a RAG pipeline, we ensure that course recommendations are based on the most up-to-date and relevant university-specific data.

Introduction

Retrieval-Augmented Generation (RAG) is a technique designed to enhance LLM performance by augmenting its knowledge with domain-specific data. This mitigates hallucination, where a generative model produces irrelevant or inaccurate responses because its training data is incomplete. In this project, we explored both state-of-the-art dense and sparse retrievers and found that choosing a retriever suited to the task is essential to the quality of the retrieved text.
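To make the sparse side of that comparison concrete, here is a minimal sketch of a sparse retriever that scores documents by term overlap using a simple TF-IDF sum. It is not the retriever used in this repository; the function names, the smoothing formula, and the course descriptions are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def tfidf_scores(query, docs):
    """Score each document against the query with a simple TF-IDF sum."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Smoothed inverse document frequency (an assumed variant).
                idf = math.log((n_docs + 1) / (df[term] + 1)) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores.append(score)
    return scores


# Hypothetical course descriptions, not taken from the real corpus.
docs = [
    "introduction to machine learning and neural networks",
    "survey of medieval european history",
    "data structures and algorithms in python",
]
scores = tfidf_scores("machine learning", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

A sparse retriever like this only matches exact terms, so a query phrased with synonyms can score zero against a relevant course. A dense retriever compares learned embeddings instead, which is one reason the choice between the two depends on the task.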

The RAG pipeline consists of two primary phases:

Data Indexing and Encoding: During this phase, the encoder in the dense retriever encodes each course description into a vector, which is stored in the database alongside its course title and course code.
Data Retrieval and Generation: When a user makes a query, it is sent to the RAG pipeline. The pipeline encodes the query into an embedding, compares it with the course description embeddings in the database, retrieves the top-k most relevant descriptions, concatenates them with the user's original query, and finally feeds the result into the LLM. This additional information allows the model to generate accurate, context-aware responses.
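The two phases above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the code in RAG.py or dense_retriever.py: the `embed` function is a hashed bag-of-words stand-in for a real dense encoder, the in-memory list stands in for MongoDB, and the course codes, titles, and prompt layout are hypothetical.

```python
import hashlib
import math

DIM = 1024  # Assumed embedding size for the stand-in encoder.


def embed(text):
    """Stand-in encoder: hashed bag-of-words, L2-normalized.

    A real dense retriever would call a trained neural encoder here.
    hashlib is used because Python's built-in hash() varies per process.
    """
    vec = [0.0] * DIM
    for tok in text.lower().split():
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(courses):
    """Phase 1: store each course with the embedding of its description."""
    return [{**c, "embedding": embed(c["description"])} for c in courses]


def retrieve(query, index, k=2):
    """Phase 2a: rank courses by cosine similarity to the query, keep top-k."""
    q = embed(query)

    def similarity(entry):
        # Vectors are unit length, so the dot product is the cosine.
        return sum(a * b for a, b in zip(q, entry["embedding"]))

    return sorted(index, key=similarity, reverse=True)[:k]


def build_prompt(query, retrieved):
    """Phase 2b: concatenate retrieved descriptions with the user query."""
    context = "\n".join(
        f'{c["code"]} {c["title"]}: {c["description"]}' for c in retrieved
    )
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical courses for illustration only.
courses = [
    {"code": "ABC101", "title": "Intro to Machine Learning",
     "description": "introduction to machine learning and neural networks"},
    {"code": "ABC202", "title": "Medieval History",
     "description": "survey of medieval european history"},
    {"code": "ABC303", "title": "Algorithms",
     "description": "data structures and algorithms in python"},
]
index = build_index(courses)
query = "which course covers machine learning"
prompt = build_prompt(query, retrieve(query, index, k=2))
print(prompt)  # This string is what would be passed to the LLM.
```

Normalizing the vectors at indexing time means retrieval only needs a dot product per course, which keeps the comparison step cheap as the corpus grows.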

In our project, we build the RAG pipeline around Llama3 70b so that the model can provide more precise and relevant suggestions to UofT students.

Key Features

Enhanced Generation with RAG: By integrating retrieval techniques, the system improves the LLM’s response accuracy, ensuring course suggestions are relevant and up-to-date.
Course and Program Scraper: A custom-built scraper fetches UofT course and program data, which is indexed and used as the corpus for retrieval.
MongoDB Integration: Stores course data efficiently and enables fast retrieval during queries.
LLM Integration: Uses Llama3 70b, a powerful large language model, to generate accurate and human-like course suggestions.

Technology Stack

Backend: Python, Flask
Database: MongoDB (stores indexed course data and program information)
NLP Models: Retrieval-Augmented Generation using Llama3 70b from the Hugging Face Transformers library
Frontend: ReactJS for the user interface
Deployment: Flask backend that serves the RAG model API and handles queries
