This is a Python project to build a movie recommendation system using data extracted from a movie database API.
The project follows the guided blueprint provided by Ploomber, focusing on writing professional, modular, and well-documented code with thorough docstrings and exception handling within an OOP framework.
Additionally, I have added a simple frontend using Streamlit. The entire application is containerized using Docker for easy setup and deployment.
- Description
- Requirements
- How to Run it
- Data
- Recommendation Methodology
- Modules
- Results
- Credits
- License
The project involves the following components:
- 🎬 Extracting movie data by calling TheMovieDB API
- 💾 Storing the data in a DuckDB database
- 📊 Performing exploratory data analysis with SQL in Jupyter Notebooks
- 🤖 Developing a movie recommendation system that uses TF-IDF and cosine similarity to generate recommendations
- 🎞️ Taking a movie title as input and returning similar movie recommendations
- ⚙️ Packaging the notebooks and Python scripts into an end-to-end workflow using Ploomber
- ⚡ Building a FastAPI web application to serve the recommendation results via API
- 🐳 Dockerizing the application for easy deployment
- Python 3.10+ 🐍
- Poetry 📦
- DuckDB 🦆
- Jupyter 💻
- Pandas 🐼
- Scikit-Learn 🔬
- FastAPI ⚡️
- Docker 🐳
See the pyproject.toml file for the full list of dependencies.
Clone the repository
git clone https://github.com/MagnusS0/movie-rec-system.git
Navigate to the directory where you downloaded the repository
cd movie-rec-system
Remember to add your own API key to .env
docker-compose up --build
Remember to add your own API key to .env
- Make sure you have Poetry installed in your environment
pip install poetry
- Install dependencies
poetry lock
poetry install
- Build the pipeline with Ploomber's build command
poetry run ploomber build
- Run the app
uvicorn app.app:app
- Run the frontend (optional)
Make sure you are in the right directory (frontend)
streamlit run frontend_app.py
The data is extracted from TheMovieDB API and stored in a DuckDB database movies_data.duckdb. It contains information on movies like title, overview, genres, ratings, etc.
The main tables are:
- movies - contains movie info
- genres - contains genre definitions
- movie_genre_data - joins movies and genres into a single table
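The relationship between the three tables can be sketched with a small join. This is only an illustration under stated assumptions: the column names, the sample rows, and the movie_genre_ids link table are hypothetical (the README does not describe how movies reference genres), and Python's built-in sqlite3 stands in for DuckDB so the snippet runs without extra dependencies.

```python
import sqlite3

# sqlite3 is a stand-in for DuckDB here; the project itself stores data in DuckDB.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed, simplified schemas; the real tables likely have more columns.
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, overview TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical link table connecting movies to genres.
    CREATE TABLE movie_genre_ids (movie_id INTEGER, genre_id INTEGER);

    -- Made-up sample rows for illustration only.
    INSERT INTO movies VALUES (1, 'oppenheimer', 'placeholder overview text');
    INSERT INTO genres VALUES (18, 'Drama'), (36, 'History');
    INSERT INTO movie_genre_ids VALUES (1, 18), (1, 36);

    -- One row per (movie, genre) pair, combining movie info with genre names.
    CREATE TABLE movie_genre_data AS
    SELECT m.id AS movie_id, m.title, m.overview, g.name AS genre
    FROM movies AS m
    JOIN movie_genre_ids AS link ON link.movie_id = m.id
    JOIN genres AS g ON g.id = link.genre_id;
    """
)

rows = conn.execute(
    "SELECT title, genre FROM movie_genre_data ORDER BY genre"
).fetchall()
print(rows)
```

The joined table is convenient for the recommender because each movie's overview and genre names end up in one place.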
The movie recommendation system is built using TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity. In essence, it is a content-based filtering recommendation system.
TF-IDF is used to convert each movie's text, overview + (genres * 2), where the genres are counted twice to give them extra weight, into a numerical vector representing the significance of specific terms in that text.
Then, cosine similarity is computed between these vectors to determine the similarity between different movies.
Based on this similarity score, the system recommends movies that are most similar to the given input movie title.
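The steps above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the project lists Scikit-Learn as a dependency and most likely relies on it for vectorization, whereas the code below computes TF-IDF and cosine similarity by hand. The toy movies and the function names (build_text, tfidf_vectors, recommend) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: title -> (overview, genres).
MOVIES = {
    "war story": ("a soldier survives a brutal war battle", "war drama"),
    "battle front": ("soldiers fight a long battle during the war", "war action"),
    "space trip": ("astronauts travel to a distant planet", "science fiction"),
}


def build_text(overview, genres):
    # Mirrors overview + (genres * 2): genres appear twice so they weigh more.
    return f"{overview} {genres} {genres}"


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=2):
    """Rank every other movie by similarity to the given title."""
    titles = list(movies)
    vectors = tfidf_vectors([build_text(*movies[t]) for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (cosine_similarity(query, vec), t)
        for t, vec in zip(titles, vectors)
        if t != title
    ]
    scored.sort(reverse=True)
    return [t for _, t in scored[:top_n]]


print(recommend("war story", MOVIES))
```

Repeating the genre string is a simple way to weight genres without changing the vectorizer: the repeated terms get a higher term frequency, so two movies sharing a genre score closer together.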
- frontend/frontend_app.py - contains the Streamlit application code
- app/app.py - contains the FastAPI application code
- app/recommender.py - generates movie recommendations
- app/recommenderhelper.py - contains helper functions for the recommender
- etl/extract.py - extracts data from API
- etl/eda.ipynb - notebook for exploratory data analysis
- products/ - contains notebooks packaged by Ploomber
- tests/ - contains tests for the application
Running the application provides movie recommendations in JSON format for a given movie title. It also returns metrics on the popularity, ratings, and vote count of the recommendations.
Sample Output:
{
  "movie": "oppenheimer",
  "recommendations": [
    "schindler's list",
    "resistance",
    "to end all war: oppenheimer & the atomic bomb",
    "midway",
    "1917",
    "emancipation",
    "13 hours: the secret soldiers of benghazi",
    "defiance",
    "the imitation game",
    "hacksaw ridge"
  ],
  "metrics": {
    "popularity": 373.829,
    "vote_avg": 0.834,
    "vote_count": 6699.44
  }
}
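A client consuming this output could parse it along the following lines. This is a sketch that assumes the response always has the three top-level keys shown in the sample; the RecommendationResponse class and parse_response function are hypothetical names, and the sample payload is trimmed to three recommendations for brevity.

```python
import json
from dataclasses import dataclass

# Trimmed copy of the sample output above (only the first three recommendations).
SAMPLE = """
{
  "movie": "oppenheimer",
  "recommendations": ["schindler's list", "resistance", "midway"],
  "metrics": {"popularity": 373.829, "vote_avg": 0.834, "vote_count": 6699.44}
}
"""


@dataclass
class RecommendationResponse:
    """Hypothetical container mirroring the keys of the sample output."""
    movie: str
    recommendations: list[str]
    metrics: dict[str, float]


def parse_response(raw: str) -> RecommendationResponse:
    # json.loads raises json.JSONDecodeError on malformed input and the key
    # lookups raise KeyError if the payload does not have the assumed shape.
    payload = json.loads(raw)
    return RecommendationResponse(
        movie=payload["movie"],
        recommendations=list(payload["recommendations"]),
        metrics={name: float(value) for name, value in payload["metrics"].items()},
    )


response = parse_response(SAMPLE)
for rank, title in enumerate(response.recommendations, start=1):
    print(rank, title)
```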
This project was created by @MagnusS0
Guided by: Ploomber's Movie Recommendation Project
Powered by:
- TheMovieDB API
- Ploomber
- FastAPI
- DuckDB
- Poetry
- Docker
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
I have modified the original code/structure from Ploomber's blueprint, while keeping some parts the same. Thank you to Ploomber for making their blueprint openly available!