GitHub - lizzypy/simple_recommendation_engine

Simple Recommendation Engine

This repository contains a simple movie recommendation engine. Its development is used in articles and talks to demonstrate how test-driven development and CI/CD can be applied to data analytics and ML models.

To use the notebooks and scripts in this repository, create a virtual environment with Python 3.9 (the latest Python version that AWS Glue supported at the time of writing).

To manage Python environments, I recommend pyenv (it's forked from rbenv):

You can get started with pyenv here: https://github.com/pyenv/pyenv#getting-pyenv

Once you have pyenv installed, run the following from the root of the project:

pyenv install 3.9.15
pyenv local 3.9.15
pip3 install -r analysis/requirements.txt
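
As a quick sanity check after these steps, a small script along the following lines can confirm that the active interpreter is in the 3.9 series. This is only a sketch: the function name `interpreter_matches` and the constant `EXPECTED` are made up for illustration and are not part of this repository.

```python
import sys

# Assumption: the project targets the Python 3.9 series, as the setup steps suggest.
EXPECTED = (3, 9)


def interpreter_matches(version_info=sys.version_info, expected=EXPECTED) -> bool:
    """Return True when the interpreter's (major, minor) equals `expected`."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if interpreter_matches():
        print(f"OK: running Python {found}")
    else:
        print(f"Warning: expected Python 3.9.x but found {found}")
```

If the warning appears, the shell is probably not picking up the version pinned by pyenv local.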

Analysis

You should now be able to run the following command:

cd analysis/notebooks && jupyter lab

This should open JupyterLab in the notebooks directory. You should be able to run the Movie Data Analysis.ipynb notebook from start to finish without errors.

Tests

To run all the tests from the root of the project:

cd analysis && pytest

To run a single test file from the root of the project:

cd analysis && pytest utils/tests/test_cleaning.py
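
For readers new to the test-driven style used here, the sketch below shows the general shape of a small cleaning function with pytest tests next to it. It is illustrative only: `split_title_and_year` is a hypothetical helper, and it assumes titles may carry a trailing year such as "Toy Story (1995)"; the actual functions and data format in analysis/utils may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration; not taken from analysis/utils.
_YEAR_SUFFIX = re.compile(r"\s*\((\d{4})\)\s*$")


def split_title_and_year(raw_title: str) -> Tuple[str, Optional[int]]:
    """Split a raw title into (clean title, release year or None)."""
    match = _YEAR_SUFFIX.search(raw_title)
    if match is None:
        # No trailing "(YYYY)": only trim surrounding whitespace.
        return raw_title.strip(), None
    return raw_title[: match.start()].strip(), int(match.group(1))


# pytest collects any function whose name starts with "test_".
def test_title_with_year_is_split():
    assert split_title_and_year("Toy Story (1995)") == ("Toy Story", 1995)


def test_title_without_year_is_only_trimmed():
    assert split_title_and_year("  Heat ") == ("Heat", None)
```

In a test-first workflow the two test functions would be written before the helper, fail, and then pass once the helper is implemented.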

Terraform

To build the resources needed to run the Glue job in AWS, move into the tf directory and run:

terraform init
terraform plan
terraform apply

Notes:

Terraform stores its state remotely. Create an S3 bucket in your AWS account named recommendation-engine-terraform-state.
Create a DynamoDB table named recommendation-engine-lock.
- That table MUST have a partition key named LockID.
S3 bucket names must be globally unique, so you will need to change the name of your S3 bucket.
Once the bucket is created, manually add all_movies.csv (from analysis/notebooks/input/all_movies.csv) to your S3 bucket.
If you don't let GitHub run the step for you, run the ./upload_prepare.sh script manually before using the script in AWS.
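
The state bucket and lock table from the notes above can be created in the AWS console, or scripted. The following is a minimal sketch using boto3, not something shipped with this repository: the function names and the region are assumptions for illustration, and because bucket names are globally unique you would likely need to substitute your own bucket name.

```python
from typing import Any, Dict

# Names taken from the notes above; REGION is an assumption for illustration.
STATE_BUCKET = "recommendation-engine-terraform-state"
LOCK_TABLE = "recommendation-engine-lock"
REGION = "us-east-1"


def bucket_request(bucket: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for the S3 create_bucket call."""
    request: Dict[str, Any] = {"Bucket": bucket}
    # us-east-1 is the default and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def lock_table_request(table: str) -> Dict[str, Any]:
    """Build keyword arguments for the DynamoDB create_table call."""
    return {
        "TableName": table,
        # Terraform's S3 backend locking expects a string key named LockID.
        "AttributeDefinitions": [{"AttributeName": "LockID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "LockID", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_backend_resources(region: str = REGION) -> None:
    """Create the state bucket and lock table (requires AWS credentials)."""
    import boto3  # imported lazily so the request builders work without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**bucket_request(STATE_BUCKET, region))
    dynamodb = boto3.client("dynamodb", region_name=region)
    dynamodb.create_table(**lock_table_request(LOCK_TABLE))
```

Calling create_backend_resources() once before terraform init should be enough; keeping the request builders separate from the AWS calls makes them easy to unit test without credentials.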

