
GSSOC '24 : Updated Movie Recommendation using Python #581

Closed · wants to merge 2 commits
Binary file modified .DS_Store
Binary file not shown.

This file was deleted.

49 changes: 25 additions & 24 deletions Movie-Recommender-System using python/Readme.md
100644 → 100755
@@ -1,37 +1,38 @@
## Movie Recommendation System
# The-Movie-Cinema

### Data Set
[Data Set](https://github.com/Binal02Singh/ML-CaPsule/blob/Movie-Recommender-System/Movie-Recommender-System%20using%20python/data/movies%20(1).csv) : This data set has been taken from Kaggle.com.
![Python](https://img.shields.io/badge/Python-3.8-blueviolet)
![Framework](https://img.shields.io/badge/Framework-Flask-red)
![Frontend](https://img.shields.io/badge/Frontend-HTML/CSS/JS-green)
![API](https://img.shields.io/badge/API-TMDB-fcba03)

### Use
This feature recommends movies based on the user's searches.
The application provides all the details of the requested movie, such as overview, genre, release date, rating, runtime, top cast, reviews, and recommended movies.

The details of the movies (title, genre, runtime, rating, poster, etc.) are fetched from the TMDB API (https://www.themoviedb.org/documentation/api). Using the movie's IMDB id with the API, I scraped the user reviews from the IMDB site with `beautifulsoup4` and performed sentiment analysis on those reviews.
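For illustration, the details lookup could be sketched as below. `movie_details_url` is a hypothetical helper (not part of this project's code), and `YOUR_API_KEY` is a placeholder; see the "How to get the API key?" section:

```python
# Sketch (assumption): building a request URL for TMDB's v3 "movie details" endpoint.
def movie_details_url(movie_id, api_key):
    return f"https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"

# A real call would then fetch the JSON, e.g. with the `requests` library:
#   details = requests.get(movie_details_url(299534, "YOUR_API_KEY")).json()
#   details["title"], details["runtime"], details["imdb_id"], ...
print(movie_details_url(299534, "YOUR_API_KEY"))
```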

### Dependencies
- numpy
- sklearn
- difflib
- pandas
## Link to the application

## APPROACH
The basic approach was to recommend movies on the basis of keywords rather than ratings.

- First of all, we gathered all the data from the Kaggle website.
If you can't find the movie you're searching for through auto-suggestions while typing, there's no need to worry. Simply type the name of the movie and press "enter". Even if you make some typos, it should still work fine.
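That typo tolerance can be approximated with fuzzy string matching. A minimal sketch using the standard-library `difflib` (toy titles and a hypothetical `closest_title` helper, not this project's exact implementation):

```python
from difflib import get_close_matches

titles = ["Avatar", "Avengers: Endgame", "The Avengers"]  # toy data

def closest_title(query, titles):
    # compare case-insensitively; cutoff=0.6 is difflib's default similarity threshold
    matches = get_close_matches(query.lower(), [t.lower() for t in titles], n=1, cutoff=0.6)
    return matches[0] if matches else None

print(closest_title("Avengrs Endgame", titles))  # -> "avengers: endgame" despite the typo
```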

- Then we kept only the important columns, like genre, cast, crew, and title, converted the data inside them into a readable format, and trimmed the cast and crew sections to keep only a few names: the main 3 actors in the cast and only the director in the crew.
## How to get the API key?

- Now we merged all of the columns into a single tag column.
<img width="350" alt="Screenshot 2022-03-14 at 11 22 36 AM" src="https://user-images.githubusercontent.com/72695669/158113146-32586f2f-7d1b-4e2a-8b2a-28352bd36b4b.png">
Create an account at https://www.themoviedb.org/, click on the `API` link in the left-hand sidebar of your account settings, and fill in all the details to apply for an API key. If you are asked for the website URL, just give "NA" if you don't have one. Once your request is approved, you will see the API key in your `API` sidebar.

- Since all the data is combined, we converted it into vector form using `CountVectorizer` from `sklearn.feature_extraction.text`.
<img width="350" alt="Screenshot 2022-03-14 at 11 22 50 AM" src="https://user-images.githubusercontent.com/72695669/158113253-e3dd7b50-7012-4aa6-a046-fca31e377f8b.png">
## How to run the project?

1. Clone this repository in your local system.
2. Install all the libraries mentioned in the [requirements.txt](https://github.com/kishan0725/The-Movie-Cinema/blob/master/requirements.txt) file with the command `pip install -r requirements.txt`.
3. Replace YOUR_API_KEY at line no. 2 of `static/recommend.js` file.
4. Open your terminal/command prompt from your project directory and run the `main.py` file by executing the command `python main.py`.
5. Go to your browser and type `http://127.0.0.1:5000/` in the address bar.
6. Hurray! That's it.

- Finally, when the user searches, the system returns the movies whose vectors are closest to the searched movie's vector, using `cosine_similarity`.
<img width="606" alt="Screenshot 2022-03-14 at 11 23 04 AM" src="https://user-images.githubusercontent.com/72695669/158113297-cc79389b-3335-4c9e-8218-455fc74b31bc.png">
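Putting the steps above together, a minimal self-contained sketch of the keyword approach (toy data standing in for the real merged tag column):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy stand-in for the dataset with a merged tag column
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "tags":  ["action hero explosion", "action hero chase", "romance drama tears"],
})

# convert the combined text into count vectors, then compare every pair of movies
vectors = CountVectorizer().fit_transform(movies["tags"])
similarity = cosine_similarity(vectors)

def recommend(title, n=2):
    idx = movies.index[movies["title"] == title][0]
    # rank all movies by similarity to the searched one, skipping the movie itself
    scores = sorted(enumerate(similarity[idx]), key=lambda x: x[1], reverse=True)
    return [movies["title"][i] for i, _ in scores[1:n + 1]]

print(recommend("Movie A"))  # "Movie B" ranks first (shares "action hero")
```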

### Prediction
<img width="399" alt="Screenshot 2022-03-14 at 11 23 21 AM" src="https://user-images.githubusercontent.com/72695669/158113320-8ee72e83-db9b-4882-aeda-12eecd7bb8e4.png">
### Sources of the datasets

1. [IMDB 5000 Movie Dataset](https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset)
2. [The Movies Dataset](https://www.kaggle.com/rounakbanik/the-movies-dataset)
3. [List of movies in 2018](https://en.wikipedia.org/wiki/List_of_American_films_of_2018)
4. [List of movies in 2019](https://en.wikipedia.org/wiki/List_of_American_films_of_2019)
5. [List of movies in 2020](https://en.wikipedia.org/wiki/List_of_American_films_of_2020)

36,847 changes: 36,847 additions & 0 deletions Movie-Recommender-System using python/datasets/final_data.csv

Large diffs are not rendered by default.

36,988 changes: 36,988 additions & 0 deletions Movie-Recommender-System using python/datasets/main_data.csv

Large diffs are not rendered by default.

36,342 changes: 36,342 additions & 0 deletions Movie-Recommender-System using python/datasets/movie.csv

Large diffs are not rendered by default.

5,044 changes: 5,044 additions & 0 deletions Movie-Recommender-System using python/datasets/movie_metadata.csv

Large diffs are not rendered by default.

7,086 changes: 7,086 additions & 0 deletions Movie-Recommender-System using python/datasets/reviews.txt

Large diffs are not rendered by default.

156 changes: 156 additions & 0 deletions Movie-Recommender-System using python/main.py
@@ -0,0 +1,156 @@
import numpy as np
import pandas as pd
from flask import Flask, render_template, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json
import bs4 as bs
import urllib.request
import pickle
import requests
from datetime import date, datetime

# load the nlp model and tfidf vectorizer from disk
filename = 'nlp_model.pkl'
clf = pickle.load(open(filename, 'rb'))
vectorizer = pickle.load(open('tranform.pkl', 'rb'))  # filename is spelled 'tranform' as shipped

# convert a stringified list to a list (eg. '["abc","def"]' to ["abc", "def"])
def convert_to_list(my_list):
    my_list = my_list.split('","')
    my_list[0] = my_list[0].replace('["', '')
    my_list[-1] = my_list[-1].replace('"]', '')
    return my_list

# convert a stringified list of numbers to a list (eg. "[1,2,3]" to ["1", "2", "3"])
def convert_to_list_num(my_list):
    my_list = my_list.split(',')
    my_list[0] = my_list[0].replace("[", "")
    my_list[-1] = my_list[-1].replace("]", "")
    return my_list

def get_suggestions():
    data = pd.read_csv('main_data.csv')
    return list(data['movie_title'].str.capitalize())

app = Flask(__name__)

@app.route("/")
@app.route("/home")
def home():
    suggestions = get_suggestions()
    return render_template('home.html', suggestions=suggestions)

@app.route("/populate-matches", methods=["POST"])
def populate_matches():
    # getting data from AJAX request
    res = json.loads(request.get_data("data"))
    movies_list = res['movies_list']

    # map each poster URL (or a placeholder) to the movie's display details
    movie_cards = {
        ("https://image.tmdb.org/t/p/original" + movie['poster_path']
         if movie['poster_path'] else "/static/movie_placeholder.jpeg"): [
            movie['title'],
            movie['original_title'],
            movie['vote_average'],
            datetime.strptime(movie['release_date'], '%Y-%m-%d').year
            if movie['release_date'] else "N/A",
            movie['id'],
        ]
        for movie in movies_list
    }

    return render_template('recommend.html', movie_cards=movie_cards)



@app.route("/recommend", methods=["POST"])
def recommend():
    # getting data from AJAX request
    title = request.form['title']
    cast_ids = request.form['cast_ids']
    cast_names = request.form['cast_names']
    cast_chars = request.form['cast_chars']
    cast_bdays = request.form['cast_bdays']
    cast_bios = request.form['cast_bios']
    cast_places = request.form['cast_places']
    cast_profiles = request.form['cast_profiles']
    imdb_id = request.form['imdb_id']
    poster = request.form['poster']
    genres = request.form['genres']
    overview = request.form['overview']
    vote_average = request.form['rating']
    vote_count = request.form['vote_count']
    rel_date = request.form['rel_date']
    release_date = request.form['release_date']
    runtime = request.form['runtime']
    status = request.form['status']
    rec_movies = request.form['rec_movies']
    rec_posters = request.form['rec_posters']
    rec_movies_org = request.form['rec_movies_org']
    rec_year = request.form['rec_year']
    rec_vote = request.form['rec_vote']
    rec_ids = request.form['rec_ids']

    # get movie suggestions for auto complete
    suggestions = get_suggestions()

    # call convert_to_list for every string that needs to become a list
    rec_movies_org = convert_to_list(rec_movies_org)
    rec_movies = convert_to_list(rec_movies)
    rec_posters = convert_to_list(rec_posters)
    cast_names = convert_to_list(cast_names)
    cast_chars = convert_to_list(cast_chars)
    cast_profiles = convert_to_list(cast_profiles)
    cast_bdays = convert_to_list(cast_bdays)
    cast_bios = convert_to_list(cast_bios)
    cast_places = convert_to_list(cast_places)

    # convert strings of numbers to lists (eg. "[1,2,3]" to ["1","2","3"])
    cast_ids = convert_to_list_num(cast_ids)
    rec_vote = convert_to_list_num(rec_vote)
    rec_year = convert_to_list_num(rec_year)
    rec_ids = convert_to_list_num(rec_ids)

    # unescape the stringified newlines and quotes
    for i in range(len(cast_bios)):
        cast_bios[i] = cast_bios[i].replace(r'\n', '\n').replace(r'\"', '\"')

    for i in range(len(cast_chars)):
        cast_chars[i] = cast_chars[i].replace(r'\n', '\n').replace(r'\"', '\"')

    # combining multiple lists into dictionaries that can be passed to the html file, so they can be processed easily and the order of information is preserved
    movie_cards = {rec_posters[i]: [rec_movies[i], rec_movies_org[i], rec_vote[i], rec_year[i], rec_ids[i]] for i in range(len(rec_posters))}

    casts = {cast_names[i]: [cast_ids[i], cast_chars[i], cast_profiles[i]] for i in range(len(cast_profiles))}

    cast_details = {cast_names[i]: [cast_ids[i], cast_profiles[i], cast_bdays[i], cast_places[i], cast_bios[i]] for i in range(len(cast_places))}

    if imdb_id != "":
        # web scraping to get user reviews from the IMDB site
        sauce = urllib.request.urlopen('https://www.imdb.com/title/{}/reviews?ref_=tt_ov_rt'.format(imdb_id)).read()
        soup = bs.BeautifulSoup(sauce, 'lxml')
        soup_result = soup.find_all("div", {"class": "text show-more__control"})

        reviews_list = []    # list of reviews
        reviews_status = []  # list of sentiments (Positive or Negative)
        for reviews in soup_result:
            if reviews.string:
                reviews_list.append(reviews.string)
                # passing the review to our sentiment model
                movie_review_list = np.array([reviews.string])
                movie_vector = vectorizer.transform(movie_review_list)
                pred = clf.predict(movie_vector)
                reviews_status.append('Positive' if pred else 'Negative')

        # getting current date
        movie_rel_date = ""
        curr_date = ""
        if rel_date:
            today = str(date.today())
            curr_date = datetime.strptime(today, '%Y-%m-%d')
            movie_rel_date = datetime.strptime(rel_date, '%Y-%m-%d')

        # combining reviews and comments into a dictionary
        movie_reviews = {reviews_list[i]: reviews_status[i] for i in range(len(reviews_list))}

        # passing all the data to the html file
        return render_template('recommend.html', title=title, poster=poster, overview=overview, vote_average=vote_average,
                               vote_count=vote_count, release_date=release_date, movie_rel_date=movie_rel_date,
                               curr_date=curr_date, runtime=runtime, status=status, genres=genres, movie_cards=movie_cards,
                               reviews=movie_reviews, casts=casts, cast_details=cast_details)

    else:
        return render_template('recommend.html', title=title, poster=poster, overview=overview, vote_average=vote_average,
                               vote_count=vote_count, release_date=release_date, movie_rel_date="", curr_date="",
                               runtime=runtime, status=status, genres=genres, movie_cards=movie_cards, reviews="",
                               casts=casts, cast_details=cast_details)


if __name__ == '__main__':
app.run(debug=True)
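For reference, the list-parsing helper above behaves like this (standalone copy of `convert_to_list` for illustration):

```python
def convert_to_list(my_list):
    # same logic as the helper in main.py: strip the brackets and split on '","'
    my_list = my_list.split('","')
    my_list[0] = my_list[0].replace('["', '')
    my_list[-1] = my_list[-1].replace('"]', '')
    return my_list

print(convert_to_list('["abc","def"]'))  # -> ['abc', 'def']
```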