
GSSOC '24 : Updated Movie Recommendation using Python #581

Closed · wants to merge 2 commits
Binary file modified .DS_Store
Binary file not shown.

This file was deleted.

49 changes: 25 additions & 24 deletions Movie-Recommender-System using python/Readme.md
100644 → 100755
@@ -1,37 +1,38 @@
## Movie Recommendation System
# The-Movie-Cinema

### Data Set
[Data Set](https://github.com/Binal02Singh/ML-CaPsule/blob/Movie-Recommender-System/Movie-Recommender-System%20using%20python/data/movies%20(1).csv) : This data set has been taken from Kaggle.com.
![Python](https://img.shields.io/badge/Python-3.8-blueviolet)
![Framework](https://img.shields.io/badge/Framework-Flask-red)
![Frontend](https://img.shields.io/badge/Frontend-HTML/CSS/JS-green)
![API](https://img.shields.io/badge/API-TMDB-fcba03)

### Use
This feature recommends movies based on the user's searches.
The application provides all the details of the requested movie, such as overview, genre, release date, rating, runtime, top cast, reviews, and recommended movies.

The details of the movies (title, genre, runtime, rating, poster, etc.) are fetched from the TMDB API (https://www.themoviedb.org/documentation/api). Using the movie's IMDB id with the API, I scraped the user reviews from the IMDB site with `beautifulsoup4` and performed sentiment analysis on those reviews.
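For illustration, the details lookup could be sketched as below. `movie_details_url` is a hypothetical helper (not part of this project's code), and `YOUR_API_KEY` is a placeholder; see the "How to get the API key?" section:

```python
# Sketch (assumption): building a request URL for TMDB's v3 "movie details" endpoint.
def movie_details_url(movie_id, api_key):
    return f"https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"

# A real call would then fetch the JSON, e.g. with the `requests` library:
#   details = requests.get(movie_details_url(299534, "YOUR_API_KEY")).json()
#   details["title"], details["runtime"], details["imdb_id"], ...
print(movie_details_url(299534, "YOUR_API_KEY"))
```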

### Dependencies
- numpy
- sklearn
- difflib
- pandas
## Link to the application

## APPROACH
The basic approach was to recommend movies on the basis of keywords rather than ratings.

- First of all, we gathered all the data from the Kaggle website.
If you can't find the movie you're searching for through auto-suggestions while typing, there's no need to worry. Simply type the name of the movie and press "enter". Even if you make some typos, it should still work fine.
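That typo tolerance can be approximated with fuzzy string matching. A minimal sketch using the standard-library `difflib` (toy titles and a hypothetical `closest_title` helper, not this project's exact implementation):

```python
from difflib import get_close_matches

titles = ["Avatar", "Avengers: Endgame", "The Avengers"]  # toy data

def closest_title(query, titles):
    # compare case-insensitively; cutoff=0.6 is difflib's default similarity threshold
    matches = get_close_matches(query.lower(), [t.lower() for t in titles], n=1, cutoff=0.6)
    return matches[0] if matches else None

print(closest_title("Avengrs Endgame", titles))  # -> "avengers: endgame" despite the typo
```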

- Then we kept only the important columns, like genre, cast, crew, and title, converted the data inside them into a readable format, and trimmed the cast and crew sections to keep only a few names: the main 3 actors in the cast and only the director in the crew.
## How to get the API key?

- Now we merged all of the columns into a single tag column.
<img width="350" alt="Screenshot 2022-03-14 at 11 22 36 AM" src="https://user-images.githubusercontent.com/72695669/158113146-32586f2f-7d1b-4e2a-8b2a-28352bd36b4b.png">
Create an account at https://www.themoviedb.org/, click on the `API` link in the left-hand sidebar of your account settings, and fill in all the details to apply for an API key. If you are asked for the website URL, just give "NA" if you don't have one. Once your request is approved, you will see the API key in your `API` sidebar.

- Since all the data is combined, we converted it into vector form using `CountVectorizer` from `sklearn.feature_extraction.text`.
<img width="350" alt="Screenshot 2022-03-14 at 11 22 50 AM" src="https://user-images.githubusercontent.com/72695669/158113253-e3dd7b50-7012-4aa6-a046-fca31e377f8b.png">
## How to run the project?

1. Clone this repository in your local system.
2. Install all the libraries mentioned in the [requirements.txt](https://github.com/kishan0725/The-Movie-Cinema/blob/master/requirements.txt) file with the command `pip install -r requirements.txt`.
3. Replace YOUR_API_KEY at line no. 2 of `static/recommend.js` file.
4. Open your terminal/command prompt from your project directory and run the `main.py` file by executing the command `python main.py`.
5. Go to your browser and type `http://127.0.0.1:5000/` in the address bar.
6. Hurray! That's it.

- Finally, when the user searches, the system returns the movies whose vectors are closest to the searched movie's vector, using `cosine_similarity`.
<img width="606" alt="Screenshot 2022-03-14 at 11 23 04 AM" src="https://user-images.githubusercontent.com/72695669/158113297-cc79389b-3335-4c9e-8218-455fc74b31bc.png">
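Putting the steps above together, a minimal self-contained sketch of the keyword approach (toy data standing in for the real merged tag column):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy stand-in for the dataset with a merged tag column
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "tags":  ["action hero explosion", "action hero chase", "romance drama tears"],
})

# convert the combined text into count vectors, then compare every pair of movies
vectors = CountVectorizer().fit_transform(movies["tags"])
similarity = cosine_similarity(vectors)

def recommend(title, n=2):
    idx = movies.index[movies["title"] == title][0]
    # rank all movies by similarity to the searched one, skipping the movie itself
    scores = sorted(enumerate(similarity[idx]), key=lambda x: x[1], reverse=True)
    return [movies["title"][i] for i, _ in scores[1:n + 1]]

print(recommend("Movie A"))  # "Movie B" ranks first (shares "action hero")
```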

### Prediction
<img width="399" alt="Screenshot 2022-03-14 at 11 23 21 AM" src="https://user-images.githubusercontent.com/72695669/158113320-8ee72e83-db9b-4882-aeda-12eecd7bb8e4.png">
### Sources of the datasets

1. [IMDB 5000 Movie Dataset](https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset)
2. [The Movies Dataset](https://www.kaggle.com/rounakbanik/the-movies-dataset)
3. [List of movies in 2018](https://en.wikipedia.org/wiki/List_of_American_films_of_2018)
4. [List of movies in 2019](https://en.wikipedia.org/wiki/List_of_American_films_of_2019)
5. [List of movies in 2020](https://en.wikipedia.org/wiki/List_of_American_films_of_2020)

36,847 changes: 36,847 additions & 0 deletions Movie-Recommender-System using python/datasets/final_data.csv

Large diffs are not rendered by default.

36,988 changes: 36,988 additions & 0 deletions Movie-Recommender-System using python/datasets/main_data.csv

Large diffs are not rendered by default.

36,342 changes: 36,342 additions & 0 deletions Movie-Recommender-System using python/datasets/movie.csv

Large diffs are not rendered by default.

5,044 changes: 5,044 additions & 0 deletions Movie-Recommender-System using python/datasets/movie_metadata.csv

Large diffs are not rendered by default.

7,086 changes: 7,086 additions & 0 deletions Movie-Recommender-System using python/datasets/reviews.txt

Large diffs are not rendered by default.

156 changes: 156 additions & 0 deletions Movie-Recommender-System using python/main.py
@@ -0,0 +1,156 @@
import numpy as np
import pandas as pd
from flask import Flask, render_template, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import json
import bs4 as bs
import urllib.request
import pickle
import requests
from datetime import date, datetime

# load the nlp model and tfidf vectorizer from disk
filename = 'nlp_model.pkl'
clf = pickle.load(open(filename, 'rb'))
vectorizer = pickle.load(open('tranform.pkl', 'rb'))  # filename is spelled 'tranform' as shipped

# convert a stringified list to a list (eg. '["abc","def"]' to ["abc", "def"])
def convert_to_list(my_list):
    my_list = my_list.split('","')
    my_list[0] = my_list[0].replace('["', '')
    my_list[-1] = my_list[-1].replace('"]', '')
    return my_list

# convert a stringified list of numbers to a list (eg. "[1,2,3]" to ["1", "2", "3"])
def convert_to_list_num(my_list):
    my_list = my_list.split(',')
    my_list[0] = my_list[0].replace("[", "")
    my_list[-1] = my_list[-1].replace("]", "")
    return my_list

def get_suggestions():
    data = pd.read_csv('main_data.csv')
    return list(data['movie_title'].str.capitalize())

app = Flask(__name__)

@app.route("/")
@app.route("/home")
def home():
    suggestions = get_suggestions()
    return render_template('home.html', suggestions=suggestions)

@app.route("/populate-matches", methods=["POST"])
def populate_matches():
    # getting data from AJAX request
    res = json.loads(request.get_data("data"))
    movies_list = res['movies_list']

    # map each poster URL (or a placeholder) to the movie's display details
    movie_cards = {
        ("https://image.tmdb.org/t/p/original" + movie['poster_path']
         if movie['poster_path'] else "/static/movie_placeholder.jpeg"): [
            movie['title'],
            movie['original_title'],
            movie['vote_average'],
            datetime.strptime(movie['release_date'], '%Y-%m-%d').year
            if movie['release_date'] else "N/A",
            movie['id'],
        ]
        for movie in movies_list
    }

    return render_template('recommend.html', movie_cards=movie_cards)



@app.route("/recommend", methods=["POST"])
def recommend():
    # getting data from AJAX request
    title = request.form['title']
    cast_ids = request.form['cast_ids']
    cast_names = request.form['cast_names']
    cast_chars = request.form['cast_chars']
    cast_bdays = request.form['cast_bdays']
    cast_bios = request.form['cast_bios']
    cast_places = request.form['cast_places']
    cast_profiles = request.form['cast_profiles']
    imdb_id = request.form['imdb_id']
    poster = request.form['poster']
    genres = request.form['genres']
    overview = request.form['overview']
    vote_average = request.form['rating']
    vote_count = request.form['vote_count']
    rel_date = request.form['rel_date']
    release_date = request.form['release_date']
    runtime = request.form['runtime']
    status = request.form['status']
    rec_movies = request.form['rec_movies']
    rec_posters = request.form['rec_posters']
    rec_movies_org = request.form['rec_movies_org']
    rec_year = request.form['rec_year']
    rec_vote = request.form['rec_vote']
    rec_ids = request.form['rec_ids']

    # get movie suggestions for auto complete
    suggestions = get_suggestions()

    # call convert_to_list for every string that needs to become a list
    rec_movies_org = convert_to_list(rec_movies_org)
    rec_movies = convert_to_list(rec_movies)
    rec_posters = convert_to_list(rec_posters)
    cast_names = convert_to_list(cast_names)
    cast_chars = convert_to_list(cast_chars)
    cast_profiles = convert_to_list(cast_profiles)
    cast_bdays = convert_to_list(cast_bdays)
    cast_bios = convert_to_list(cast_bios)
    cast_places = convert_to_list(cast_places)

    # convert strings of numbers to lists (eg. "[1,2,3]" to ["1","2","3"])
    cast_ids = convert_to_list_num(cast_ids)
    rec_vote = convert_to_list_num(rec_vote)
    rec_year = convert_to_list_num(rec_year)
    rec_ids = convert_to_list_num(rec_ids)

    # unescape the stringified newlines and quotes
    for i in range(len(cast_bios)):
        cast_bios[i] = cast_bios[i].replace(r'\n', '\n').replace(r'\"', '\"')

    for i in range(len(cast_chars)):
        cast_chars[i] = cast_chars[i].replace(r'\n', '\n').replace(r'\"', '\"')

    # combining multiple lists into dictionaries that can be passed to the html file, so they can be processed easily and the order of information is preserved
    movie_cards = {rec_posters[i]: [rec_movies[i], rec_movies_org[i], rec_vote[i], rec_year[i], rec_ids[i]] for i in range(len(rec_posters))}

    casts = {cast_names[i]: [cast_ids[i], cast_chars[i], cast_profiles[i]] for i in range(len(cast_profiles))}

    cast_details = {cast_names[i]: [cast_ids[i], cast_profiles[i], cast_bdays[i], cast_places[i], cast_bios[i]] for i in range(len(cast_places))}

    if imdb_id != "":
        # web scraping to get user reviews from the IMDB site
        sauce = urllib.request.urlopen('https://www.imdb.com/title/{}/reviews?ref_=tt_ov_rt'.format(imdb_id)).read()
        soup = bs.BeautifulSoup(sauce, 'lxml')
        soup_result = soup.find_all("div", {"class": "text show-more__control"})

        reviews_list = []    # list of reviews
        reviews_status = []  # list of sentiments (Positive or Negative)
        for reviews in soup_result:
            if reviews.string:
                reviews_list.append(reviews.string)
                # passing the review to our sentiment model
                movie_review_list = np.array([reviews.string])
                movie_vector = vectorizer.transform(movie_review_list)
                pred = clf.predict(movie_vector)
                reviews_status.append('Positive' if pred else 'Negative')

        # getting current date
        movie_rel_date = ""
        curr_date = ""
        if rel_date:
            today = str(date.today())
            curr_date = datetime.strptime(today, '%Y-%m-%d')
            movie_rel_date = datetime.strptime(rel_date, '%Y-%m-%d')

        # combining reviews and comments into a dictionary
        movie_reviews = {reviews_list[i]: reviews_status[i] for i in range(len(reviews_list))}

        # passing all the data to the html file
        return render_template('recommend.html', title=title, poster=poster, overview=overview, vote_average=vote_average,
                               vote_count=vote_count, release_date=release_date, movie_rel_date=movie_rel_date,
                               curr_date=curr_date, runtime=runtime, status=status, genres=genres, movie_cards=movie_cards,
                               reviews=movie_reviews, casts=casts, cast_details=cast_details)

    else:
        return render_template('recommend.html', title=title, poster=poster, overview=overview, vote_average=vote_average,
                               vote_count=vote_count, release_date=release_date, movie_rel_date="", curr_date="",
                               runtime=runtime, status=status, genres=genres, movie_cards=movie_cards, reviews="",
                               casts=casts, cast_details=cast_details)


if __name__ == '__main__':
app.run(debug=True)
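For reference, the list-parsing helper above behaves like this (standalone copy of `convert_to_list` for illustration):

```python
def convert_to_list(my_list):
    # same logic as the helper in main.py: strip the brackets and split on '","'
    my_list = my_list.split('","')
    my_list[0] = my_list[0].replace('["', '')
    my_list[-1] = my_list[-1].replace('"]', '')
    return my_list

print(convert_to_list('["abc","def"]'))  # -> ['abc', 'def']
```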