Is it possible to predict whether a movie will earn in the top 25% of profits? If so, which features are useful in making this prediction? That is what this project aims to find out.
My goal was to see if I could predict if a film will be in the top 25% of profitability for the year, using factors that movie studios could consider when planning a movie project.
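As a rough illustration of the prediction target, the sketch below flags films whose profit falls in the top 25% for their release year. The column names (`title_year`, `budget`, `gross`) and the sample values are assumptions made for this example and are not necessarily what the notebooks use.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
movies = pd.DataFrame({
    "title_year": [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
    "budget": [10, 50, 100, 20, 5, 80, 60, 30],
    "gross": [15, 40, 300, 25, 50, 90, 55, 45],
})

# Profit is taken here as gross minus budget (an assumption for this sketch).
movies["profit"] = movies["gross"] - movies["budget"]

# 75th percentile of profit within each release year, broadcast to every row.
yearly_cutoff = movies.groupby("title_year")["profit"].transform(
    lambda s: s.quantile(0.75)
)

# A film is labeled positive when its profit reaches that year's cutoff.
movies["top_quartile"] = movies["profit"] >= yearly_cutoff

print(movies[["title_year", "profit", "top_quartile"]])
```

Computing the cutoff per year rather than over the whole data set keeps films from different decades from competing against each other directly.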
These are the specific research questions I aimed to answer:
- Is there a correlation between genre and profits?
- Is there a correlation between content rating and profits?
- Are some genres more profitable for specific content ratings?
- Does an increase in the number of Facebook likes for directors or actors correlate with increased profits?
- Is being in or directing a greater number of movies correlated with more profits?
- Are some topics correlated with more profits?
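The genre and content rating questions above could be explored along the following lines. This is a minimal sketch: it assumes genres are stored as a pipe-separated string in a `genres` column, and the `content_rating` and `profit` columns and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; the pipe-separated genre format is an assumption.
movies = pd.DataFrame({
    "genres": ["Action|Adventure", "Drama", "Action|Drama", "Comedy"],
    "content_rating": ["PG-13", "R", "R", "PG"],
    "profit": [200.0, 5.0, -10.0, 45.0],
})

# Split the genre string into a list, then give each genre its own row.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Median profit per genre, highest first.
median_profit = (
    by_genre.groupby("genre")["profit"].median().sort_values(ascending=False)
)

# Median profit for each genre and content rating combination.
genre_by_rating = by_genre.pivot_table(
    index="genre", columns="content_rating", values="profit", aggfunc="median"
)

print(median_profit)
print(genre_by_rating)
```

A film with several genres counts once toward each of them, so the per-genre groups overlap and should be compared with that in mind.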
- Data Cleaning
- Exploratory Data Analysis
- Inferential Statistics
- Machine Learning:
- Linear Regression
- Decision Trees
- Python:
- pandas
- jupyter notebook
- sklearn
- nltk
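To give a sense of how the decision tree method and sklearn from the lists above fit together, here is a minimal sketch of a classifier for the top 25% label. The feature names, sample values, and hyperparameters are assumptions chosen for illustration, not the settings used in the notebooks.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features and labels; roughly 25% of rows are positive.
features = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50, 60, 70, 80,
               15, 25, 35, 45, 55, 65, 75, 85],
    "director_facebook_likes": [100, 200, 150, 120, 300, 250, 180, 90,
                                110, 130, 160, 140, 9000, 8000, 9500, 8700],
})
top_quartile = pd.Series([0] * 12 + [1] * 4, name="top_quartile")

# Stratify so the train and test sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    features, top_quartile, test_size=0.25, stratify=top_quartile, random_state=0
)

# A shallow tree keeps the learned rules easy to read.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
importances = pd.Series(model.feature_importances_, index=features.columns)

print(f"Test accuracy: {accuracy:.2f}")
print(importances.sort_values(ascending=False))
```

Because only about a quarter of films are positive, accuracy alone can be misleading; precision and recall on the positive class are worth checking as well.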
- Clone this repo
- Raw data is kept within this repo.
- You can access the data yourself by following the links below:
- A data set of a selection of 5000 movies from 1916 to 2016. It can be obtained from Kaggle.
- Worldwide inflation data from the World Bank, covering 1960 to the present.
- Information about movie budget and revenue, obtained using The Movie Database API.
- Notebooks are stored here.
- You will need to install the following packages:
- Blog Post: Cleaning Data
- Blog Post: Exploratory Analysis
- Blog Post: Linear Regression
- Blog Post: Decision Trees
M. T. Lash and K. Zhao, “Early Predictions of Movie Success: The Who, What, and When of Profitability,” Journal of Management Information Systems, vol. 33, no. 3, pp. 874–903, Jul. 2016, doi: 10.1080/07421222.2016.1243969.
Q. I. Mahmud, N. Z. Shuchi, F. M. Tawsif, A. Mohaimen, and A. Tasnim, “A machine learning approach to predict movie revenue based on pre-released movie metadata,” Journal of Computer Science, vol. 16, no. 6, pp. 749–767, 2020, doi: 10.3844/JCSSP.2020.749.767.
Hollywood Movies Make a Profit by Stephen Follows
Why Do All Hollywood Movies Lose Money? by Alex Mayyasi, Priceonomics.com
Coding for Entrepreneurs, 30 Days of Code, YouTube Video
Coding for Entrepreneurs, 30 Days of Code, Source Code
World Bank Inflation Indicators
Dealing with List Values in Pandas Dataframes by Max Hillsdorf on Medium
Text Analysis & Feature Engineering with NLP by Mauro Di Pietro on Medium
Accelerate Your Exploratory Data Analysis With Pandas-Profiling by Sukanta Roy
Tutorial: Exploratory Data Analysis (EDA) with Categorical Variables by Erin Hoffman on Medium
How to Combine Oversampling and Undersampling for Imbalanced Classification by Jason Brownlee on Machine Learning Mastery
Mariann Beagrie