This analysis examines a dataset of 10,000 movies from a movie database, revealing insights and trends in the industry. Notably, drama is the most popular genre, and factors like budget and popularity impact revenue. However, limited data, replaced null values, outliers, and correlation-causation considerations call for cautious interpretation.

Aisha-Ojey/TMDB-Movie-Dataset-Analysis

Movie Industry Analysis - README

This project analyzes a dataset of roughly 10,000 movies collected from a movie database, including user ratings and revenue. The raw dataset has 10,866 rows and 21 columns, such as imdb_id, revenue, budget, and vote_count.

Conclusions

Genre Popularity: Drama is the most popular genre over the years, followed by comedy, thriller, and action, indicating audience preferences.
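A genre ranking like this can be produced by counting how often each genre appears across all movies. The following is a minimal sketch, not the project's actual code: the `genres` field name, the pipe-separated format, and the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical rows; the field name "genres" and the "|" separator are
# assumptions, not details confirmed by this README.
rows = [
    {"title": "A", "genres": "Drama|Thriller"},
    {"title": "B", "genres": "Comedy|Drama"},
    {"title": "C", "genres": "Action"},
    {"title": "D", "genres": ""},  # a movie with no genre information
]


def count_genres(rows, sep="|"):
    """Count how many movies list each genre, skipping empty entries."""
    counts = Counter()
    for row in rows:
        for genre in (row.get("genres") or "").split(sep):
            genre = genre.strip()
            if genre:
                counts[genre] += 1
    return counts


genre_counts = count_genres(rows)
print(genre_counts.most_common(3))
```

Because a movie can carry several genres, the counts add up to more than the number of movies; this measures how often a genre appears, not how well its movies are received.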

Factors Influencing Revenue: Higher budget, vote count, and popularity are associated with higher revenue and profit, which points to the importance of investment and marketing efforts.
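One way to quantify such a relationship is the Pearson correlation coefficient between two numeric columns. The sketch below is an illustration under stated assumptions: the `pearson` helper and the budget and revenue figures are made up for the example and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values in millions, for illustration only.
budget = [10, 20, 30, 40, 50]
revenue = [25, 35, 70, 85, 120]

r = pearson(budget, revenue)
print(f"budget vs. revenue: r = {r:.2f}")
```

A value of r close to 1 indicates a strong positive linear association; it does not show that a larger budget causes higher revenue.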

Voting Average Decline: The average vote for movies has declined over the years, which may signal changes in audience preferences or industry dynamics.
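A trend like this is typically computed by grouping movies by release year and averaging the value of interest within each year. The sketch below assumes hypothetical `release_year` and `vote_average` field names and uses made-up rows; the same helper would apply to runtime or profit per year.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean(rows, year_key, value_key):
    """Return {year: mean value} with years in ascending order."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_key]].append(row[value_key])
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Made-up rows; the field names are assumptions for this illustration.
rows = [
    {"release_year": 2010, "vote_average": 6.0},
    {"release_year": 1990, "vote_average": 7.0},
    {"release_year": 1990, "vote_average": 6.0},
    {"release_year": 2010, "vote_average": 5.0},
]

trend = yearly_mean(rows, "release_year", "vote_average")
for year, avg in trend.items():
    print(year, avg)
```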

Movie Production: The industry has been producing more movies each year, which may reflect increased demand or market expansion.

Runtime Reduction: The average runtime of movies has dropped significantly over the years, possibly due to shifts in audience attention spans.

Consistent Profitability: Despite the decline in average vote, the movie industry has remained consistently profitable over the years, suggesting that factors other than ratings play a vital role in revenue generation.

Limitations

Reduced Dataset: Due to null, invalid, and duplicate values, the analysis was based on only 3,850 of the original 10,866 observations, which may make the conclusions less representative.

Null and Invalid Values: Zeros were used to replace null and invalid values in features like budget, revenue, and runtime, which could distort results.
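The distortion is easy to see on a small example: zeros that stand in for unknown values pull averages down. The sketch below uses made-up budget figures and a hypothetical `mean_ignoring_zeros` helper to compare a naive mean with one that treats zeros as missing.

```python
from statistics import mean


def mean_ignoring_zeros(values):
    """Mean of the known values, treating 0 and None as missing."""
    known = [v for v in values if v]
    return mean(known) if known else None


# Made-up budgets in millions; the two zeros stand in for unknown budgets.
budgets = [0, 0, 50, 100, 150]

naive = mean(budgets)                    # zeros counted as real budgets
adjusted = mean_ignoring_zeros(budgets)  # zeros treated as missing

print(naive, adjusted)
```

Here the naive mean is 60 while the mean over known values is 100. Treating zeros as missing is only appropriate for fields where a true zero is implausible, as with budget, revenue, and runtime.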

Outliers: Outliers, such as extremely short runtimes, may skew statistical measures and lead to biased interpretations.
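A common convention for flagging such values is the 1.5 × IQR rule, which marks anything far outside the middle half of the data. This README does not state how outliers were identified, so the sketch below is one possible robustness check rather than the method used in the analysis; the runtimes are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up runtimes in minutes; the 2-minute entry mimics a suspicious record.
runtimes = [2, 88, 90, 95, 100, 105, 110, 120]

outliers = iqr_outliers(runtimes)
print(outliers)
```

Flagged values are candidates for inspection, not automatic removal: a very short runtime may be a data entry error or a legitimate short film.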

Data Preprocessing: Insufficient information on data preprocessing, handling of missing values, and outlier treatment may affect the reliability of results.

Correlation vs. Causation: Conclusions are based on observed correlations and may not imply causation, considering potential confounding factors.

Before using the analysis results, it is essential to address the limitations and conduct further robustness checks to ensure the reliability of the findings.
