This project analyzes a dataset of roughly 10,000 movies collected from a movie database, including user ratings and revenue. The dataset has 10,866 rows and 21 columns, including imdb_id, revenue, budget, and vote_count.
Genre Popularity: Drama has been the most popular genre over the years, followed by comedy, thriller, and action, which gives a rough indication of audience preferences.
Factors Influencing Revenue: Higher budget, vote count, and popularity are associated with higher revenue and profit. This is consistent with investment and marketing efforts being important, although the relationship is a correlation rather than a demonstrated cause.
Voting Average Decline: The average vote score of movies has declined over the years, which may reflect changes in audience preferences or industry dynamics.
Movie Production: The number of movies produced per year has grown over time, which may reflect increased demand or market expansion.
Runtime Reduction: The average runtime of movies has decreased noticeably over the years, possibly due to shifts in audience attention spans.
Consistent Profitability: Despite the decline in vote average, the movie industry has remained consistently profitable over the years, suggesting that factors other than ratings play a major role in revenue generation.
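The correlation and trend findings above can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library, not the project's actual analysis code. The sample rows are invented for illustration, and the column names release_year and vote_average are assumptions about the dataset's schema.

```python
from collections import defaultdict
from math import sqrt

# Invented sample rows for illustration only; they are not taken from the dataset.
# The column names release_year and vote_average are assumed.
rows = [
    {"release_year": 2000, "budget": 10.0, "revenue": 25.0, "vote_average": 7.0},
    {"release_year": 2000, "budget": 20.0, "revenue": 45.0, "vote_average": 6.0},
    {"release_year": 2010, "budget": 30.0, "revenue": 70.0, "vote_average": 6.0},
    {"release_year": 2010, "budget": 40.0, "revenue": 85.0, "vote_average": 5.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def yearly_mean(rows, column):
    """Average of `column` for each release year, ordered by year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["release_year"]].append(row[column])
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


budgets = [row["budget"] for row in rows]
revenues = [row["revenue"] for row in rows]

# A value close to 1 indicates a strong positive linear association,
# which by itself says nothing about cause and effect.
print("budget vs. revenue:", round(pearson(budgets, revenues), 3))
print("mean vote_average per year:", yearly_mean(rows, "vote_average"))
```

The same yearly_mean helper could be applied to a runtime or profit column to look at the other trends, assuming those columns exist under those names.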
Reduced Dataset: Due to null, invalid, and duplicate values, the analysis was based on only 3,850 of the original 10,866 observations (about 35%), which might affect how representative the conclusions are.
Null and Invalid Values: Zeros were used to replace null and invalid values in features like budget, revenue, and runtime, which could distort results such as averages and correlations if those zeros are treated as real values.
Outliers: The presence of outliers, such as extremely short runtimes, may skew statistical measures like the mean and lead to biased interpretations.
Data Preprocessing: Limited documentation of data preprocessing, missing-value handling, and outlier treatment may affect the reliability of results.
Correlation vs. Causation: Conclusions are based on observed correlations and do not establish causation; confounding factors may explain some of the relationships.
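To make the data-quality limitations above concrete, the sketch below shows one possible way to drop duplicates, treat zeros as missing values, and flag outliers with the common 1.5 × IQR rule. It is an illustration under stated assumptions, not the cleaning procedure this project used: the sample rows are invented, deduplicating on imdb_id is an assumption, and the IQR rule is only one of several reasonable outlier definitions.

```python
from statistics import quantiles

# Columns where a zero or a missing value is treated as "unknown" (assumed list).
REQUIRED = ("budget", "revenue", "runtime")

# Invented sample rows for illustration only.
sample = [
    {"imdb_id": "tt1", "budget": 100, "revenue": 250, "runtime": 95},
    {"imdb_id": "tt1", "budget": 100, "revenue": 250, "runtime": 95},   # duplicate
    {"imdb_id": "tt2", "budget": 0, "revenue": 50, "runtime": 100},     # zero budget
    {"imdb_id": "tt3", "budget": 80, "revenue": None, "runtime": 110},  # null revenue
    {"imdb_id": "tt4", "budget": 60, "revenue": 90, "runtime": 3},      # very short
]


def clean(rows):
    """Keep the first row per imdb_id and drop rows with zero or missing required fields."""
    seen = set()
    kept = []
    for row in rows:
        if row["imdb_id"] in seen:
            continue  # duplicate of a row already seen
        seen.add(row["imdb_id"])
        # `not value` is true for 0, None, and absent keys (it does not catch NaN).
        if any(not row.get(col) for col in REQUIRED):
            continue
        kept.append(row)
    return kept


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


cleaned = clean(sample)
print(len(sample), "rows before cleaning,", len(cleaned), "after")

runtimes = [3, 90, 95, 100, 105, 110, 115]  # invented runtimes in minutes
print("flagged runtimes:", iqr_outliers(runtimes))
```

Reporting the row count before and after each filter, as the first print does, makes it easier to judge how much of the reduction from 10,866 to 3,850 observations each cleaning step accounts for.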
Before relying on these results, it is essential to address the limitations above and run further robustness checks to confirm that the findings hold.