IMDb Data Analysis Project

This project analyzes an IMDb dataset containing basic information about people in the film industry. The script downloads and decompresses the data, then processes it to visualize the distribution of birth years and the most common primary professions.

Prerequisites

To run this script, ensure that Python is installed on your machine along with the following libraries:

Pandas
Matplotlib
Seaborn
Requests

You can install these packages using pip:

pip install pandas matplotlib seaborn requests

Running the Script

Run the script with Python. It downloads and decompresses the required IMDb dataset automatically:

python imdb_analysis.py

Workflow

The script follows these steps:

Download Data: Download the compressed .gz file from IMDb's dataset repository.
Decompress Data: Decompress the .gz file to extract the TSV data.
Load and Clean Data: Read the TSV file, convert column data types, and handle missing values.
Data Analysis: Analyze the distribution of birth years.
Data Visualization:
- Plot the distribution of birth years.
- Plot the frequency of the top 10 primary professions.
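
The download and decompress steps above can be sketched roughly as follows. This is a minimal illustration, not the code in imdb_analysis.py: the function names `download_file` and `decompress_gz` are hypothetical, and `DATA_URL` is an assumption, since this README does not name the file (IMDb publishes its per-person data as name.basics.tsv.gz).

```python
import gzip
import shutil
from pathlib import Path

# Assumed URL: the README does not state which file the script fetches.
DATA_URL = "https://datasets.imdbws.com/name.basics.tsv.gz"


def download_file(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream a remote file to disk so it never has to fit in memory."""
    import requests  # imported here so decompress_gz works without it

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # fail early on HTTP errors
        with open(dest, "wb") as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
    return dest


def decompress_gz(src: Path, dest: Path) -> Path:
    """Decompress a .gz archive to dest, copying block by block."""
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dest
```

A typical call would be `decompress_gz(download_file(DATA_URL, Path("name.basics.tsv.gz")), Path("name.basics.tsv"))`. Streaming in chunks is a design choice for large archives; a smaller file could simply be read in one request.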

Workflow Diagram

flowchart TD;
    A[Start] --> B[Download Data];
    B --> C[Decompress Data];
    C --> D[Load and Clean Data];
    D --> E[Analyze Data];
    E --> F[Visualize Birth Year Distribution];
    E --> G[Visualize Top Professions];
    F --> H[End];
    G --> H;
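
As an illustration of the "Load and Clean Data" step, here is a minimal sketch with pandas. It assumes the column names of IMDb's name.basics file (`birthYear`, `primaryProfession`, and so on) and relies on IMDb marking missing values as `\N`. The function name `load_names` is hypothetical, and the actual script may clean the data differently.

```python
import csv

import pandas as pd


def load_names(source) -> pd.DataFrame:
    """Read an IMDb-style TSV and keep only rows with a usable birth year."""
    df = pd.read_csv(
        source,
        sep="\t",
        dtype=str,               # read every column as text first
        na_values=r"\N",         # IMDb's marker for a missing value
        quoting=csv.QUOTE_NONE,  # the files are not quoted; names may contain quotes
    )
    # Convert birth years to numbers; anything unparseable becomes NaN.
    birth_year = pd.to_numeric(df["birthYear"], errors="coerce")
    return (
        df.assign(birthYear=birth_year)
        .dropna(subset=["birthYear"])
        .astype({"birthYear": int})
        .reset_index(drop=True)
    )
```

With the cleaned frame in hand, `load_names("name.basics.tsv")["birthYear"].describe()` gives a quick numeric summary of the birth-year distribution.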

Visualizations

The script generates two primary visualizations:

Histogram of Birth Years: Displays the distribution of birth years among the individuals in the IMDb dataset.
Bar Chart of Primary Professions: Shows the top 10 most common professions in the dataset.

Output

The output consists of statistical analysis results and the visualizations, which are displayed as figures. Run the script from a console or IDE that supports graphical display, or from an IPython environment.
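
If no graphical display is available (for example on a remote server), one possible workaround is to write the figures to image files instead. The sketch below is not part of the script as described here; `save_figure` is a hypothetical helper, and the file name and resolution are arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend that renders without a display
import matplotlib.pyplot as plt


def save_figure(fig, path, dpi=150):
    """Write a figure to disk and release its memory."""
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return path
```

For an Axes object `ax` returned by a plotting call, `save_figure(ax.figure, "birth_years.png")` stores the plot as a PNG.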

Contributing

Contributions are welcome! If you have suggestions for improving the script or additional analyses, feel free to fork this project and submit pull requests.

License

This project is licensed under the MIT License; see the LICENSE file for details.
