duongtruongbinh/IntroDS_FinalProject


Final Project - Introduction to Data Science Course

Overview

This project is the final assignment for the "Introduction to Data Science" course. The objective is to apply data science principles to a real-world dataset. Our chosen domain is comics (manga), and the data was scraped from MyAnimeList, a popular platform for anime and manga enthusiasts.

Course Details

  • University: University of Science - VNUHCM
  • Course: CSC14119 - Introduction to Data Science
  • Duration: 16 November 2023 - 14 December 2023

Instructors:

  • Nguyen Thi Thu Hang
  • Nguyen Bao Long
  • Le Duc Thanh
  • Nguyen Ngoc Thao

Contributors:

Student ID    Full Name
21127104      Doan Ngoc Mai
21127129      Le Nguyen Kieu Oanh
21127229      Duong Truong Binh
21127616      Le Phuoc Quang Huy

Project Structure

The project is organized into four Jupyter Notebook files, each representing a distinct step in the data science workflow:

  1. Data_Collecting.ipynb

    • This notebook contains the code for web scraping the data from MyAnimeList.
    • It outlines the process of extracting and storing the data in a structured format for further analysis.
  2. Data_Exploration_and_PreProcessing.ipynb

    • This notebook focuses on exploring the collected data to understand its structure and content.
    • It includes data cleaning, handling missing values, and preprocessing steps to prepare the data for analysis.
  3. Asking_Questions_and_Analyzing.ipynb

    • In this notebook, we define key questions and hypotheses about the data.
    • It includes visualizations and statistical analyses to extract insights and answer the formulated questions.
  4. Data_Modelling.ipynb

    • This notebook involves building and evaluating machine learning models based on the prepared dataset.
    • The models predict or classify a target variable from selected features of the comic data.
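
The collecting step, and the kind of missing value the preprocessing step deals with, can be illustrated with a minimal sketch. This is not the project's actual code: the HTML snippet, the class names (`manga-row`, `title`, `score`), and the helper `parse_rows` are hypothetical, and the real MyAnimeList markup differs. The sketch parses an inline HTML string with BeautifulSoup and stores a non-numeric score as `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real MyAnimeList pages differ.
SAMPLE_HTML = """
<table>
  <tr class="manga-row"><td class="title">Example A</td><td class="score">8.50</td></tr>
  <tr class="manga-row"><td class="title">Example B</td><td class="score">N/A</td></tr>
</table>
"""


def parse_rows(html):
    """Extract title/score records from the hypothetical table above."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.select("tr.manga-row"):
        title = row.select_one("td.title").get_text(strip=True)
        raw_score = row.select_one("td.score").get_text(strip=True)
        try:
            score = float(raw_score)
        except ValueError:
            score = None  # Non-numeric score, left for the preprocessing step
        records.append({"title": title, "score": score})
    return records


records = parse_rows(SAMPLE_HTML)
```

In a real run the HTML would come from a call such as `requests.get(url).text`, and the records could be loaded for analysis with `pandas.DataFrame(records)`.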

Data Source

The dataset used in this project was collected via web scraping from MyAnimeList. The data includes various attributes such as:

  • Titles
  • Genres
  • Ratings
  • Popularity
  • Other metadata related to comics and manga.

Requirements

To run the project, ensure the following libraries and tools are installed:

  • Python 3.x
  • Jupyter Notebook
  • Libraries:
    • numpy
    • pandas
    • matplotlib
    • seaborn
    • scikit-learn
    • beautifulsoup4
    • requests
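
As a convenience, the following sketch checks whether the libraries listed above can be imported before opening the notebooks. It is an assumption-laden helper rather than part of the repository: the function name `missing_packages` is hypothetical. Note that two packages are imported under a different name than the one used to install them (`scikit-learn` as `sklearn`, `beautifulsoup4` as `bs4`).

```python
import importlib.util

# Maps each pip package name from the list above to its import name.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "beautifulsoup4": "bs4",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found for import."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```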

How to Use

  1. Clone this repository to your local machine.
  2. Install the required Python libraries.
  3. Open each notebook in Jupyter Notebook.
  4. Follow the order of the notebooks to reproduce the results.
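
For a non-interactive run, the steps above can be sketched as a small script that executes the four notebooks in order through `jupyter nbconvert`. This is one possible approach, not something the repository provides: the helper names `build_command` and `run_all` are hypothetical, and the script assumes it is started from the repository root with the `jupyter` command on the path. Executing the collecting notebook presumably repeats the scraping, so the sketch defaults to a dry run that only prints the commands.

```python
import subprocess

# Notebook order taken from the Project Structure section.
NOTEBOOKS = [
    "Data_Collecting.ipynb",
    "Data_Exploration_and_PreProcessing.ipynb",
    "Asking_Questions_and_Analyzing.ipynb",
    "Data_Modelling.ipynb",
]


def build_command(notebook):
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(dry_run=True):
    """Print each command, and run it unless dry_run is True."""
    for notebook in NOTEBOOKS:
        command = build_command(notebook)
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first notebook that fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would actually execute the notebooks and overwrite them with their outputs.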
