AutoMPG_DataProcessing

Object-Oriented Data Processing Framework for the UCI AutoMPG Dataset

Project Overview

Date: 23 Jan. 2024 - 31 Jan. 2024
Author: Michelle Zong

This project involves creating an object-oriented framework in Python to process and analyze fuel efficiency data from the UCI AutoMPG dataset. The focus of the project is to apply object-oriented programming (OOP) principles to clean, transform, and analyze the dataset efficiently.

Key Features

Object-Oriented Design: The project is structured around Python classes that encapsulate the data processing logic and methods for efficient data manipulation.
Data Parsing and Cleaning: Handles non-standard delimiters and missing values in the dataset. Implements flexible data ingestion and transformation methods.
Iterable Data Structures: Built custom iterable data structures to process the dataset in a memory-efficient manner, allowing for easy exploration and analysis of the dataset.
Data Comparison: Supports equality (==) and less-than (<) comparisons between two AutoMPG objects based on vehicle features (e.g., year, mpg).
Unit Testing with unittest: tests/test_mpg.py uses Python's unittest framework to test the functionality of the AutoMPG and AutoMPGData classes.
- Methods like setUpClass, tearDownClass, setUp, and tearDown are used for setup and cleanup before and after tests.
- Includes checks for string formatting, equality comparison, less-than ordering, and hashability.
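
As a rough illustration of the unittest fixtures listed above, the sketch below shows when each hook runs. It is not the contents of tests/test_mpg.py: `Car` is a hypothetical stand-in for AutoMPG (a frozen, ordered dataclass) so the example runs on its own, and `run_tests` is a helper introduced only for this sketch.

```python
import unittest
from dataclasses import dataclass


# Hypothetical stand-in for the project's AutoMPG class (an assumption for this sketch).
# frozen=True makes it hashable; order=True generates __lt__ from the field order.
@dataclass(frozen=True, order=True)
class Car:
    make: str
    model: str
    year: int
    mpg: float


class TestCar(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once, before any test method in this class.
        cls.shared = Car("ford", "pinto", 1971, 26.0)

    @classmethod
    def tearDownClass(cls):
        # Runs once, after every test method in this class has finished.
        del cls.shared

    def setUp(self):
        # Runs before each individual test method.
        self.same = Car("ford", "pinto", 1971, 26.0)
        self.newer = Car("ford", "pinto", 1973, 19.0)

    def tearDown(self):
        # Runs after each individual test method.
        del self.same, self.newer

    def test_equality(self):
        self.assertEqual(self.shared, self.same)

    def test_less_than(self):
        self.assertLess(self.shared, self.newer)

    def test_hashable(self):
        # Equal objects must collapse to a single set entry.
        self.assertEqual(len({self.shared, self.same}), 1)


def run_tests():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCar)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes the three checks; the project's real tests may of course be run through the usual unittest command line instead.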

Dataset

Source: UCI Machine Learning Repository - AutoMPG Dataset
Description: This dataset contains various characteristics of cars from model years 1970 to 1982, including attributes such as MPG (miles per gallon), cylinders, horsepower, weight, etc.
Handling Missing Data: The dataset includes missing values in some columns (e.g., horsepower). These values were cleaned using techniques such as imputation or removal based on analysis requirements.
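
A minimal sketch of how one raw row might be parsed, assuming the usual layout of the UCI file: whitespace-separated numeric columns followed by a tab and a quoted car name, with a missing horsepower written as `?`. The function name `parse_line` and the returned dictionary keys are hypothetical and are not taken from this project's code. Missing values are returned as `None` so the caller can decide between imputation and removal.

```python
import shlex


def parse_line(line):
    """Parse one raw AutoMPG row into a dict (hypothetical helper for illustration)."""
    # shlex.split collapses runs of spaces and tabs and keeps the quoted
    # car name together as a single field.
    fields = shlex.split(line)
    horsepower = fields[3]
    return {
        "mpg": float(fields[0]),
        "cylinders": int(fields[1]),
        # "?" marks a missing value; keep it as None rather than guessing.
        "horsepower": None if horsepower == "?" else float(horsepower),
        # The raw file stores a two-digit model year.
        "model_year": int(fields[6]),
        "name": fields[8],
    }


# Sample rows written in the layout of the raw file.
complete = parse_line('18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"')
missing = parse_line('25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"')
```

Here `complete["horsepower"]` is a float, while `missing["horsepower"]` is `None`.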

Project Structure

├── README.md                 # Project documentation
├── data/                     # Raw and processed datasets
│   ├── auto-mpg.data.txt     # Original dataset
│   └── auto-mpg.clean.txt    # Standardized dataset
├── src/                      # Source code for data processing
│   ├── autompg.py            # Main script to run the analysis
│   └── __init__.py
└── tests/                    # Unit tests for the framework
    └── test_mpg.py           # Test cases for MPG data processing

Classes

1. AutoMPG

Responsibilities: Represents the make, model, year, and mpg attributes of each record in the dataset.
Methods:
- __repr__(): Returns an unambiguous representation of the object, intended for developers.
- __str__(): Returns a human-readable representation of the object, intended for end users.
- __eq__(): Evaluates whether two AutoMPG instances are equal based on make, model, year, and mpg.
- __lt__(): Evaluates whether one AutoMPG instance is less than another based on make, model, year, and mpg.
- __hash__(): Restores hashability. Python sets __hash__ to None on a class that defines __eq__ without also defining __hash__, so it has to be defined explicitly.
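
The methods above could be sketched roughly as follows. This is not the project's actual implementation: the `_key` helper, the tuple comparison order (make, then model, then year, then mpg), and the `__str__` format are assumptions made for illustration.

```python
class AutoMPG:
    """Sketch of a record class with comparison and hashing support."""

    def __init__(self, make, model, year, mpg):
        self.make = make
        self.model = model
        self.year = year
        self.mpg = mpg

    def _key(self):
        # Hypothetical helper: one tuple drives equality, ordering, and hashing,
        # which keeps the three consistent with each other.
        return (self.make, self.model, self.year, self.mpg)

    def __repr__(self):
        return f"AutoMPG({self.make!r}, {self.model!r}, {self.year!r}, {self.mpg!r})"

    def __str__(self):
        # The exact wording here is an assumption.
        return f"{self.year} {self.make} {self.model}: {self.mpg} mpg"

    def __eq__(self, other):
        if not isinstance(other, AutoMPG):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, AutoMPG):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Needed because defining __eq__ alone would make instances unhashable.
        return hash(self._key())


a = AutoMPG("ford", "pinto", 1971, 26.0)
b = AutoMPG("ford", "pinto", 1971, 26.0)
c = AutoMPG("ford", "pinto", 1973, 19.0)
```

With these definitions, `a == b` and `a < c` are both true, `{a, b}` holds a single element, and `sorted([c, a])` works because `sorted` only needs `__lt__`.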

2. AutoMPGData

Responsibilities: An iterable class that will read in the data file, clean it, and generate a list of AutoMPG objects.
Methods:
- __iter__(): Makes the class iterable, so an instance can be used directly in a for loop.
- _load_data(): Reads in the cleaned data file, and populates self.data with AutoMPG objects.
- _clean_data(): Reads in the messy raw data, standardizes the data rows, and stores the cleaned data in auto-mpg.clean.txt.
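
One way the clean-then-load flow could look is sketched below. Several details are assumptions rather than the project's code: the class name `AutoMPGDataSketch`, the `Record` namedtuple standing in for AutoMPG, the constructor arguments, cleaning by expanding tabs into spaces, splitting the car name into make and model at the first space, and converting the two-digit model year to four digits. The `demo` function is a hypothetical helper that exercises the class on two sample rows in a temporary directory.

```python
import csv
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the AutoMPG record class.
Record = namedtuple("Record", ["make", "model", "year", "mpg"])


class AutoMPGDataSketch:
    def __init__(self, raw_path, clean_path):
        self.raw_path = raw_path
        self.clean_path = clean_path
        self.data = []
        # Only clean when no cleaned file exists yet (an assumption).
        if not os.path.exists(self.clean_path):
            self._clean_data()
        self._load_data()

    def __iter__(self):
        # Delegate iteration to the underlying list of records.
        return iter(self.data)

    def _clean_data(self):
        # Replace tabs with spaces so every row uses a single delimiter character.
        with open(self.raw_path) as raw, open(self.clean_path, "w") as clean:
            for line in raw:
                clean.write(line.expandtabs(4))

    def _load_data(self):
        with open(self.clean_path, newline="") as clean:
            # skipinitialspace collapses the runs of spaces between columns;
            # the quoted car name is kept together as one field.
            reader = csv.reader(clean, delimiter=" ", skipinitialspace=True)
            for row in reader:
                if not row:
                    continue
                make, _, model = row[8].partition(" ")
                # Assumption: two-digit model years belong to the 1900s.
                self.data.append(Record(make, model, 1900 + int(row[6]), float(row[0])))


def demo():
    """Build a tiny raw file in a temp directory and return the loaded records."""
    raw_lines = [
        '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n',
        '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n',
    ]
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = os.path.join(tmp, "auto-mpg.data.txt")
        clean_path = os.path.join(tmp, "auto-mpg.clean.txt")
        with open(raw_path, "w") as handle:
            handle.writelines(raw_lines)
        return list(AutoMPGDataSketch(raw_path, clean_path))
```

Because `__iter__` simply returns an iterator over `self.data`, the object works in `for` loops, `list(...)`, and `sorted(...)` without exposing the list itself.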

How to Run

Clone the Repository:

git clone https://github.com/mmzong/auto-mpg-data-processing.git

Install Requirements:

pip install -r <path/to/requirements.txt>

Run the Project:
```
python src/autompg.py
```
