This project analyzes the "Individual Household Electric Power Consumption" dataset from the UCI Machine Learning Repository. It combines hypothesis testing, regression, analysis of variance, dimensionality reduction, and clustering to gain insight into household electricity consumption patterns.
- Source: UCI Machine Learning Repository
- Timeframe: almost 4 years of electric power consumption data (December 2006 to November 2010), sampled at one-minute intervals
- Attributes: electrical measurements such as active power, reactive power, voltage, and current intensity
- Population Sampling & Hypothesis Testing:
  - Creating a normal population from the dataset
  - Extracting samples and comparing variances between attributes
- Regression Analysis:
  - Identifying linear relationships between variables
- Dimensionality Reduction & Clustering:
  - Applying Principal Component Analysis (PCA)
  - Implementing clustering techniques for better data interpretation
- Analysis of Variance (ANOVA):
  - Comparing means of specific characteristics across different groups
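
The steps listed above can be sketched with base R's `stats` functions. This is a minimal illustration, not the project's actual scripts. It runs on synthetic data, so the values, the effect sizes, the `period` grouping factor, and the choice of three clusters are all assumptions made for the example. Only the column names follow the UCI file. Note that `var.test` assumes approximately normal data.

```r
# Synthetic stand-in for the household data. Values and effect sizes are
# invented for illustration; only the column names follow the UCI file.
set.seed(42)
n <- 300
intensity <- runif(n, min = 1, max = 20)  # current intensity, in amperes
power <- data.frame(
  Global_intensity      = intensity,
  Global_active_power   = 0.23 * intensity + rnorm(n, sd = 0.05),  # kW
  Global_reactive_power = abs(rnorm(n, mean = 0.12, sd = 0.05)),   # kW
  Voltage               = rnorm(n, mean = 240, sd = 3),            # volts
  # Hypothetical grouping factor, not a column of the real dataset.
  period = factor(rep(c("morning", "afternoon", "night"), length.out = n))
)

# Hypothesis testing: F-test comparing the variances of two attributes.
variance_test <- var.test(power$Global_active_power,
                          power$Global_reactive_power)

# Regression: linear relationship between active power and current.
fit <- lm(Global_active_power ~ Global_intensity, data = power)

# Dimensionality reduction: PCA on the standardized numeric columns.
numeric_cols <- power[, c("Global_intensity", "Global_active_power",
                          "Global_reactive_power", "Voltage")]
pca <- prcomp(numeric_cols, center = TRUE, scale. = TRUE)

# Clustering: k-means on the first two principal components.
# k = 3 is an arbitrary choice for this sketch.
clusters <- kmeans(pca$x[, 1:2], centers = 3, nstart = 10)

# ANOVA: compare mean active power across the hypothetical period groups.
anova_fit <- aov(Global_active_power ~ period, data = power)

print(variance_test$p.value)
print(coef(fit))
print(summary(pca)$importance[2, ])  # proportion of variance per component
print(table(clusters$cluster))
print(summary(anova_fit))
```

On the real dataset, the synthetic data frame would be replaced by the loaded measurements, and the grouping factor for the ANOVA would have to be derived from the data, for example from the timestamp.
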
- R Programming Language
- Statistical Libraries
- Machine Learning Techniques
- Clone the repository:
  `git clone https://github.com/lorainemg/Household-Analysis.git`
- Open the project in RStudio or your preferred R environment.
- Run the scripts in the specified order to reproduce the analysis.
Documentation for Phase 1 and Phase 2 of this project is available in phase1-report and phase2-report.
This project is open-source and available under the MIT License.