This repository contains the run_analysis.R script, which takes raw data from the "./UCI HAR Dataset" directory, performs 5 cleanup tasks on it, and returns a tidy data frame as the result. The expected input, the cleanup tasks, and the output are described below.
## Expected input: "./UCI HAR Dataset" should be present in the working directory
- activity_labels.txt - Text label for each activity id
- features.txt - Text label for each sensor output
#### ./UCI HAR Dataset/test - test-group data set
- subject_test.txt - test-group subject ids, linked row by row to the X_test.txt data
- y_test.txt - test-group activity ids, linked row by row to the X_test.txt data
- X_test.txt - sensor data for the test group
#### ./UCI HAR Dataset/train - train-group data set
- subject_train.txt - train-group subject ids, linked row by row to the X_train.txt data
- y_train.txt - train-group activity ids, linked row by row to the X_train.txt data
- X_train.txt - sensor data for the train group
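
Before running the script it can help to confirm that all eight input files listed above are in place. The following is a minimal sketch in base R; `missing_har_files()` is a hypothetical helper for illustration and is not part of run_analysis.R. It returns the expected paths that do not exist under the given root directory, so an empty result means the input is complete.

```r
# Hypothetical helper (not part of run_analysis.R):
# returns the expected input files that are missing under `root`.
missing_har_files <- function(root = "./UCI HAR Dataset") {
  expected <- c(
    "activity_labels.txt",
    "features.txt",
    file.path("test",  c("subject_test.txt",  "y_test.txt",  "X_test.txt")),
    file.path("train", c("subject_train.txt", "y_train.txt", "X_train.txt"))
  )
  paths <- file.path(root, expected)
  paths[!file.exists(paths)]
}

# Usage: report anything that is missing from the working directory.
missing <- missing_har_files()
if (length(missing) > 0) {
  message("Missing input files:\n", paste(missing, collapse = "\n"))
}
```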
## Cleanup tasks: run_analysis() performs the following on the raw dataset
1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

It then returns the tidy data frame from step 5 and also writes it to a file named "har_tidy.data.txt".
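
The tasks above can be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the actual contents of run_analysis.R: the function name `run_analysis_sketch()`, its `root` and `out_file` arguments, and the `ACTIVITY` column name are hypothetical. The `"mean|std"` pattern is a guess at how the mean and standard deviation features are selected, and the sketch reuses the names from features.txt as variable names instead of rewriting them into more descriptive ones.

```r
# Sketch only: function name, arguments and the ACTIVITY column are hypothetical.
run_analysis_sketch <- function(root = "./UCI HAR Dataset",
                                out_file = "har_tidy.data.txt") {
  features   <- read.table(file.path(root, "features.txt"),
                           stringsAsFactors = FALSE)
  activities <- read.table(file.path(root, "activity_labels.txt"),
                           stringsAsFactors = FALSE)

  # Read one group ("train" or "test"): subject ids, activity ids, sensor data.
  read_group <- function(group) {
    path <- function(prefix) {
      file.path(root, group, paste0(prefix, "_", group, ".txt"))
    }
    data.frame(
      SUBJECT.ID  = read.table(path("subject"))[[1]],
      ACTIVITY.ID = read.table(path("y"))[[1]],
      read.table(path("X"))
    )
  }

  # Task 1: merge the training and test sets.
  merged <- rbind(read_group("train"), read_group("test"))

  # Task 2: keep only mean / standard deviation features (assumed pattern).
  keep     <- grepl("mean|std", features[[2]])
  measures <- merged[-(1:2)][keep]

  # Task 4: label the kept columns with the names from features.txt.
  names(measures) <- features[[2]][keep]

  # Task 3: replace activity ids with their text labels.
  activity <- factor(merged$ACTIVITY.ID,
                     levels = activities[[1]], labels = activities[[2]])

  # Task 5: average every variable per subject and activity.
  tidy <- aggregate(measures,
                    by = list(SUBJECT.ID = merged$SUBJECT.ID,
                              ACTIVITY   = activity),
                    FUN = mean)

  # Write the result to disk and return it.
  write.table(tidy, out_file, row.names = FALSE)
  tidy
}
```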
## Output: the resulting tidy data frame has
- 180 rows (observations), one for each SUBJECT.ID (30) x ACTIVITY.ID (6) combination.
- 79 variable columns: the averages of the mean and standard deviation values for each measurement
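
The row count follows from having exactly one observation per subject and activity pair. As a hedged illustration, the hypothetical helper below (not part of run_analysis.R) checks that property on any data frame; the default column names are assumed from the description above and may need adjusting to match the actual output.

```r
# Hypothetical helper: TRUE if `tidy` has exactly one row for every
# combination of the subject and activity values it contains.
has_one_row_per_pair <- function(tidy,
                                 subject_col  = "SUBJECT.ID",
                                 activity_col = "ACTIVITY.ID") {
  pairs    <- tidy[c(subject_col, activity_col)]
  expected <- length(unique(pairs[[1]])) * length(unique(pairs[[2]]))
  nrow(tidy) == expected && !anyDuplicated(pairs)
}

# Toy example: 30 subjects x 6 activities gives 180 id pairs.
toy <- expand.grid(SUBJECT.ID = 1:30, ACTIVITY.ID = 1:6)
nrow(toy)                  # 180
has_one_row_per_pair(toy)  # TRUE
```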