fa22-prj-pcbarko-schen176--mettler3-yc62-gianghl2

This is the repository for the Fall 2022 STAT447 group project for:

Patrick Barko pcbarko
Stephanie Chen schen176
Matt Mettler mettler3
Giang Ha Le gianghl2
Yongxin Cai yc62

Spotify is an audio streaming service used by hundreds of millions of people. For this group project, we sought to model the popularity of songs based on several acoustic and sonic attributes. There are several published Spotify datasets, but these are several years old. Our first objective was to create a new, updated dataset of Spotify songs. We accomplished this by using Python and R scripts to generate random song IDs and searching the Spotify API with them. Our second objective was to model song popularity (dependent variable) from acoustic attributes (independent variables). Others have attempted to model popularity from acoustic attributes and genre, but most used linear models that did not perform well. We used alternative approaches to modeling and predicting song popularity from acoustic attributes and compared them.
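As a rough illustration of the random-string step described above, here is a minimal Python sketch. It is not the project's actual script: the function names, the default string length, and the seeding behavior are assumptions made for this example, and no call to the Spotify API is shown.

```python
import random
import string

# Letters (upper and lower case) plus digits: 62 possible characters.
ALPHABET = string.ascii_letters + string.digits


def random_alphanumeric(length, rng=None):
    """Return a random alphanumeric string of the given length.

    `rng` is an optional random.Random instance (hypothetical parameter,
    used here only to make runs reproducible).
    """
    rng = rng or random
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_queries(n, length=2, seed=None):
    """Build a list of n random strings to use as search terms.

    The default length of 2 is an assumption for illustration; the
    length used in the project is not stated in this README.
    """
    rng = random.Random(seed)
    return [random_alphanumeric(length, rng) for _ in range(n)]
```

Each generated string could then be passed to whatever API client performs the search. Passing a fixed `seed` makes the list of strings repeatable between runs.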

We successfully generated a new Spotify database of 579,131 unique songs, including various metadata: track ID, artist, a popularity index, and several acoustic/sonic features. Exploratory analysis included principal component analysis and correlation analysis between popularity and acoustic/sonic attributes and among the different acoustic/sonic attributes. We modeled song popularity from the acoustic/sonic attributes using regression and classification models. We found that random forest models performed best with our data and that linear models performed comparatively poorly.

In completing this project, our group developed several new capabilities, including generating random alphanumeric strings to query the Spotify API, cleaning a very large dataset (scanning for duplicates, etc.), building interactive plots and tables using Shiny, and comparing the performance of several different models.
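The duplicate-scanning step mentioned above can be sketched as follows. This is a simplified illustration only, not the project's cleaning code: the `track_id` field name and the list-of-dictionaries layout are assumptions made for this example.

```python
def drop_duplicate_tracks(rows, key="track_id"):
    """Keep the first row seen for each track ID and drop later repeats.

    `rows` is an iterable of dictionaries; `key` names the field that
    identifies a track (hypothetical field name).
    """
    seen = set()
    unique = []
    for row in rows:
        track = row[key]
        if track in seen:
            continue  # already kept a row for this track
        seen.add(track)
        unique.append(row)
    return unique


# Small made-up example rows, for illustration only.
sample = [
    {"track_id": "a1", "popularity": 10},
    {"track_id": "b2", "popularity": 55},
    {"track_id": "a1", "popularity": 10},
]
cleaned = drop_duplicate_tracks(sample)
```

Tracking seen IDs in a set keeps the scan to a single pass over the data, which matters when the dataset has hundreds of thousands of rows.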

Code and results for the final project, along with the presentation slides, are located in the "Final_Project" directory; intermediate versions are located in the "Old_Analysis" and "Old_Data" directories. The PDF version of the report, "Final_Project_Final_Version.pdf", does not contain the interactive plots in the exploratory data analysis section. To view the interactive plots, render the file "Final_Project_Final_Version.Rmd" as HTML (this will take a few minutes due to the large dataset).

Our presentation recording can be found via this Zoom shareable link, on Mediaspace, or on Box.

