The Spotify dataset was downloaded from Kaggle. It contains information about artists, musical characteristics, release year and popularity for about 170,000 (1.7 lakh) Spotify songs from the past 100 years.
We've tried to quantify the relationships between different features and their impact on a song's popularity.
We've used libraries such as pandas, NumPy, scikit-learn, Matplotlib and seaborn for data preprocessing, data visualization and modelling.
This study gave us good hands-on experience with, and insight into:
- Linear Regression
- Logistic Regression
- Decision Tree Classification
- KNN Classification
- Random Forest Classification
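
As a rough illustration of how the classifiers listed above can be compared with scikit-learn, here is a minimal sketch. It runs on synthetic data from `make_classification` rather than the Spotify dataset, and the hyperparameters (`max_iter`, `max_depth`, `n_neighbors`, `n_estimators`) are illustrative assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the song features and a binary "popular / not popular" label.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters below are assumptions chosen only for the example.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the training split and record accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On the real data, `X` and `y` would come from the preprocessed song features and a popularity label instead of `make_classification`.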
From the models and visuals, popularity appears to be somewhat biased towards a song's release year. It is unclear how popularity was measured: whether it reflects a song's popularity in the year of its release or as of today. If it reflects current listening, recent songs would naturally be more popular among listeners in 2020.
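
One simple way to check for this kind of year bias is to look at the correlation between release year and popularity, and at the mean popularity per decade. The sketch below uses synthetic data with an upward trend built in on purpose, so the numbers it prints say nothing about the real dataset. The column names `year` and `popularity` are assumptions about how the data is laid out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic example: popularity rises with release year, plus random noise.
years = rng.integers(1921, 2021, size=2000)
popularity = np.clip(0.6 * (years - 1921) + rng.normal(0, 10, size=2000), 0, 100)

# Column names are assumed for illustration.
df = pd.DataFrame({"year": years, "popularity": popularity})

# Pearson correlation between release year and popularity.
corr = df["year"].corr(df["popularity"])

# Mean popularity per decade (e.g. 1987 falls into the 1980 bucket).
by_decade = df.groupby(df["year"] // 10 * 10)["popularity"].mean()

print(f"Correlation between year and popularity: {corr:.2f}")
print(by_decade.round(1))
```

A strong positive correlation on the real data would support the observation above, but it would not by itself show whether the popularity score reflects release-time or present-day listening.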
Future scope: develop a neural network (NN) model using TensorFlow, which will be uploaded soon.