Spotify Tracks 1922-2020

CHECK OUT MY SLIDESHOW PRESENTATION ON MY DATA

NOTE: The data folder is not in this repo due to GitHub's size limit. Here is where you can find the tracks.csv file needed for this project to run.

NOTE: The Spotify API auth key is also missing for security purposes. To get your own Spotify API key, follow these instructions.

EDA

A Look at the Data ~ 600k Tracks

ID ◼
Name ◼
Popularity ★
Explicit ♦
Artists (◼,◼)
Release Date ◼
Danceability ♣
Energy ♣
Key ★
Loudness ♣
Speechiness ♣
Acousticness ♣
Instrumentalness ♣
Liveness ♣
Valence ♣
Mode ★
Tempo ♣
Time Signature ★
Duration ♣
Release Year ★

◼ string, ★ int, ♦ boolean, ♣ float

Key Metrics Used

Name ◼
Popularity ★
Danceability ♣
Duration ♣
Release Year ★

Visualizing the Data

Note: The boxplots exclude outliers

Hypothesis Testing

What Changes The Popularity?

Explicit
Duration

Popularity vs Explicitness

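H0: Mean popularity of explicit songs = mean popularity of non-explicit songs
HA: Mean popularity of explicit songs > mean popularity of non-explicit songs
Using a bootstrapping technique, we can resample our sample with replacement many times and collect the mean of each resample.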

95% Confidence Intervals
- Non Explicit: (26.72, 26.74)
- Explicit: (45.68, 45.69)
Since the confidence intervals do not overlap, I reject the null hypothesis. There is enough evidence to show that the population mean popularity of explicit songs is greater than that of non-explicit songs.
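As an illustration of the bootstrapping step, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, written with only the Python standard library. The function name `bootstrap_mean_ci`, the number of resamples, and the popularity values are assumptions made for this example; they are not the project's code or data.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Cut equal tails off both ends of the sorted bootstrap means.
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper

# Made-up popularity scores, for illustration only (not taken from tracks.csv).
explicit = [52, 40, 47, 61, 38, 45, 50, 43, 49, 44]
non_explicit = [25, 30, 22, 28, 27, 31, 24, 26, 29, 23]

explicit_ci = bootstrap_mean_ci(explicit)
non_explicit_ci = bootstrap_mean_ci(non_explicit)

# Two intervals overlap only if each one starts before the other ends.
intervals_overlap = (
    explicit_ci[0] <= non_explicit_ci[1] and non_explicit_ci[0] <= explicit_ci[1]
)
print(explicit_ci, non_explicit_ci, intervals_overlap)
```

With real data, `values` would be the popularity column for each group, and a larger `n_resamples` gives smoother interval endpoints at the cost of run time.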

Popularity vs Duration

H0: Popularity of songs with length less than or equal to 5 minutes is equal to the Popularity of songs with length greater than 5 minutes
HA: Popularity of songs with length less than or equal to 5 minutes is greater than the Popularity of songs with length greater than 5 minutes
Using the Central Limit Theorem, I will build a normal model of the sample means and use Welch's t-test to calculate the p-value.

At an α level of 0.05 and with a p-value of 1.57e-34, I reject the null hypothesis. There is enough evidence to show that the population mean popularity of songs 5 minutes or shorter is greater than that of songs longer than 5 minutes.
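The test above can be sketched as follows. This is a minimal, standard-library-only illustration of a one-sided Welch's t-test, not the project's implementation. The function name `welch_t_test_greater` and the sample values are hypothetical. The p-value here uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as the full dataset; for an exact p-value, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` and `alternative='greater'` (SciPy 1.6 or later) performs the same test.

```python
import math
import statistics

def welch_t_test_greater(sample_a, sample_b):
    """One-sided Welch's t-test for H_A: mean(sample_a) > mean(sample_b)."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Variance of each sample mean: sample variance divided by sample size.
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    # ASSUMPTION: normal approximation to the t distribution (large samples).
    p_value = 1.0 - statistics.NormalDist().cdf(t_stat)
    return t_stat, dof, p_value

# Made-up popularity scores, for illustration only (not taken from tracks.csv).
five_min_or_less = [30, 35, 28, 40, 33, 37, 31, 36]
over_five_min = [22, 27, 25, 20, 29, 24, 26, 21]

t_stat, dof, p_value = welch_t_test_greater(five_min_or_less, over_five_min)
alpha = 0.05
reject_null = p_value < alpha
print(t_stat, dof, p_value, reject_null)
```

Welch's version is used instead of Student's t-test because it does not assume the two groups have equal variances, which matters when the groups differ greatly in size.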

Spotify API

Using The Spotify API to Compare Artists

I created a couple of helper functions that simplify comparing two artists on a given metric using the Spotify API. With them, we can run hypothesis tests faster.

Using the Central Limit Theorem, we can conduct Welch's t-test on the chosen metric and obtain a p-value.

The null and alternative hypotheses are as follows:

H₀: ArtistOne_Metric = ArtistTwo_Metric
H_A: ArtistOne_Metric > ArtistTwo_Metric

Below you'll find the docstrings for the functions that make this process simple to use.

def GetTwoArtists(artist_one, artist_two, years=None, nofeatures=True, metric='popularity', save=False, nameAppend=""):
    """Get the test for two different artists according to a metric.

    NOTE: CompareArtistsCLT() is called from within this function

    Parameters
    ----------
    artist_one : list<track>
        a list of analysis track objects
    artist_two : list<track>
        a list of analysis track objects
    years : string
        years to look for in spotify data
    nofeatures : boolean
        do we want data with features
    metric : string
        the metric to look for
    save : boolean
        save figure?
    nameAppend : string
        text to append to filename

    Returns
    -------
    artist_one
        artist_one track analysis
    artist_two
        artist_two track analysis
    """

def CompareArtistsCLT(self, artists, metric='popularity', labels=[], save=False, nameAppend=""):
      """Compare artists metrics using central limit theorem and t testing

      Parameters
      ----------
      artists : list<track data>
          artists data to use
      metric : string
          metric to measure
      labels : list<string>
          list of strings to label our normal models
      save : boolean
          save figure?
      nameAppend : string
          text to append to data

      Returns
      -------
      float
          p-value of our t-test
      """


XXXTentacion vs Juice WRLD (Popularity)

Reject Null Hypothesis

Kanye West vs J Cole (Popularity)

Fail to Reject Null Hypothesis

Bad Bunny vs J Balvin (Danceability)

Fail to Reject Null Hypothesis
