Creating a Genre Classifier using Spotify's API

Madeline Carter, Ethan Crow, Alex Isbill


Overview

Like most young adults, we are avid music listeners. Madeline knows most, if not all, of The Hot 100; Ethan boogies down to rock and J-pop; Alex taps his cowboy boots to the sweet melodies of good ol' country. Clearly, our tastes in music are wildly varied, and we began to wonder how genres of music could be differentiated analytically. We grew fascinated by the idea of a machine learning model that could predict the genre of a song, and discovered a way to build one using everyone's favorite music streaming platform: Spotify.

Gathering the Data

Spotify provides a developer API that lets you pull data about artists, albums, songs, playlists, and more. Among other things, the API exposes a set of "audio features" that Spotify assigns to each song. Here are some of the features Spotify makes available, with their descriptions from the Spotify API Documentation:

[Table: Spotify audio feature descriptions, from the Spotify API Documentation]

We thought it would be interesting to try to predict a song's genre based on these audio features. Unfortunately, Spotify's API does not let you pull the genre of a song directly. Instead, it provides a "genre seed": an array of genres associated with the song, used by the API's recommendation function. To work around this, we used the API to search for the top 1,000 songs in a given genre, pulled the audio features for each song, and attached the genre as a label.
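A minimal sketch of that collection step using the spotipy client library (the paging scheme and the exact fields kept are illustrative assumptions, not our exact notebook code):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Authenticates via the SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET environment variables.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

genres = ["pop", "rock", "country", "edm", "rap", "classical"]
rows = []
for genre in genres:
    # The search endpoint returns at most 50 tracks per request, so page by offset.
    for offset in range(0, 1000, 50):
        result = sp.search(q=f"genre:{genre}", type="track", limit=50, offset=offset)
        tracks = result["tracks"]["items"]
        # audio_features accepts a batch of track ids and returns one dict per track.
        for track, feats in zip(tracks, sp.audio_features([t["id"] for t in tracks])):
            if feats is not None:  # tracks without an analysis come back as None
                rows.append({**feats, "popularity": track["popularity"], "genre": genre})
```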

After putting the data into a dataframe, we had 6,000 rows: 1,000 songs from each of six genres (pop, rock, country, EDM, rap, and classical). After removing duplicates, 5,381 songs remained. Notably, most duplicates came from overlap between pop and other genres such as EDM and rap, an early hint that pop may be harder to classify.
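In pandas, the deduplication step is a one-liner (assuming the track id field from the audio-features payload):

```python
import pandas as pd

df = pd.DataFrame(rows)  # 6,000 rows: 1,000 songs per genre
# Songs returned by more than one genre search appear as duplicate track ids;
# keep only the first occurrence of each.
df = df.drop_duplicates(subset="id", keep="first")  # left us with 5,381 songs
```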


Exploratory Data Analysis

Before training our genre classifier, we looked at the correlations among our features to gauge which would be useful in making predictions.

[Figure: Correlation heatmap of the audio features]

From the correlation heatmap, we find that acousticness and instrumentalness are highly correlated, as are popularity, danceability, energy, and loudness. The latter four features are also distinctly negatively correlated with the former two.
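Such a heatmap can be sketched with seaborn (the column names follow Spotify's audio-features payload; duration_ms is Spotify's name for duration):

```python
import matplotlib.pyplot as plt
import seaborn as sns

feature_cols = ["popularity", "danceability", "energy", "loudness", "speechiness",
                "acousticness", "instrumentalness", "liveness", "valence",
                "tempo", "duration_ms"]
sns.heatmap(df[feature_cols].corr(), annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.title("Correlation between audio features")
plt.show()
```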

To continue our analysis, we converted each feature into z-scores and grouped the data by genre, computing the mean z-score of each feature per genre. Then, to interpret the genres relative to one another, we plotted the difference in mean z-score between each genre and a baseline genre; here, the baseline is rock.
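A minimal sketch of that computation in pandas:

```python
# Standardize each feature, average by genre, then subtract the baseline genre's means.
z = df[feature_cols].apply(lambda col: (col - col.mean()) / col.std())
genre_means = z.groupby(df["genre"]).mean()
diff_from_rock = genre_means.sub(genre_means.loc["rock"], axis=1)
diff_from_rock.drop(index="rock").T.plot(kind="bar", figsize=(12, 5))
```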

[Figure: Mean z-score differences of each genre's audio features relative to rock]

Our principal observations:

As we saw when removing duplicate songs from our dataset, many pop songs overlapped with other genres. To gain insight into this, we created the same plot with pop as the baseline.

[Figure: Mean z-score differences of each genre's audio features relative to pop]

Let's see which features may be distinguishable:


Creating the Classifier

Our goal was to create a classifier that could identify a song as one of five genres (rock, country, EDM, rap, classical) based on the song's audio characteristics. (Pop was left out because of its significant overlap with the other genres.) Exploratory analysis suggested that songs in different genres have distinguishing audio characteristics that should allow a classifier to identify a song's most probable genre. This is a multi-class classification problem with the following setup:

Setup

Task: Accurately classify a song into one of five broad genres

Labels: The ground-truth genre of a given song

Features: Danceability, Energy, Loudness, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Duration

Initial Testing

After splitting the data into features and labels, scaling the features, and creating training, validation, and test sets, we began testing various classification models on our dataset. Each classifier's parameters were left at their default values so we could see whether one performed significantly better than the others from the start; we could then take the best default classifier and tune its hyperparameters to get the best possible predictions.
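A sketch of that preprocessing with scikit-learn, assuming a 60/20/20 train/validation/test split (the exact proportions and random seed are our assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

model_df = df[df["genre"] != "pop"]  # pop is excluded from the classifier
X = model_df[["danceability", "energy", "loudness", "speechiness", "acousticness",
              "instrumentalness", "liveness", "valence", "tempo", "duration_ms"]]
y = model_df["genre"]

# First carve off 40% for validation + test, then split that portion in half.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)

# Fit the scaler on the training set only, then apply it to all three sets.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = map(scaler.transform, (X_train, X_val, X_test))
```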

Here were our initial results:

Model                      Validation Accuracy   Validation F1 Score
Random Forest              0.761                 0.760
Neural Network             0.753                 0.752
Linear SVM                 0.741                 0.741
Logistic Regression        0.722                 0.720
K-Nearest-Neighbors        0.667                 0.666
Decision Tree              0.654                 0.653
Naive Bayes                0.610                 0.589
BASELINE: Guess the Mode   0.231                 0.087
BASELINE: Random Guess     0.208                 0.209
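Under the hood, this comparison is just a loop over default-configured estimators. The particular scikit-learn classes below are our reading of the table rows, and the F1 averaging mode is an assumption:

```python
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Random Forest": RandomForestClassifier(),
    "Neural Network": MLPClassifier(),
    "Linear SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(),
    "K-Nearest-Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "BASELINE: Guess the Mode": DummyClassifier(strategy="most_frequent"),
    "BASELINE: Random Guess": DummyClassifier(strategy="uniform"),
}
for name, model in models.items():
    preds = model.fit(X_train, y_train).predict(X_val)
    print(f"{name}: accuracy={accuracy_score(y_val, preds):.3f}, "
          f"F1={f1_score(y_val, preds, average='macro'):.3f}")
```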

From these results, we see that the Random Forest and Neural Network classifiers perform best on both accuracy and F1 score: the Random Forest correctly classified 76.1% of the validation songs. To see how the classifier performs on each genre, we can look at the confusion matrix.
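A row-normalized confusion matrix can be drawn directly from the fitted model, e.g.:

```python
from sklearn.metrics import ConfusionMatrixDisplay

# Normalize by row so each cell shows the fraction of a true genre's songs
# that received each predicted label.
ConfusionMatrixDisplay.from_estimator(
    models["Random Forest"], X_val, y_val, normalize="true", cmap="Blues")
```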

[Figure: Confusion matrix for the default Random Forest classifier on the validation set]

We see that the classifier performs best on the classical and rap genres. However, it confuses country and rock fairly frequently, likely because these genres have very similar audio feature profiles, as seen in the exploratory analysis.

Now, we will take the Random Forest Classifier and see if we can tune its hyperparameters to achieve better results.

Hyperparameter Tuning

The Random Forest classifier has 18 hyperparameters that control various aspects of how the model makes its predictions. We took a subset of these and tested how the model performed across combinations of values, using GridSearchCV and RandomizedSearchCV to evaluate 68 combinations in total and selecting the model with the highest mean accuracy across three cross-validation folds.
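Here is a sketch of the grid-search half of that process; the search space below is hypothetical, and the grids we actually used are in the notebook:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space for illustration only.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 2, 4],
}
# cv=3 averages accuracy over three cross-validation folds, as described above.
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=3, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```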

The following are hyperparameters that were tested, as well as the combination that produced the best results:

Final Performance

After hyperparameter tuning, we achieved 77.1% accuracy on the validation set. While that is only a one-point increase over the default model, improvement is evident and celebrated. The model scored slightly lower on the test set, indicating some overfitting to the training and validation sets. The confusion matrix for the final model also shows slightly improved predictions for the rock and country genres.
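The final check is simply scoring the tuned model on the held-out sets:

```python
best_rf = search.best_estimator_
print("Validation accuracy:", best_rf.score(X_val, y_val))  # 0.771 for our run
print("Test accuracy:", best_rf.score(X_test, y_test))      # slightly lower, hinting at mild overfitting
```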

[Figure: Confusion matrix for the tuned Random Forest classifier]

Further Analysis: For details of our full analysis, please see the Jupyter notebook linked here.