Like most young adults our age, we are avid music listeners. Madeline knows most, if not all, of The Hot 100; Ethan boogies down to rock and J-pop; Alex taps his cowboy boots to the sweet melodies of good ol' country. Clearly, our tastes in music are wildly varied, and we began to wonder how genres of music could be differentiated analytically. We grew fascinated by the idea of a machine learning model that could predict the genre of a song, and discovered a way to accomplish this using everyone's favorite music streaming platform: Spotify.
Spotify provides an API that allows developers to pull data about artists, albums, songs, playlists, and more. With the API, you can look at various "audio features" that Spotify assigns to each song. Here are some of the features Spotify makes available, along with their descriptions from the Spotify API Documentation:
We thought it would be interesting to try to predict a song's genre based on these audio features. Unfortunately, Spotify's API does not let you pull the genre of a song directly. Instead, it provides a "genre seed": an array of genres associated with the song, used by the API's recommendation function. To work around this, we used the API to search for the top 1,000 songs in a given genre, pulled the audio features for each song, and appended the genre label.
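As a rough sketch of what this pull can look like with the `spotipy` client (the credentials are placeholders, and the paging details and merged fields are our assumptions, not the post's exact code):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Placeholder credentials; register an app at developer.spotify.com
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
))

GENRES = ["pop", "rock", "country", "edm", "rap", "classical"]
rows = []
for genre in GENRES:
    # The search endpoint caps `limit` at 50 per request, so we page
    # through offsets to collect 1,000 tracks per genre.
    for offset in range(0, 1000, 50):
        result = sp.search(q=f"genre:{genre}", type="track",
                           limit=50, offset=offset)
        tracks = result["tracks"]["items"]
        feats = sp.audio_features([t["id"] for t in tracks])
        for track, f in zip(tracks, feats):
            if f:  # audio features can come back None for some tracks
                rows.append({**f, "popularity": track["popularity"],
                             "genre": genre})
```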
After putting the data into a dataframe, we had 6,000 rows representing 1,000 songs from each of six genres: pop, rock, country, EDM, rap, and classical. After removing duplicates, 5,381 songs remained. Notably, most duplicates came from overlap between pop and other genres, such as EDM or rap, an early hint that pop might be harder to classify.
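The dedup step itself is one pandas call, assuming each row carries Spotify's track `id` from the audio-features payload:

```python
import pandas as pd

df = pd.DataFrame(rows)
# A song pulled under two genres shares one track id, so dropping
# duplicate ids keeps only the first genre label it appeared with.
df = df.drop_duplicates(subset="id").reset_index(drop=True)
print(len(df))  # 5,381 songs after deduplication
```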
Before training our genre classifier, we felt it would be helpful to look at the correlations among our features to see which would be useful in making our predictions.
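A sketch of that heatmap with seaborn; the column names below follow Spotify's audio-features payload (e.g. `duration_ms`) plus the track-level `popularity` field:

```python
import seaborn as sns
import matplotlib.pyplot as plt

FEATURES = ["popularity", "danceability", "energy", "loudness",
            "speechiness", "acousticness", "instrumentalness",
            "liveness", "valence", "tempo", "duration_ms"]

# Pairwise Pearson correlations between the audio features
corr = df[FEATURES].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between audio features")
plt.tight_layout()
plt.show()
```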
From the correlation heatmap, we find that acousticness and instrumentalness are highly correlated, as are popularity, danceability, energy, and loudness. The latter four features are also distinctly negatively correlated with the former two.
To continue our analysis, we converted each of these features into z-scores and grouped the data by genre, computing each genre's mean z-score for every feature. Then, to interpret the genres relative to one another, we plotted each genre's difference in mean z-score from a baseline genre; here, the baseline was rock.
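Continuing from the heatmap sketch, in pandas this amounts to standardizing each column, averaging by genre, and subtracting the rock row (the plot styling is assumed):

```python
z = df.copy()
# Convert every feature column to z-scores
z[FEATURES] = (z[FEATURES] - z[FEATURES].mean()) / z[FEATURES].std()

# Mean z-score per genre, expressed as an offset from the rock baseline
genre_means = z.groupby("genre")[FEATURES].mean()
diff_from_rock = genre_means - genre_means.loc["rock"]

diff_from_rock.drop(index="rock").T.plot(kind="bar", figsize=(12, 5))
plt.ylabel("Mean z-score difference from rock")
plt.show()
```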
Our principal observations:
As we saw when removing duplicate songs, many pop songs overlapped with other genres. We created the same plot with pop as the baseline to gain insight into this.
Let's see which features may be distinguishable:
Our goal was to create a classifier that could identify a song as one of five genres (rock, country, EDM, rap, classical) based on the song's audio characteristics. (Pop was left out because of its significant overlap with other genres.) Exploratory analysis suggested that songs in different genres have distinguishing audio characteristics that would allow a classifier to correctly identify a song's most probable genre. This is a multi-class classification problem with the following setup:
- **Task:** Accurately classify a song into one of five broad genres
- **Labels:** The ground-truth genre of a given song
- **Features:** Danceability, Energy, Loudness, Speechiness, Acousticness, Instrumentalness, Liveness, Valence, Tempo, Duration
After splitting the data into features and labels, scaling it, and creating training, validation, and test sets, we began testing various classification models on our dataset. Each classifier's parameters were left at their default values to see whether one performed significantly better than the others out of the box. This would let us choose the best default classifier and then tune its hyperparameters to create the best possible predictions.
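A sketch of this comparison in scikit-learn; the split proportions, the weighted F1 averaging, and the raised `max_iter` values (for convergence only) are our assumptions:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score

# The audio features from the setup above; pop is excluded
MODEL_FEATURES = ["danceability", "energy", "loudness", "speechiness",
                  "acousticness", "instrumentalness", "liveness",
                  "valence", "tempo", "duration_ms"]
five = df[df["genre"] != "pop"]
X, y = five[MODEL_FEATURES].values, five["genre"].values

# 60/20/20 train/validation/test split (exact proportions assumed)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

# Scale using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

models = {
    "Random Forest": RandomForestClassifier(),
    "Neural Network": MLPClassifier(max_iter=1000),  # raised for convergence
    "Linear SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest-Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    print(f"{name}: accuracy={accuracy_score(y_val, pred):.3f}, "
          f"F1={f1_score(y_val, pred, average='weighted'):.3f}")
```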
Here were our initial results:
| Model | Validation Set Accuracy | Validation Set F1 Score |
|---|---|---|
| Random Forest | 0.761 | 0.760 |
| Neural Network | 0.753 | 0.752 |
| Linear SVM | 0.741 | 0.741 |
| Logistic Regression | 0.722 | 0.720 |
| K-Nearest-Neighbors | 0.667 | 0.666 |
| Decision Tree | 0.654 | 0.653 |
| Naive Bayes | 0.610 | 0.589 |
| BASELINE: Guess the Mode | 0.231 | 0.087 |
| BASELINE: Random Guess | 0.208 | 0.209 |
From the table, we see that the Random Forest and Neural Network classifiers perform best on both accuracy and F1 score; Random Forest classified 76.1% of songs into their correct genres. To see how the classifier performs on each genre, we can examine the confusion matrix.
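Continuing from the comparison sketch, the matrix can be drawn directly from the validation predictions (the styling is assumed):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Row = true genre, column = predicted genre
rf = models["Random Forest"]
ConfusionMatrixDisplay.from_predictions(y_val, rf.predict(X_val))
plt.title("Random Forest confusion matrix (validation set)")
plt.show()
```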
We see that the classifier performs best on the classical and rap genres. However, it confuses country and rock fairly frequently, likely because these genres have very similar audio feature scores, as seen in the exploratory analysis.
Now, we will take the Random Forest Classifier and see if we can tune its hyperparameters to achieve better results.
The Random Forest Classifier has 18 different hyperparameters that control various aspects of how the model makes its predictions. We took a subset of these hyperparameters and tested how the model performed across combinations of values: the scikit-learn utilities `GridSearchCV` and `RandomizedSearchCV` were used to test 68 combinations in total, keeping the model with the highest mean accuracy over three cross-validation folds.
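The post's exact grids aren't reproduced here, so the values below are purely illustrative, but the search itself looks like this in scikit-learn:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid only; the actual hyperparameter values tested
# (68 combinations in total) are summarized below.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=3, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```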
The following hyperparameters were tested, along with the combination that produced the best results:
After hyperparameter tuning, we achieved 77.1% accuracy on the validation set. While this is only a one-point increase over the default model, improvement is evident and celebrated. The model scored slightly lower on the test set, indicating some overfitting to the training and validation sets. The confusion matrix for the final model also shows slightly improved predictions for the rock and country categories.
Further Analysis: For details of our full analysis, please see the linked Jupyter notebook here.