Last month Last.fm contributed a massive set of tag data to the Million Song Data Set. The data set includes:
- 505,216 tracks with at least one tag
- 522,366 unique tags
- 8,598,630 (track – tag) pairs
A popular track like Led Zep’s Stairway to Heaven has dozens of unique tags applied hundreds of times.
There is no end to the number of interesting things you can do with these tags: Track similarity for recommendation and playlisting, faceted browsing of the music space, ground truth for training autotagging systems etc.
I think there’s quite a bit to be learned about music itself by looking at these tags. We live in a post-genre world where most music no longer fits into a nice tidy genre categories. There are hundreds of overlapping subgenres and styles. By looking at how the tags overlap we can get a sense for the structure of the new world of music. I took the set of tags and just looked at how the tags overlapped to get a measure of how often a pair of tags co-occur. Tags that have high co-occurrence represent overlapping genre space. For example, among the 500 thousand tracks the tags that co-occur the most are:
- rap co-occurs with hip hop 100% of the time
- alternative rock co-occurs with rock 76% of the time
- classic rock co-occurs with rock 76% of the time
- hard rock co-occurs with rock 72% of the time
- indie rock co-occurs with indie 71% of the time
- electronica co-occurs with electronic 69% of the time
- indie pop co-occurs with indie 69% of the time
- alternative rock co-occurs with alternative 68% of the time
- heavy metal co-occurs with metal 68% of the time
- alternative co-occurs with rock 67% of the time
- thrash metal co-occurs with metal 67% of the time
- synthpop co-occurs with electronic 66% of the time
- power metal co-occurs with metal 65% of the time
- punk rock co-occurs with punk 64% of the time
- new wave co-occurs with 80s 63% of the time
- emo co-occurs with rock 63% of the time
It is interesting to see how the subgenres like hard rock or synthpop overlaps with the main genre and how all rap overlaps with Hip Hop. Using simple overlap we can also see which tags are the least informative. These are tags that overlap the most with other tags, meaning that they are least descriptive of tags. Some of the least distinctive tags are: Rock, Pop, Alternative, Indie, Electronic and Favorites. So when you tell someone you like ‘rock’ or ‘alternative’ you are not really saying too much about your musical taste.
The Music Matrix
I thought it might be interesting to explore the world of music via overlapping tags, and so I built a little web app called The Music Matrix. The Music Matrix shows the overlapping tags for a tag neighborhood or an artist via a heat map. You can explore the matrix, looking at how tags overlap and listening to songs that fit the tags.
With this app you can enter a genre, style, mood or other type of tag. The app will then find the 24 tags with the highest overlap with the seed and show the confusion matrix. Hotter colors indicate high overlap. Mousing over a cell will show you the percentage overlap between the two corresponding tags and clicking on a cell will play a track that has high tag counts for the two tags. I find that I can learn a lot about a genre of music by looking at the 24 tag neighborhood for a genre and listening to examples. Some interesting neighborhoods to explore are:
You can also explore by moods:
If you are not sure what genre or style is for an artist, you can just start with the top tags for the artist like so:
Use the Music Matrix to explore a new genre of music or to find music that matches a set of styles. Find out how genres overlap. Listen to prototypical examples of different styles. Click on things, have fun. Check it out:
The code for the Music Matrix is on Github. Thanks to Thierry for creating the Million Song Data Set (the best research data set ever created) and thanks to Last.fm for contributing a very nice set of tag data to the data set.