
The Music Matrix – Exploring tags in the Million Song Dataset

Last month Last.fm contributed a massive set of tag data to the Million Song Dataset. The data set includes:

  • 505,216 tracks with at least one tag
  • 522,366 unique tags
  • 8,598,630 (track – tag) pairs

A popular track like Led Zep’s Stairway to Heaven has dozens of unique tags applied hundreds of times.

There is no end to the number of interesting things you can do with these tags: track similarity for recommendation and playlisting, faceted browsing of the music space, ground truth for training autotagging systems, and so on.

I think there’s quite a bit to be learned about music itself by looking at these tags. We live in a post-genre world where most music no longer fits into nice tidy genre categories. There are hundreds of overlapping subgenres and styles. By looking at how the tags overlap we can get a sense for the structure of the new world of music.

I took the set of tags and looked at how often each pair of tags co-occurs on the same track. Tags that have high co-occurrence represent overlapping genre space. For example, among the 500 thousand tracks the tags that co-occur the most are:

  • rap co-occurs with hip hop 100% of the time
  • alternative rock co-occurs with rock 76% of the time
  • classic rock co-occurs with rock 76% of the time
  • hard rock co-occurs with rock 72% of the time
  • indie rock co-occurs with indie 71% of the time
  • electronica co-occurs with electronic 69% of the time
  • indie pop co-occurs with indie 69% of the time
  • alternative rock co-occurs with alternative 68% of the time
  • heavy metal co-occurs with metal 68% of the time
  • alternative co-occurs with rock 67% of the time
  • thrash metal co-occurs with metal 67% of the time
  • synthpop co-occurs with electronic 66% of the time
  • power metal co-occurs with metal 65% of the time
  • punk rock co-occurs with punk 64% of the time
  • new wave co-occurs with 80s 63% of the time
  • emo co-occurs with rock 63% of the time

It is interesting to see how subgenres like hard rock or synthpop overlap with the main genre, and how all rap overlaps with hip hop. Using simple overlap we can also see which tags are the least informative. These are the tags that overlap the most with other tags, meaning that they are the least descriptive. Some of the least distinctive tags are: Rock, Pop, Alternative, Indie, Electronic and Favorites. So when you tell someone you like ‘rock’ or ‘alternative’ you are not really saying too much about your musical taste.
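For readers who want to try this on their own tag data, here is a minimal Python sketch of one way to measure overlap. The post does not spell out the exact formula, so this assumes the asymmetric reading suggested by the list above: the overlap of tag A with tag B is the fraction of tracks tagged A that are also tagged B. The function name `tag_overlap` and the toy data are made up for illustration.

```python
from collections import Counter
from itertools import permutations


def tag_overlap(track_tags):
    """Return {(a, b): fraction of tracks tagged `a` that are also tagged `b`}.

    `track_tags` maps a track id to the collection of tags applied to it.
    Assumed measure (not confirmed by the post): |A and B| / |A|.
    """
    tag_counts = Counter()   # tag -> number of tracks carrying it
    pair_counts = Counter()  # (a, b) -> number of tracks carrying both
    for tags in track_tags.values():
        unique = set(tags)
        tag_counts.update(unique)
        # Ordered pairs, because the measure is not symmetric.
        pair_counts.update(permutations(sorted(unique), 2))
    return {(a, b): n / tag_counts[a] for (a, b), n in pair_counts.items()}


# Toy data, invented for illustration (not taken from the dataset).
toy = {
    "t1": {"rap", "hip hop"},
    "t2": {"rap", "hip hop"},
    "t3": {"hip hop", "rock"},
    "t4": {"rock", "hard rock"},
}
scores = tag_overlap(toy)
print(f"rap co-occurs with hip hop {scores[('rap', 'hip hop')]:.0%} of the time")
```

Counting pairs per track, rather than comparing every tag against every other tag, keeps the work proportional to the number of (track, tag) pairs, which matters with hundreds of thousands of unique tags.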

The Music Matrix

I thought it might be interesting to explore the world of music via overlapping tags, and so I built a little web app called The Music Matrix. The Music Matrix shows the overlapping tags for a tag neighborhood or an artist via a heat map. You can explore the matrix, looking at how tags overlap and listening to songs that fit the tags.

With this app you can enter a genre, style, mood or other type of tag. The app will then find the 24 tags with the highest overlap with the seed and show the confusion matrix. Hotter colors indicate high overlap. Mousing over a cell will show you the percentage overlap between the two corresponding tags, and clicking on a cell will play a track that has high tag counts for the two tags. I find that I can learn a lot about a genre of music by looking at the 24-tag neighborhood for a genre and listening to examples. Some interesting neighborhoods to explore are:
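The neighborhood lookup described above can be sketched in a few lines of Python. This is not the Music Matrix code (that lives on Github); it is an illustration under two assumptions: overlap is the fraction of tracks tagged with the row tag that also carry the column tag, and the seed is shown alongside its neighbors. The name `tag_neighborhood` and the toy data are hypothetical.

```python
from collections import defaultdict


def tag_neighborhood(track_tags, seed, size=24):
    """Return (tags, matrix) for `seed` and its `size` most-overlapping tags.

    matrix[i][j] is the fraction of tracks tagged tags[i] that are also
    tagged tags[j] (an assumed definition of overlap).
    """
    tracks_by_tag = defaultdict(set)
    for track, tags in track_tags.items():
        for tag in tags:
            tracks_by_tag[tag].add(track)
    if seed not in tracks_by_tag:
        raise KeyError(f"unknown tag: {seed}")

    def overlap(a, b):
        return len(tracks_by_tag[a] & tracks_by_tag[b]) / len(tracks_by_tag[a])

    # Keep only tags that share at least one track with the seed,
    # strongest overlap first (ties broken alphabetically).
    others = [t for t in tracks_by_tag if t != seed and overlap(seed, t) > 0]
    others.sort(key=lambda t: (-overlap(seed, t), t))
    tags = [seed] + others[:size]
    matrix = [[overlap(row, col) for col in tags] for row in tags]
    return tags, matrix


# Toy data, invented for illustration.
toy = {
    "t1": {"rap", "hip hop"},
    "t2": {"rap", "hip hop"},
    "t3": {"hip hop", "rock"},
    "t4": {"rock", "hard rock"},
}
tags, matrix = tag_neighborhood(toy, "rock", size=2)
for tag, row in zip(tags, matrix):
    print(f"{tag:>10}", " ".join(f"{cell:4.0%}" for cell in row))
```

Each row of the result can be mapped straight onto a heat-map row, with hotter colors for larger values.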

You can also explore by moods:

And other facets:

If you are not sure what genre or style is for an artist, you can just start with the top tags for the artist like so:

Use the Music Matrix to explore a new genre of music or to find music that matches a set of styles.  Find out how genres overlap. Listen to prototypical examples of different styles. Click on things, have fun.  Check it out:

The Music Matrix

The code for the Music Matrix is on Github. Thanks to Thierry for creating the Million Song Dataset (the best research data set ever created) and thanks to Last.fm for contributing a very nice set of tag data to the data set.




A few years back I created a data set of social tags from Last.fm. RJ at Last.fm graciously gave permission for me to distribute the dataset for research use. I hosted the dataset on the media server at Sun Labs. However, with the Oracle acquisition, the media server is no longer serving up the data, so I thought I would post the data elsewhere.

The dataset is now available for download here: Lastfm-ArtistTags2007

Here are the details as told in the README file:

The LastFM-ArtistTags2007 Data set
Version 1.0
June 2008

What is this?

    This is a set of artist tag data collected from Last.fm using
    the Audioscrobbler webservice during the spring of 2007.

    The data consists of the raw tag counts for the 100 most
    frequently occurring tags that listeners have applied
    to over 20,000 artists.

    An undocumented (and deprecated) option of the audioscrobbler
    web service was used to bypass the normalization of tag
    counts.  This data set provides raw tag counts.

Data Format:

  The data is formatted one entry per line as follows:



    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art rock<sep>18

Data Statistics:

    Total Lines:      952810
    Unique Artists:    20907
    Unique Tags:      100784
    Total Tags:      7178442


    Some minor filtering has been applied to the tag data. Last.fm will
    report tags with counts of zero or less on occasion. These tags have
    been removed.

    Artists with no tags have not been included in this data set.
    Of the nearly quarter million artists that were inspected, 20,907
    artists had 1 or more tags.


    ArtistTags.dat  - the tag data
    README.txt      - this file
    artists.txt     - artists ordered by tag count
    tags.txt        - tags ordered by tag count


    The data in LastFM-ArtistTags2007 is distributed with permission of
    Last.fm. The data is made available for non-commercial use only under
    the Creative Commons Attribution-NonCommercial-ShareAlike UK License.
    Those interested in using the data or web services in a commercial
    context should contact partners at last dot fm. For more information


    Thanks to Last.fm for providing the access to this tag data via their
    web services.


    This data was collected, filtered and by Paul Lamere of The Echo Nest. Send
    questions or comments to
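
Going by the two sample rows in the README, each entry appears to hold an artist identifier, the artist name, a tag and a raw count, separated by the literal string `<sep>`. Here is a small Python sketch for reading the file under that assumption. The field names, the helper names and the UTF-8 encoding mentioned in the usage note are guesses, not something the README states.

```python
from collections import defaultdict

SEP = "<sep>"  # literal field separator seen in the sample rows


def parse_line(line):
    """Split one entry into (artist_id, artist_name, tag, count)."""
    artist_id, name, tag, count = line.rstrip("\n").split(SEP)
    return artist_id, name, tag, int(count)


def load_artist_tags(lines):
    """Return {(artist_id, artist_name): {tag: raw_count}}."""
    tags_by_artist = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        artist_id, name, tag, count = parse_line(line)
        tags_by_artist[(artist_id, name)][tag] = count
    return dict(tags_by_artist)


# The two sample rows from the README.
sample = [
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21",
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art rock<sep>18",
]
artist_tags = load_artist_tags(sample)
for (artist_id, name), tags in artist_tags.items():
    print(name, tags)
```

To read the real file, pass an open file handle instead of the sample list, for example `load_artist_tags(open("ArtistTags.dat", encoding="utf-8"))`.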




Last.FM’s Listening clock

Nifty new visualization at Last.fm that shows the time of day when you listen to music:


MeToo – a scrobbler for the room

One of the many cool things about working at the Echo Nest is that we have a Sonos audio system with a single group playlist for the office. Anyone from the CEO to the greenest intern can add music to the listening queue for everyone to listen to. The office as a whole has rather diverse taste in music, and as a result I’ve been exposed to lots of interesting music. However, the downside is that since I’m not listening to music being played on my personal computer, every day I have 10 hours of music listening that is never scrobbled, and as they say, if it doesn’t scrobble, it doesn’t count. Sure, the Sonos system scrobbles all of the plays to the Echo Nest account on Last.fm, but I’d also like it to scrobble them to my account so I can use nifty apps like Lee Byron’s Listening History or Matt Ogle’s Bragging Rights on my own scrobbles.

This morning while listening to that nifty Emeralds album,  I decided that I’d deal with those scrobble gaps once and for all.  So I wrote a little python script called MeToo that keeps my scrobbles up to date.  It’s really quite simple. Whenever I’m in the office, I fire up MeToo.  MeToo watches the most recent tracks played on The Echo Nest account and whenever a new track is played, it scrobbles it to my personal account. In effect, my scrobbles will track the office scrobbles.  When I’m not listening I just close my laptop and the scrobbling stops.

The script itself is pretty simple. I used pylast to do the interfacing to Last.fm, and the bulk of the logic is less than 20 lines of code. I start the script like so:

% python TheEchoNest lamere

When I do that, MeToo will continuously monitor the most recently played tracks on TheEchoNest and scrobble the plays on my account. When I close my laptop, the script is naturally suspended, so even though music may continue to play in the office, my laptop won’t scrobble it.
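
For the curious, the polling loop described above might look roughly like the Python sketch below. This is not the original MeToo script. The helper names, the 60-second polling interval and the credential placeholders are all assumptions, and the pylast calls (`LastFMNetwork`, `get_user(...).get_recent_tracks`, `scrobble`, `md5`) are written as they exist in recent pylast releases, so check them against the version you have installed.

```python
import time


def new_plays(recent, last_seen_ts):
    """Return plays newer than `last_seen_ts`, oldest first.

    `recent` is a list of (artist, title, unix_timestamp) tuples.
    """
    fresh = [play for play in recent if play[2] > last_seen_ts]
    return sorted(fresh, key=lambda play: play[2])


def mirror_once(fetch_recent, scrobble, last_seen_ts):
    """Copy any new plays to the target account; return the new watermark."""
    for artist, title, ts in new_plays(fetch_recent(), last_seen_ts):
        scrobble(artist, title, ts)
        last_seen_ts = ts
    return last_seen_ts


def main(source_user, my_user):
    import pylast  # third-party; imported here so the helpers above need no network

    # API_KEY, API_SECRET and PASSWORD are placeholders, not real credentials.
    network = pylast.LastFMNetwork(
        api_key="API_KEY",
        api_secret="API_SECRET",
        username=my_user,
        password_hash=pylast.md5("PASSWORD"),
    )

    def fetch_recent():
        played = network.get_user(source_user).get_recent_tracks(limit=10)
        return [(p.track.artist.name, p.track.title, int(p.timestamp))
                for p in played]

    def scrobble(artist, title, ts):
        network.scrobble(artist=artist, title=title, timestamp=ts)

    last_seen = int(time.time())  # only mirror plays that happen from now on
    while True:
        last_seen = mirror_once(fetch_recent, scrobble, last_seen)
        time.sleep(60)  # assumed polling interval
```

Calling `main("TheEchoNest", "lamere")` would start the loop. Keeping a timestamp watermark means a track is copied only once, and closing the laptop simply pauses the loop along with everything else.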

My scrobbles and Echo Nest scrobbles

I suspect that this use case is relatively rare, and so there’s probably not a big demand for something like MeToo, but if you are interested in it, leave a comment. If I see some interest, I’ll toss it up on Google Code so anyone can use it.

It feels great to be scrobbling again!



Which band has the hotttnesss?

Developer/musician Paul Barrett (aka echodeck) has created pop.ularity, a nifty web-based music quiz based on the Last.fm and Echo Nest APIs. In the quiz you try to guess which band is hotter on the web. The quiz uses Last.fm plays and listeners, Echo Nest Hotttnesss and Echo Nest familiarity to measure popularity for each band.

It’s a fun game – give it a whirl!



Unofficial Artist Guide to SXSW

I’m excited! Next week I travel to Austin for a week-long computer+music geek-fest at SXSW. A big part of SXSW is the music – there are nearly 2,000 different artists playing at SXSW this year. But that presents a problem – there are so many bands going to SXSW (many I’ve never heard of) that I find it very hard to figure out which bands I should go and see. I need a tool to help me sift through all of the artists – a tool that will help me decide which artists I should add to my schedule and which ones I should skip.

I’m not the only one who was daunted by the large artist list. Taylor McKnight, founder of SCHED*, was thinking the same thing. He wanted to give his users a better way to plan their time at SXSW. And so over a couple of weekends Taylor built (with a little backend support from us) The Unofficial Artist Discovery Guide to SXSW.

The Unofficial Artist Discovery Guide to SXSW is a tool that allows you to explore the many artists attending this year’s SXSW. It lets you search for artists, browse by popularity, music style, ‘buzzworthiness’, or similarity to your favorite artists – and it will make recommendations for you based on your music taste (using your Last.fm, SCHED* or Hype Machine accounts). The Artist Guide supplies enough context (bios, images, music, tag clouds, links) to help you decide if you might like an artist.

Here’s the guide:

Here’s a quick tour of some of the things you can do with the guide.  First off, you can Search for artists by name, genre/tag or location. This helps you find music when you know what you are looking for.

However, you may not always be sure what you are looking for – that’s where you use Discover. This gives you recommendations based on the music you already like. Type in the name of a few artists (even artists that are not playing at SXSW) or your SCHED*, Hype Machine or Last.fm user name, and ‘Discover’ will give you a set of recommendations for SXSW artists based on your music taste. For example, I’ve been listening to Charlotte Gainsbourg lately so I can use the artist guide to help me find SXSW artists that I might like:

If I see an artist that looks interesting I can drill down and get more info about the artist:

From here I can read the artist bio, listen to some audio, explore other similar SXSW artists or add the event to my SCHED* schedule.

I use Last.fm quite a bit, so I can enter my Last.fm name and get SXSW recommendations based upon my top artists. The artist guide tries to mix things up a little bit, so if I don’t like the recommendations I see, I can just ask again and get a different set. Here are some recommendations based on my recent listening at Last.fm:

If you’ve been using the wonderful SCHED* to keep track of your SXSW calendar you can use the guide to get recommendations based on artists that you’ve already added to your SXSW calendar.

In addition to search and discovery, the guide gives you a number of different ways to browse the SXSW Artist space.  You can browse by ‘buzzworthy’ artists – these are artists that are getting the most buzz on the web:

Or the most well-known artists:

You can browse by the style of music via a tag cloud:

And by venue:

Building the guide was pretty straightforward. Taylor used the Echo Nest APIs to get the detailed artist data such as familiarity, popularity, artist bios, links, images, tags and audio. The only data that was not available at the Echo Nest was the venue and schedule info, which was provided by Arkadiy (one of Taylor’s colleagues). Even though SXSW artists can be extremely long tail (some don’t even have Myspace pages), the Echo Nest was able to provide really good coverage for these sets (there was coverage for over 95% of the artists). Still, there are a few gaps and I suspect there may be a few errors in the data (my favorite wrong image is for the band Abe Vigoda). If you are in a band that is going to SXSW and you see that we have some of your info wrong, send me an email and I’ll make it right.

We are excited to see this Artist Discovery guide built on top of the Echo Nest. It’s a great showcase for the Echo Nest developer platform and working with Taylor was great. He’s one of these hyper-creative, energetic types – smart, gets things done and full of new ideas. Taylor may be adding a few more features to the guide before SXSW, so stay tuned and we’ll keep you posted on new developments.



LastHistory – Visualizing Listening Histories

This week Klaas, one of the researchers at Last.fm, released to the Playground the ability to plot data from your personal listening history (read about it here: Now in the Playground: Scrobbling Timelines).

You can look at when you started to listen to particular bands, or even compare your listening to one of your friends (here you can see my cumulative listening as compared to my good friend Neil Gaiman). It’s a really neat app that highlights the awesome listening data that Last.fm has been collecting for the last 6 or so years.

With the new Last.fm plots you can look at your listening history – but there’s a new app that takes this idea one step further. LastHistory, an application by Frederik Seiffert and Dominikus Baur from the Media Informatics Group of the University of Munich, allows you to analyze music listening histories from Last.fm through an interactive visualization and to explore your own past by combining the music you listened to with your own photos and calendar entries. Like Klaas’s scrobbling graphs, LastHistory lets you browse music listening history, but LastHistory goes beyond that – it lets you interact with the visualization, allowing you to use your listening history for music exploration and playlisting. And since the listening history can belong to any Last.fm listener, it is a great vehicle for music discovery too. The video makes it all really clear:

The integration with your iPhoto library is genius. While you listen to the music that you played in the car on that road trip to Tennessee in 2008 you can see a slide show of your photos from that same trip.

LastHistory runs on a Mac. When you run it for the first time, you tell it your Last.fm name. It then goes to Last.fm to collect your listening history and info about all of the tracks. (This can take a few minutes depending on how long you’ve been listening at Last.fm.) But even while it is retrieving your data you can start to interact with it. And interacting with this application is very fun.

Each dot on the display represents a single song play at a point in the past.  Mouse over the point to see the song name and to see other times when you played the song.  Click on the song to hear it.  The dots are colored by the genre (discovered by using the tags applied to the song).  It is quite fun exploring my own listening history. Here’s the time when I first got the Weezer ‘Red’ Album:

This app is cool in so many ways that I know I’m going to spend a lot of time playing with it. But if you try it out, remember that it is a 1.0 version. I did experience a crash or two, but it seemed to pick up where it left off without trouble. Oh yes, one more thing that moves this app from totally cool into über-cool is that it is all open source. Get the code here: LastHistory on Github. Congrats to Frederik and Dominikus for creating the first novel music exploration app of the decade. Nice job!



Normalisr – Time-based charts of your Last.fm data

Worth checking out: Normalisr



Why I love Last.fm

Search –


Genre of the week: whalecore

Saw this post by Nackster on the Last.fm brutal death metal forum:

Brutal death metal music - Listen free at Last.fm

whalecore! Oh yeah! Here it is:

And don’t forget this whalecore classic – really, it started the whole genre:

Top whalecore bands are:

  • Gojira
  • Mastodon
  • Ahab
  • Giant Squid

Yep, there’s a Wikipedia page on whalecore. Listen to whalecore at Last.fm.
