The Million Song Dataset just got 50 million times better

Today Thierry pushed out the full Taste Profile addition to the Million Song Dataset, which adds user play data for over a million users. Specifically, the data includes nearly 50 million play-count triples (user, song, play count) for a million users and 385 thousand songs in the Million Song Dataset.
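To get a feel for what a play-count triple looks like in practice, here is a minimal Python sketch that parses triples and counts distinct users, distinct songs, and total plays. It assumes one triple per line with tab-separated fields in the order user, song, play count; that layout, the sample IDs, and the helper names (`Play`, `parse_triples`, `summarize`) are illustrative assumptions, so check the dataset's own documentation for the real format.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Play(NamedTuple):
    """One (user, song, play count) triple."""
    user: str
    song: str
    count: int


def parse_triples(lines: Iterable[str]) -> Iterator[Play]:
    """Yield Play records from lines of the form user<TAB>song<TAB>count.

    The tab-separated layout is an assumption, not a confirmed spec.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user, song, count = line.split("\t")
        yield Play(user, song, int(count))


def summarize(plays: Iterable[Play]) -> Dict[str, int]:
    """Count distinct users, distinct songs, triples, and total plays."""
    users, songs = set(), set()
    triples = 0
    total_plays = 0
    for play in plays:
        users.add(play.user)
        songs.add(play.song)
        triples += 1
        total_plays += play.count
    return {
        "users": len(users),
        "songs": len(songs),
        "triples": triples,
        "plays": total_plays,
    }


# Made-up sample data; real user and song IDs will look different.
sample = [
    "user_a\tsong_1\t3",
    "user_a\tsong_2\t1",
    "user_b\tsong_1\t7",
]

summary = summarize(parse_triples(sample))
print(summary)
```

Because `parse_triples` is a generator, the same code can be pointed at an open file handle and will stream through the triples without loading all of them into memory at once.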

The data is provided by The Echo Nest (awesome company, that Echo Nest). Thierry also hints that there may be a contest similar to the Netflix prize coming soon. Should be a fun way to spend the holidays. Read more about the data here: The Echo Nest Taste Profile Subset.



  1. #1 by Eugenio Tacchini on December 20, 2011 - 7:18 pm

    This is just awesome. Thanks!

  2. #2 by Eugenio Tacchini on December 20, 2011 - 7:45 pm

    A couple of questions
    1) Does the data come from one source or multiple sources? If multiple, I guess there is no attempt to do users matching, right?
    2) For each user, do we have data about all the songs played or just the most played? I’m asking because, for example, one of the Last.fm datasets used in the literature was built using a method that returns just the top artists and not all of them.

  3. #3 by Brian on December 23, 2011 - 12:09 pm

    eugenio: (1) Yes, and no. (2) Individual songs played. However, only songs played in the original data that match songs in the MSD are included, so it is a subset.

  4. #4 by Eugenio Tacchini on December 23, 2011 - 2:11 pm

    OK, thanks.

  5. #5 by zazi0815 on December 27, 2011 - 5:23 am

    Oh, he is talking about triples …

    (is this a sign? ;) )

