Edith L.M. Law has just released the long-awaited Tagatune Dataset.
From the README:
The Tagatune dataset consists of 31383 music clips, each 29 seconds long, created from songs downloadable from Magnatune.com. The genres include classical, new age, electronica, rock, pop, world, jazz, blues, metal, punk, etc. The dataset is optimized for training machine learning algorithms: it includes only tags that are associated with more than fifty songs, and each song is associated with a tag only if that tag has been generated by more than two players independently.
The data is collected from a two-player online game called Tagatune, deployed on the GWAP.com game portal. In this game, two players are given either the same song or different songs, and are asked to enter descriptions appropriate for their given song. After reviewing each other’s descriptions, the players then guess whether they are given the same song or not.
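The README does not include the code used to build the dataset. The following Python sketch shows one way the two filtering rules quoted above could be applied. It makes several assumptions. The raw input is a list of hypothetical `(song_id, tag, player_id)` triples, which is not the format of the released files. The names `filter_tags`, `player_threshold` and `song_threshold` are illustrative. The order of the two filters is a guess. Distinct player IDs stand in for "independently." The sketch also works at the song level, matching the README's wording, although the released data is organized as clips.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical raw input: one (song_id, tag, player_id) triple per tag a player typed.
# This is an illustrative format, not the layout of the released Tagatune files.
Annotation = Tuple[str, str, str]


def filter_tags(raw: Iterable[Annotation],
                player_threshold: int = 2,
                song_threshold: int = 50) -> Set[Tuple[str, str]]:
    """Return (song_id, tag) pairs that pass both thresholds (assumed order)."""
    # Step 1: collect the distinct players who entered each tag for each song,
    # so a player typing the same tag twice for a song still counts once.
    players = defaultdict(set)
    for song, tag, player in raw:
        players[(song, tag)].add(player)

    # Step 2: keep a song-tag association only if MORE than `player_threshold`
    # distinct players generated it (a proxy for "independently").
    agreed = {pair for pair, who in players.items() if len(who) > player_threshold}

    # Step 3: count how many songs each surviving tag is attached to.
    songs_per_tag = defaultdict(set)
    for song, tag in agreed:
        songs_per_tag[tag].add(song)

    # Step 4: keep only tags attached to MORE than `song_threshold` songs.
    return {(song, tag) for song, tag in agreed
            if len(songs_per_tag[tag]) > song_threshold}
```

With the defaults, a tag must survive the three-player agreement check on at least 51 songs to remain. For a quick experiment on toy data, lower `song_threshold` (for example `filter_tags(raw, song_threshold=1)`) so that the song-count rule can be seen working on a handful of rows.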
This is great data, useful for all sorts of things, especially research on autotagging and query-by-description. It is quite complementary to a dataset that we are about to release from the Echo Nest (stay tuned for that).