Archive for category research

How to process a million songs in 20 minutes

The recently released Million Song Dataset (MSD), a collaborative project between The Echo Nest and Columbia’s LabROSA, is a fantastic resource for music researchers. It contains detailed acoustic and contextual data for a million songs. However, getting started with the dataset can be a bit daunting. First of all, the dataset is huge (around 300 GB), which is more than most people want to download. Second, it is such a big dataset that processing it in a traditional fashion, one track at a time, is going to take a long time. Even if you can process a track in 100 milliseconds, it is still going to take over a day to process all of the tracks in the dataset. Luckily, there are techniques such as MapReduce that make processing big data scalable over multiple CPUs. In this post I shall describe how we can use Amazon’s Elastic MapReduce to easily process the million song dataset.
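The “over a day” estimate is simple arithmetic: 1,000,000 tracks × 0.1 s is 100,000 s, or about 27.8 hours. The sketch below reproduces that figure and adds the ideal speedup from dividing the work among many CPUs. The function names are illustrative. The parallel figure assumes a perfect division of labor with no job-startup or data-transfer overhead, which real clusters never quite achieve.

```python
# Back-of-the-envelope timing for processing every track in the MSD.
# Assumes each track costs the same fixed time and that work divides
# perfectly across workers (no startup, scheduling, or I/O overhead).


def serial_hours(num_tracks, seconds_per_track):
    """Hours needed to process every track one at a time."""
    return num_tracks * seconds_per_track / 3600.0


def ideal_parallel_minutes(num_tracks, seconds_per_track, workers):
    """Minutes needed if the tracks are split evenly among workers."""
    return num_tracks * seconds_per_track / workers / 60.0


if __name__ == "__main__":
    print(round(serial_hours(1000000, 0.1), 1))                 # 27.8
    print(round(ideal_parallel_minutes(1000000, 0.1, 100), 1))  # 16.7
```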

The Problem

From 'Creating Music by Listening' by Tristan Jehan

For this first experiment in processing the million song dataset, I want to do something fairly simple yet still interesting. One easy calculation is to determine each song’s density, defined as the average number of notes or atomic sounds (called segments) per second in a song. To calculate the density we just divide the number of segments in a song by the song’s duration. The set of segments for each track is already calculated in the MSD: an onset detector is used to identify atomic units of sound such as individual notes, chords, drum sounds, etc. Each segment represents a rich, complex, and usually short polyphonic sound. In the above graph the audio signal (in blue) is divided into about 18 segments (marked by the red lines), and the resulting segments vary in duration. We should expect high-density songs to have lots of activity (as an Emperor once said, “too many notes”), while low-density songs won’t have very much going on. For this experiment I’ll calculate the density of all 1 million songs and find the most dense and the least dense songs.

MapReduce
A traditional approach to processing a set of tracks would be to iterate through each track, process the track, and report the result. This approach, although simple, will not scale very well as the number of tracks or the complexity of the per-track calculation increases. Luckily, a number of scalable programming models have emerged in the last decade to make this type of problem more tractable. One such approach is MapReduce.

MapReduce is a programming model developed by researchers at Google for processing and generating large data sets. With MapReduce you specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. There are a number of implementations of MapReduce, including the popular open-source Hadoop and Amazon’s Elastic MapReduce (a hosted Hadoop service).
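To make the model concrete, here is a minimal single-process sketch of the three phases a framework like Hadoop runs. It maps every input record, groups (“shuffles”) the intermediate pairs by key, and then reduces each group. `run_mapreduce` is a hypothetical helper for illustration, not part of Hadoop or mrjob. A real framework spreads these phases across machines and sorts on disk. Here the keys are sorted only to make the output deterministic.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to each (key, value) record, group the intermediate
    pairs by key (the "shuffle"), then apply reducer to each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):  # sorted only for repeatable output
        results.extend(reducer(out_key, groups[out_key]))
    return results


def word_mapper(_, line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word, 1


def word_reducer(word, counts):
    # Merge all of the counts emitted for the same word.
    yield word, sum(counts)


if __name__ == "__main__":
    lines = [(None, "the cat sat"), (None, "the dog sat")]
    print(run_mapreduce(lines, word_mapper, word_reducer))
    # [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

The mapper and reducer have the same shape as in the mrjob word counter below. The framework, not your code, is responsible for the grouping step in between.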

There’s a nifty MapReduce Python library developed by the folks at Yelp called mrjob. With mrjob you can write a MapReduce task in Python and run it as a standalone app while you test and debug it. When your job is ready, you can then launch it on a Hadoop cluster (if you have one), or run it on tens or even hundreds of CPUs using Amazon’s Elastic MapReduce. Writing an mrjob MapReduce task couldn’t be easier. Here’s the classic word counter example written with mrjob:

from mrjob.job import MRJob

class MRWordCounter(MRJob):
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRWordCounter.run()

The input is presented to the mapper function one line at a time. The mapper breaks the line into a set of words and emits a count of 1 for each word that it finds. The reducer is called once for each word with all of the counts emitted for that word; it sums up the counts and emits the total.

When you run your job in standalone mode, it runs in a single thread, but when you run it on Hadoop or Amazon (which you can do by adding a few command-line switches), the job is spread out over all of the available CPUs.

MapReduce job to calculate density
We can calculate the density of each track with this very simple mrjob – in fact, we don’t even need a reducer step:

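from mrjob.job import MRJob

import track  # the author's track.py parser for MSD lines (see github)


class MRDensity(MRJob):
    """ A map-reduce job that calculates the density of each track """

    def mapper(self, _, line):
        """ The mapper loads a track and yields its density """
        t = track.load_track(line)
        # skip unparseable lines, tracks without a tempo, and zero-length tracks
        if t and t['tempo'] > 0 and t['duration'] > 0:
            density = len(t['segments']) / t['duration']
            yield (t['artist_name'], t['title'], t['song_id']), density


if __name__ == '__main__':
    MRDensity.run()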

(see the full code on github)

The mapper takes a line of input and parses it into a track dictionary (more on this in a bit). If we have a good track that has a tempo, we calculate the density by dividing the number of segments by the song’s duration.

Parsing the Million Song Dataset
We want to be able to process the MSD with code running on Amazon’s Elastic MapReduce. Since the easiest way to get data to Elastic MapReduce is via Amazon’s Simple Storage Service (S3), we’ve loaded the entire MSD into a single S3 bucket at http://tbmmsd.s3.amazonaws.com/. (The ‘tbm’ stands for Thierry Bertin-Mahieux, the man behind the MSD.) This bucket contains around 300 files, each with data on about 3,000 tracks. Each file is formatted with one track per line, following the format described in the MSD field list. You can see a small subset of this data, for just 20 tracks, in this file on github: tiny.dat. I’ve written track.py, which parses this track data and returns a dictionary containing all the data.
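To show the general shape of such a parser, here is a hypothetical stand-in. The real column order and encoding are defined by the MSD field list and by track.py, and this sketch does not reproduce either one. `parse_track_line`, the six-column layout, and the comma-separated segment start times are all assumptions. What it does illustrate is the convention the mapper relies on: return a dictionary for a good line and `None` for a malformed one, so that the `if t:` check skips bad input.

```python
# HYPOTHETICAL layout for illustration only: this is NOT the real MSD S3
# format and NOT the API of track.py.
HYPOTHETICAL_COLUMNS = ["song_id", "artist_name", "title",
                        "duration", "tempo", "segments_start"]


def parse_track_line(line):
    """Turn one tab-separated line into a track dict, or None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(HYPOTHETICAL_COLUMNS):
        return None
    row = dict(zip(HYPOTHETICAL_COLUMNS, parts))
    try:
        return {
            "song_id": row["song_id"],
            "artist_name": row["artist_name"],
            "title": row["title"],
            "duration": float(row["duration"]),
            "tempo": float(row["tempo"]),
            # assumed: segment start times stored as comma-separated seconds
            "segments": [float(s) for s in row["segments_start"].split(",") if s],
        }
    except ValueError:
        return None  # a numeric field could not be parsed


if __name__ == "__main__":
    # made-up example line (the song ID is not a real MSD ID)
    t = parse_track_line("SOTEST0000000000\tSome Artist\tSome Song\t10.0\t120.0\t0.0,0.5,1.2")
    print(len(t["segments"]) / t["duration"])
```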

You are welcome to use this S3 version of the MSD for your Elastic MapReduce experiments. But note that we are making the S3 bucket containing the MSD available as an experiment. If you run your MapReduce jobs in the “US Standard Region” of Amazon, it should cost us little or no money to make this S3 data available. If you want to download the MSD, please don’t download it from the S3 bucket; instead, go to one of the other sources of MSD data such as Infochimps. We’ll keep the S3 MSD data live as long as people don’t abuse it.

Running the Density MapReduce job

You can run the density MapReduce job on a local file to make sure that it works:

  % python density.py tiny.dat

This creates output like this:

["Planet P Project", "Pink World", "SOIAZJW12AB01853F1"]	3.3800521773317689
["Gleave", "Come With Me", "SOKBZHG12A81C21426"]	7.0173630509232234
["Chokebore", "Popular Modern Themes", "SOGVJUR12A8C13485C"]	2.7012807851495166
["Casual", "I Didn't Mean To", "SOMZWCG12A8C13C480"]	4.4351713380683542
["Minni the Moocher", "Rosi_ das M\u00e4dchen aus dem Chat", "SODFMEL12AC4689D8C"]	3.7249476012698159
["Rated R", "Keepin It Real (Skit)", "SOMJBYD12A6D4F8557"]	4.1905674943168156
["F.L.Y. (Fast Life Yungstaz)", "Bands", "SOYKDDB12AB017EA7A"]	4.2953929132587785

Each ‘yield’ from the mapper is represented by a single line in the output, showing the track ID info and the calculated density.

Running on Amazon’s Elastic MapReduce

When you are ready to run the job on a million songs, you can run it on Elastic MapReduce. First you will need to set up your AWS system. To get set up for Elastic MapReduce, follow these steps:

Once you’ve set things up, you can run your job on Amazon using the entire MSD as input by adding a few command switches like so:

 % python density.py --num-ec2-instances 100 --python-archive t.tar.gz -r emr 's3://tbmmsd/*.tsv.*' > out.dat

The ‘-r emr’ option says to run the job on Elastic MapReduce, and ‘--num-ec2-instances 100’ says to run the job on 100 small EC2 instances. A small instance currently costs about ten cents an hour, billed in one-hour increments, so this job will cost about $10 to run if it finishes in less than an hour. In fact, it takes about 20 minutes to run. If you run it on only 10 instances it will cost 1 or 2 dollars. Note that the t.tar.gz file simply contains any supporting Python code needed to run the job; in this case it contains the file track.py. See the mrjob docs for all the details on running your job on EC2.
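The $10 figure follows from billing each instance per started hour: 100 instances × 1 billed hour × $0.10 = $10 for a 20-minute run. A minimal sketch of that arithmetic, using the price and hourly granularity quoted above. Amazon’s prices and billing rules have changed since, so the rate is a parameter, and `emr_job_cost` is a hypothetical name.

```python
import math


def emr_job_cost(instances, runtime_minutes, hourly_rate):
    """Cost of a job when each instance is billed for every started hour."""
    billed_hours = math.ceil(runtime_minutes / 60.0)  # partial hours round up
    return instances * billed_hours * hourly_rate


if __name__ == "__main__":
    print(emr_job_cost(100, 20, 0.10))  # 100 small instances for 20 minutes
    print(emr_job_cost(100, 61, 0.10))  # one minute over: a second billed hour
```

Under this billing model, a run that finishes in 20 minutes costs the same as one that takes 59 minutes, so adding instances is close to free until the job drops below the one-hour mark.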

The Results
The output of this job is a million calculated densities, one for each track in the MSD.  We can sort this data to find the most and least dense tracks in the dataset.  Here are some high density examples:
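Each output line is a JSON-encoded key (artist, title, song ID), a tab, and the density, as in the tiny.dat run shown earlier. Here is a minimal sketch for pulling out the extremes that assumes that layout. `read_densities` and `densest_and_sparsest` are hypothetical helpers. `heapq.nlargest` and `heapq.nsmallest` pick out a handful of tracks without fully sorting a million rows.

```python
import heapq
import json


def read_densities(lines):
    """Yield ((artist, title, song_id), density) pairs from job output."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, value = line.split("\t", 1)  # JSON key, tab, JSON value
        artist, title, song_id = json.loads(key)
        yield (artist, title, song_id), float(json.loads(value))


def densest_and_sparsest(lines, n=3):
    """Return the n highest-density and n lowest-density tracks."""
    rows = list(read_densities(lines))
    top = heapq.nlargest(n, rows, key=lambda row: row[1])
    bottom = heapq.nsmallest(n, rows, key=lambda row: row[1])
    return top, bottom


SAMPLE_LINES = [  # three lines from the tiny.dat output shown earlier
    '["Planet P Project", "Pink World", "SOIAZJW12AB01853F1"]\t3.3800521773317689',
    '["Gleave", "Come With Me", "SOKBZHG12A81C21426"]\t7.0173630509232234',
    '["Chokebore", "Popular Modern Themes", "SOGVJUR12A8C13485C"]\t2.7012807851495166',
]

if __name__ == "__main__":
    top, bottom = densest_and_sparsest(SAMPLE_LINES, n=1)
    print(top[0][0][1], bottom[0][0][1])  # Come With Me Popular Modern Themes
```

For the full run, pass the open output file instead, e.g. `with open('out.dat') as f: top, bottom = densest_and_sparsest(f, n=10)`.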

Ichigo Ichie by Ryuji Takeuchi has a density of 9.2 segments/second

129 by Strojovna 07 has a density of 9.2 segments/second

The Feeding Circle by Makaton with a density of 9.1 segments per second

These pass the audio test; they really are high-density tracks. Now let’s look at some of the lowest-density tracks.

Deviation by Biosphere with a density of 0.014 segments per second

The Wire IV by Alvin Lucier with a density of 0.014 segments per second

improvisation_122904b by Richard Chartier with a density of 0.02 segments per second

Wrapping up
The ‘density’ MapReduce task is about as simple a task for processing the MSD as you’ll find. Consider this the ‘hello, world’ of the MSD. Over the next few weeks, I’ll be creating some more complex and hopefully interesting tasks that reveal some of the knowledge about music that can be gleaned from the MSD.

(Thanks to Thierry Bertin-Mahieux for his work in creating the MSD and setting up the S3 buckets. Thanks to 7Digital for providing the audio samples)

How do you discover music?

I’m interested in learning more about how people discover new music. I hope that you will spend two minutes and take this three-question poll. I’ll publish the results in a few weeks.

How do you spell ‘Britney Spears’?

I’ve been under the weather for the last couple of weeks, which has prevented me from doing most things, including blogging. Luckily, I had a blog post sitting in my drafts folder almost ready to go. I spent a bit of time today finishing it up, and so here it is: a look at the fascinating world of spelling correction for artist names.

In today’s digital music world, you will often look for music by typing an artist name into the search box of your favorite music app. However, this becomes a problem if you don’t know how to spell the name of the artist you are looking for. This is probably not much of a problem if you are looking for U2, but it most definitely is a problem if you are looking for Röyksopp, Jamiroquai or Britney Spears. To help solve this problem, we can try to identify common misspellings of artist names and use these misspellings to help steer you to the artists that you are looking for.

A spelling corrector in 21 lines of code
A good place for us to start is a post by Peter Norvig (Director of Research at Google) called ‘How to Write a Spelling Corrector’, which presents a fully operational spelling corrector in 21 lines of Python. (It is a phenomenal bit of code, well worth the time spent studying it.) At the core of Peter’s algorithm is the concept of edit distance, which represents the similarity of two strings as the number of operations (inserts, deletes, replacements and transpositions) needed to transform one string into the other. Peter cites literature suggesting that 80 to 95% of spelling errors are within an edit distance of 1 (meaning that most misspellings are just one insert, delete, replacement or transposition away from the correct word). Not satisfied with that coverage, Peter’s algorithm considers all words within an edit distance of 2 as candidates for correction. For Peter’s small test case (he wrote his system on a plane, so he didn’t have lots of data nearby), his corrector covered 98.9% of his test cases.
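Peter’s corrector works by generating every string within one or two edits of the query and keeping the ones that are known words. Below is a condensed sketch in that spirit. It drops his word-frequency model, so it returns the whole set of nearest known candidates rather than the single most probable word. The alphabet, which includes a space so that multi-word names can be edited, is an assumption.

```python
import string

LETTERS = string.ascii_lowercase + " "  # assumed alphabet for edits


def edits1(word):
    """All strings one insert, delete, replacement, or transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, vocabulary):
    """Known words at the smallest edit distance (0, 1, or 2) from word."""
    if word in vocabulary:
        return {word}
    one = edits1(word)
    found = one & vocabulary
    if found:
        return found
    two = {e2 for e1 in one for e2 in edits1(e1)}
    return two & vocabulary  # an empty set means nothing within distance 2


if __name__ == "__main__":
    vocab = {"britney", "spears"}
    print(candidates("birtney", vocab))  # one transposition away
    print(candidates("brtny", vocab))    # two inserts away
```

The edit-distance-2 ceiling is visible in the last line of `candidates`. Anything further away comes back empty, however close it is in spirit.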

Spell checking Britney
A few years ago, the smart folks at Google posted a list of Britney Spears spelling corrections that shows nearly 600 variants on Ms. Spears’s name collected in three months of Google searches. Perusing the list, you’ll find all sorts of interesting variations such as ‘birtheny spears’, ‘brinsley spears’ and ‘britain spears’. I suspect that some of these queries (like ‘Brandi Spears’) may not actually be for the pop artist. One curiosity in the list is that although there are 600 variations on the spelling of ‘Britney’, there is exactly one way that ‘spears’ is spelled. There’s no ‘speers’ or ‘spheres’, or ‘britany’s beers’ on this list.

One thing I did notice about Google’s list of Britneys is that many variations seem to be further away from the correct spelling than the edit distance of two at the core of Peter’s algorithm. This means that if you give these variants to Peter’s spelling corrector, it won’t find the proper spelling. Being an empiricist, I tried it and found that of the 593 variants of ‘Britney Spears’, 200 were not within an edit distance of two of the proper spelling and would not be correctable. This is not too surprising. Names are traditionally hard to spell, there are many alternative spellings of ‘Britney’ that are real names, and many people searching for music artists for the first time may have only heard the name pronounced and never seen it written.

Making it better with an artist-oriented spell checker
A miss rate of one in three (200 of 593) for a popular artist’s name seems a bit high, so I thought I’d see if I could improve on this. I have one big advantage that Peter didn’t: I work for a music data company, so I can be pretty confident that all the search queries that I see are going to be related to music. Restricting the possible vocabulary to just artist names makes things a whole lot easier. The algorithm couldn’t be simpler. Collect the names of the top 100K most popular artists. For each artist name query, find the artist name with the smallest edit distance to the query and return that name as the best candidate match. Unlike Peter’s algorithm, this finds the closest matching artist even when it is more than an edit distance of 2 away. When I run this against the 593 Britney Spears misspellings, I get only one mismatch: ‘brandi spears’ is closer to the artist ‘burning spear’ than it is to ‘Britney Spears’. Given the naive implementation, the algorithm is fairly fast (40 ms per query, in Python, on my 2.5-year-old laptop).
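A minimal sketch of that brute-force nearest-name search. The post doesn’t say which edit-distance variant was used. This sketch uses the optimal-string-alignment form of Damerau-Levenshtein distance, which counts the same four operations as Peter’s definition. `osa_distance`, `best_artist_match`, and the three-name artist list are illustrative only. Because every name is checked, the cost grows with the size of the artist list.

```python
def osa_distance(a, b):
    """Edit distance counting inserts, deletes, replacements, and adjacent
    transpositions (the optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # replace (or match)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[-1][-1]


def best_artist_match(query, artist_names):
    """Return the artist name with the smallest edit distance to the query.
    Comparison is case-insensitive; ties go to the earliest name."""
    q = query.lower()
    return min(artist_names, key=lambda name: osa_distance(q, name.lower()))


if __name__ == "__main__":
    artists = ["Justin Bieber", "Katy Perry", "Britney Spears"]
    print(best_artist_match("jastin beebar", artists))  # Justin Bieber
```

Unlike the candidate-generation approach, there is no distance ceiling here. The nearest name always wins, which is also why an unrelated query can land on a wrong but closer artist.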

Looking at spelling variations
With this artist-oriented spelling checker in hand, I decided to take a look at some real artist queries to see what interesting things I could find buried within. I gathered some artist name search queries from the Echo Nest API logs and looked for interesting patterns (since I’m doing this at home over the weekend, I only looked at the most recent logs, which consist of only about 2 million artist name queries).

Artists with most spelling variations
Not surprisingly, very popular artists are the most frequently misspelled.  It seems that just about every permutation has been made in an attempt to spell these artists.

  • Michael Jackson - Variations: michael jackson,  micheal jackson,  michel jackson,  mickael jackson,  mickal jackson,  michael jacson,  mihceal jackson,  mickeljackson,  michel jakson,  micheal jaskcon,  michal jackson,  michael jackson by pbtone,  mical jachson,  micahle jackson,  machael jackson,  muickael jackson,  mikael jackson,  miechle jackson,  mickel jackson,  mickeal jackson,  michkeal jackson,  michele jakson,  micheal jaskson,  micheal jasckson,  micheal jakson,  micheal jackston,  micheal jackson just beat,  micheal jackson,  michal jakson,  michaeljackson,  michael joseph jackson,  michael jayston,  michael jakson,  michael jackson mania!,  michael jackson and friends,  michael jackaon,  micael jackson,  machel jackson,  jichael mackson
  • Justin Bieber - Variations: justin bieber,  justin beiber,  i just got bieber’ed by,  justin biber,  justin bieber baby,  justin beber,  justin bebbier,  justin beaber,  justien beiber,  sjustin beiber,  justinbieber,  justin_bieber,  justin. bieber,  justin bierber,  justin bieber<3 4 ever<3,  justin bieber x mstrkrft,  justin bieber x,  justin bieber and selens gomaz,  justin bieber and rascal flats,  justin bibar,  justin bever,  justin beiber baby,  justin beeber,  justin bebber,  justin bebar,  justien berbier,  justen bever,  justebibar,  jsustin bieber,  jastin bieber,  jastin beiber,  jasten biber,  jasten beber songs,  gestin bieber,  eiine mainie justin bieber,  baby justin bieber
  • Red Hot Chili Peppers - Variations: red hot chilli peppers,  the red hot chili peppers,  red hot chilli pipers,  red hot chilli pepers,  red hot chili,  red hot chilly peppers,  red hot chili pepers,  hot red chili pepers,  red hot chilli peppears,  redhotchillipeppers,  redhotchilipeppers,  redhotchilipepers,  redhot chili peppers,  redhot chili pepers,  red not chili peppers,  red hot chily papers,  red hot chilli peppers greatest hits,  red hot chilli pepper,  red hot chilli peepers,  red hot chilli pappers,  red hot chili pepper,  red hot chile peppers
  • Mumford and Sons - Variations: mumford and sons,  mumford and sons cave,  mumford and son,  munford and sons,  mummford and sons,  mumford son,  momford and sons,  modfod and sons,  munfordandsons,  munford and son,  mumfrund and sons,  mumfors and sons,  mumford sons,  mumford ans sons,  mumford and sonns,  mumford and songs,  mumford and sona,  mumford and,  mumford &sons,  mumfird and sons,  mumfadeleord and sons
  • Katy Perry - Even an artist with a seemingly very simple name like Katy Perry has numerous variations:  katy perry,  katie perry,  kate perry,    kathy perry,  katy perry ft.kanye west,  katty perry,  katy perry i kissed a girl,  peacock katy perry,  katyperry,  katey parey,   kety perry,  kety peliy,  katy pwrry,  katy perry-firework,  katy perry x,  katy perry,  katy perris,  katy parry,  kati perry,  kathy pery,  katey perry,  katey perey,  katey peliy,  kata perry,  kaity perry

Other frequently misspelled artists include:

  • Britney Spears
  • Linkin Park
  • Arctic Monkeys
  • Katy Perry
  • Guns N’ Roses
  • Nicki Minaj
Which artists are the easiest to spell?
Using the same techniques, we can look through our search logs and find the popular artists that have the fewest misspelled queries. These are the easiest artists to spell. They include:
  • Muse
  • Weezer
  • U2
  • Oasis
  • Moby
  • Flyleaf
  • Seether
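Both lists can come out of the same bookkeeping once each logged query has been mapped to an artist, for example with the artist-oriented spelling checker described earlier. The post doesn’t spell out exactly how the counts were made, so this sketch makes an assumption: a query counts as a misspelling when it differs from the canonical name after trimming and lowercasing. `spelling_stats` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def spelling_stats(matched_queries):
    """Summarize how each artist's name is typed.

    matched_queries: iterable of (raw_query, canonical_artist) pairs, e.g.
    the output of an artist-name matcher run over a search log.
    Returns {artist: (distinct_misspellings, misspelled_share)}.
    """
    variants = defaultdict(set)  # distinct normalized query strings
    totals = defaultdict(int)    # all queries matched to the artist
    misses = defaultdict(int)    # queries that differ from the canonical name
    for query, artist in matched_queries:
        q = query.strip().lower()
        variants[artist].add(q)
        totals[artist] += 1
        if q != artist.lower():
            misses[artist] += 1
    return {
        artist: (len(variants[artist] - {artist.lower()}),
                 misses[artist] / float(totals[artist]))
        for artist in totals
    }


if __name__ == "__main__":
    log = [("micheal jackson", "Michael Jackson"),  # made-up sample pairs
           ("michael jackson", "Michael Jackson"),
           ("michel jackson", "Michael Jackson"),
           ("muse", "Muse"), ("Muse", "Muse"), ("mues", "Muse")]
    stats = spelling_stats(log)
    # Most variations first; to rank "easiest to spell", sort by the share
    # instead, ideally after keeping only artists with many queries.
    for artist, (n_variants, share) in sorted(stats.items(),
                                              key=lambda kv: -kv[1][0]):
        print(artist, n_variants, round(share, 2))
```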
Most confused artists:
Artists that are most easily confused with one another include:
  • byran adams - ryan adams
  • Underworld – Uverworld
Wrapping up
Spelling correction for artist names is perhaps the least sexy job in the music industry; nevertheless, it is an important part of helping people connect with the music they are looking for. There is a large body of research on context-sensitive spelling correction that can be used to help solve this problem, but even very simple techniques like those described here can go a long way toward helping you figure out what someone really wants when they search for ‘Jastan Beebar’.

Reidentification of artists and genres in the KDD cup data

Back in February I wrote a post about the KDD Cup (an annual Data Mining and Knowledge Discovery competition), asking whether this year’s cup was really about music recommendation, since all the data identifying the music had been anonymized. The post received a number of really interesting comments about the nature of recommendation: whether context and content are really necessary for music recommendation, or whether user behavior is all you really need. A few commenters suggested that it might be possible to de-anonymize the data using a constraint propagation technique.

Many voiced the opinion that such de-anonymizing of the data to expose user listening habits would indeed be unethical. Malcolm Slaney, the researcher at Yahoo! who prepared the dataset, offered the plea:

If you do de-anonymize the data please don’t tell anybody. We’ll NEVER be able to release data again.

As far as I know, no one has de-anonymized the KDD Cup dataset; however, researcher Matthew J. H. Rattigan of The University of Massachusetts at Amherst has done the next best thing. He has published a paper called Reidentification of Artists and Genres in the KDD Cup, which shows that by analyzing the relational structures within the dataset it is possible to identify the artists, albums, tracks and genres that are used in the anonymized dataset. Here’s an excerpt from the paper that gives an intuitive description of the approach:

For example, consider Artist 197656 from the Track 1 data. This artist has eight albums described by different combinations of ten genres. Each album is associated with several tracks, with track counts ranging from 1 to 69. We make the assumption that these albums and tracks were sampled without replacement from the discography of some real artist on the Yahoo! Music website. Furthermore, we assume that the connections between genres and albums are not sampled; that is, if an album in the KDD Cup dataset is attached to three genres, its real-world counterpart has exactly three genres (or “Categories”, as they are known on the Yahoo! Music site).

Under the above assumptions, we can compare the unlabeled KDD Cup artist with real-world Yahoo! Music artists in order to find a suitable match. The band Fischer Z, for example, is an unsuitable match, as their online discography only contains seven albums. An artist such as Meatloaf certainly has enough albums (56) to be a match, but none of those albums contain more than 31 tracks. The entry for Elvis Presley contains 109 albums, 17 of which boast 69 or more tracks; however, there is no consistent assignment of genres that satisfies our assumptions. The band Tool, however, is compatible with Artist 197656. The Tool discography contains 19 albums containing between 0 and 69 tracks. These albums are described by exactly 10 genres, which can be assigned to the unlabeled KDD Cup genres in a consistent manner. Furthermore, the match is unique: of the 134k artists in our labeled dataset, Tool is the only suitable match for Artist 197656.
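The album-and-track part of that reasoning can be expressed as a simple feasibility test. This is a deliberately simplified sketch, not Rattigan’s method. It ignores the genre-assignment constraint, which is the harder consistency check in the excerpt. Apart from the eight albums and the 69-track maximum, the track counts in the usage example are made up. `albums_fit` is a hypothetical name.

```python
def albums_fit(anon_track_counts, real_track_counts):
    """Can each anonymized album be paired with a distinct real album
    that has at least as many tracks?

    Tracks are assumed to be sampled without replacement, so a real album
    can never have fewer tracks than its anonymized counterpart. Sorting
    both lists largest-first and comparing position by position suffices:
    a pairing exists exactly when the k-th largest real album is at least
    as big as the k-th largest anonymized album, for every k.
    """
    if len(anon_track_counts) > len(real_track_counts):
        return False  # not enough albums in the real discography
    anon = sorted(anon_track_counts, reverse=True)
    real = sorted(real_track_counts, reverse=True)
    return all(r >= a for a, r in zip(anon, real))


if __name__ == "__main__":
    # Hypothetical counts for an 8-album artist whose biggest album has
    # 69 tracks, in the spirit of Artist 197656 in the excerpt.
    anon = [69, 12, 10, 9, 5, 3, 2, 1]
    print(albums_fit(anon, [20] * 7))    # False: only seven albums
    print(albums_fit(anon, [31] * 56))   # False: no album reaches 69 tracks
    print(albums_fit(anon, [70, 15, 12, 11, 9, 8, 4, 2, 1]))  # True
```

Running a test like this against every labeled artist and keeping the survivors is what makes a unique match, like Tool’s, possible.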

Of course, it is impossible for Matthew to evaluate his results directly, but he did create a number of synthetic, anonymized datasets drawn from Yahoo data and was able to demonstrate very high accuracy for the top artists and 62% accuracy overall.

The motivation for this type of work is not to turn the KDD Cup dataset into something that music recommendation researchers could use, but to get a better understanding of data privacy issues. By understanding how large datasets can be de-anonymized, researchers will find it easier in the future to create datasets that won’t so easily yield their hidden secrets. The paper is an interesting read, so now that you are done with all of your reviews for RecSys and ISMIR, go ahead and give it a read: https://www.cs.umass.edu/publication/docs/2011/UM-CS-2011-021.pdf. Thanks to @ocelma for the tip.

catfish smooth

Kurt Jacobson is a recent addition to the staff here at The Echo Nest. Kurt has built a music exploration site called catfish smooth that allows you to explore the connections between artists. Kurt describes it like this: it is all about connections between music artists. In a sense, it is a music artist recommendation system but more. For each artist, you will see the type of “similar artist” recommendations to which you are accustomed – we use last.fm and The Echo Nest to get these. But you will also see some other inter-artist connections catfish has discovered from the web of linked data. These include things like “artists that are also English Male Singers” or “artists that are also Converts To Islam” or “artists that are also People From St.Louis, Missouri”. And, hopefully, you’ll get some media for each artist so you can have a listen.

It’s a really interesting way to explore the music space, allowing you to stumble upon new artists based on a wide range of parameters.

For example, take a look at the many categories and connections catfish smooth exposes for James Brown.

Kurt is currently conducting a usability survey for catfish smooth, so take a minute to kick the tires and then help Kurt finish his PhD and take the survey.

SongCards – an untapped app

I saw this interesting video from IDEO for c60 – an RFID-based interface that ‘reintroduces physicality to music, something lost with digitization and the move to the cloud.’

This video got me excited, because it is the hardware piece of an idea called ‘SongCards’ that my friend Steve Green and I had while working at Sun Labs a few years ago. We pitched SongCards to Sun’s management (Sun was big into RFID at the time, so it seemed like a good fit), but Sun didn’t bite; they decided to go buy MySQL instead. And so this concept has been gathering digital dust in a text file on my laptop. The c60 video has inspired me to dust it off and post it here. I think there are some good ideas embedded in the concept. Perhaps the folks at IDEO will incorporate some of them into the c60, or maybe Eliot will add this idea to his Untapped Apps portfolio on Evolver.fm.

Here’s the concept in all its glory:

Observations

  • Many people have a physical connection with their music.  These people like to organize, display and interact with their music via the containers (album covers, cd cases).
  • Music is a highly social medium.  People enjoy sharing music with others.  People learn about new music from others in their social circle.
  • The location where music is stored will likely switch from devices managed by the listener to devices managed by a music service.  In the future, a music purchaser will purchase the right to listen to a particular song, while the actual music data will remain managed by the music service.
  • Digital music lacks much of the interesting metadata that previous generations of music listeners enjoyed – lyrics, photos of the performers, song credits. The experience of reading the liner notes while listening to a new album has been lost in this new generation of digital music.
  • Music is collectable. People take pride in amassing large collections of music and like to be able to exhibit their collection to others.

The Problem

The digital music revolution and the inevitable move of our music from our CD racks, iPods and computers to the back room at Yahoo, Apple, or Google will make it convenient for people to listen to music in all sorts of new ways; however, at the same time it will eliminate many of the interactions people have had with their music. People can’t interact with the albums, read the liner notes, or display their collection. They can’t trade songs with their friends. There is no way to show off a music collection beyond saying “I have 2,025 songs on my iPod”. Album art is a dying art.

Music collecting is not just about the music; it is also about the things that surround the music. Digital music has stripped away all of the packaging, and with it a big part of the music collecting experience. We want to change that.

The Idea

Imagine if you could buy music the way you buy collectable trading cards such as Magic: The Gathering or Pokemon cards. One could buy a pack of cards at the local 7-11 for a few dollars. The cards could be organized by genre. You could buy a pack of ‘boy-band’, ‘alternative-grunge’, ‘brit-pop’, ‘British invasion’, ‘drum and bass’, etc. Each pack would contain 5 or 10 cards drawn from the genre of the pack. Each card would have all of the traditional liner note metadata: lyrics, album art, artist bios. Also associated with each card would be a unique ID, readable by an electronic reader, that identifies the unique instance of the card and the song/performance that the card represents. The new owner of the card would add the song to their personal collection just by presenting the card to any of their music players (presumably connected to the music server run by the music service provider). Once this is done, the user can access and play the song at any time on any of their music devices.

Music cards can be packaged in the same way as other trading cards. Typically each pack contains one or two ‘special’ cards that are highly desirable; for music cards these would be the ‘hit cards’. The bulk of the cards in a pack could be made up of lesser-known or less popular bands. For instance, a ‘British invasion’ card set may contain ‘Hey Jude’ as the special card, a few lesser-known Beatles songs, a few songs by The Who, perhaps some by The Monkees, and other songs by bands of that era. This method of packaging music would allow for serendipitous discovery of music, since you would never know what songs you would get in the pack. It would also encourage music trading, since you could trade your duplicate songs with other music collectors.

Trading – since the cards represent a song by a digital ID, trading a song is as simple as giving the card to someone. As soon as the new owner puts the card into one of their music players, the transfer of ownership would occur: the song would be added to the new owner’s collection and removed from the old owner’s collection. There would be no limit to how often a song could be traded.
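The ownership-transfer rule in the trading description can be sketched as a small service-side registry. This is purely illustrative, since SongCards was never built. `CardRegistry`, its methods, and the card IDs are hypothetical, and the RFID reading and the music service itself are left out.

```python
class CardRegistry:
    """Hypothetical service-side record of which account owns each card.

    Each physical card carries a unique ID; the service maps that ID to a
    song and to the card's current owner."""

    def __init__(self):
        self.song_for_card = {}
        self.owner_for_card = {}

    def issue(self, card_id, song):
        """Record a newly printed card for a song, with no owner yet."""
        self.song_for_card[card_id] = song
        self.owner_for_card[card_id] = None

    def present(self, card_id, account):
        """Called when a card is read by one of `account`'s players.
        Ownership moves to that account, so the previous owner loses it."""
        if card_id not in self.song_for_card:
            raise KeyError("unknown card: %s" % card_id)
        self.owner_for_card[card_id] = account

    def collection(self, account):
        """Songs the account can currently play, sorted by title."""
        return sorted(self.song_for_card[card]
                      for card, owner in self.owner_for_card.items()
                      if owner == account)


if __name__ == "__main__":
    registry = CardRegistry()
    registry.issue("card-0001", "Hey Jude")
    registry.present("card-0001", "alice")  # alice buys a pack
    registry.present("card-0001", "bob")    # alice trades the card to bob
    print(registry.collection("alice"), registry.collection("bob"))
```

Because the registry stores a single owner per card, a trade is just a reassignment, which gives the “added to the new owner, removed from the old owner” behavior for free.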

Some interesting properties of music cards:

  • Your music collection once again has a physical presence. You can touch it, you can browse through it, you can stack it, you can show it off.
  • You can easily and legally trade or sell music with your friends (or on eBay for that matter). Supply and demand economics can take hold in the music card aftermarket (just as we’ve seen with Beanie Babies and Magic cards).
  • Cards can be grouped in packages for sale using a number of criteria such as genre, popularity, geography, or appeal to a certain demographic.
  • You can make playlists by ordering your cards.
  • You can make a random playlist by shuffling your cards.
  • At a social gathering, cards from many people can be combined into a single uber-playlist.
  • You will be potentially exposed to new music every time you buy a new pack of cards.
  • You will not need to carry your cards with you when you want to listen to music (the music service knows what music you own).
  • Since the music service ‘knows’ what music you own it can monitor trades and music popularity to track trend setters within a social group and target appropriate marketing at the trend setters.
  • Song cards can’t be ‘ripped’ in the traditional sense, giving music companies much more control over their intellectual property.

Some interesting variations:

  • The artwork on the back of a card could be one section of the album art for a whole album.  You could tack the cards up on the wall to form the album art when you have the whole album.
  • Some of the cards could be power cards that act as modifiers:
    • ‘More Like This’ – when inserted into a playlist, plays a song similar to the previously played card. The similar song is drawn from the entire music service collection, not just the songs owned by the collector.
    • ‘Genre Wild Card’ – plays a random song from the genre. The song is drawn from the entire music service collection, not just the songs owned by the collector.
    • ‘Musical Journey’ – makes a musical journey between the surrounding cards. The songs on the journey are drawn from the entire music service collection, not just the songs owned by the collector.
    • ‘Album Card’ – it’s not just a song, it’s a whole album.

Note that I don’t think that SongCards would replace all digital music sales. It would still be possible to purchase and download a song from iTunes as one can do today.  I think that SongCards would appeal to the ‘Music Collector’, while the traditional download would appeal to the ‘Music Listener’.

That’s it: SongCards. Just imagine what the world would be like if Sun had invested $800 million in SongCards instead of in that open source database.


LastFM-ArtistTags2007

A few years back I created a data set of social tags from Last.fm. RJ at Last.fm graciously gave permission for me to distribute the dataset for research use.  I hosted the dataset on the media server at Sun Labs. However, with the Oracle acquisition, the media server is no longer serving up the data, so I thought I would post the data elsewhere.

The dataset is now available for download here: Lastfm-ArtistTags2007

Here are the details as told in the README file:

The LastFM-ArtistTags2007 Data set
Version 1.0
June 2008

What is this?

    This is a set of artist tag data collected from Last.fm using
    the Audioscrobbler webservice during the spring of 2007.

    The data consists of the raw tag counts for up to the 100 most
    frequently occurring tags that Last.fm listeners have applied
    to each of over 20,000 artists.

    An undocumented (and deprecated) option of the audioscrobbler
    web service was used to bypass the Last.fm normalization of tag
    counts.  This data set provides raw tag counts.

Data Format:

  The data is formatted one entry per line as follows:

  musicbrainz-artist-id<sep>artist-name<sep>tag-name<sep>raw-tag-count

Example:

    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>american<sep>14
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>animals<sep>5
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art rock<sep>18
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>atmospheric<sep>4
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>avantgarde<sep>3

Data Statistics:

    Total Lines:      952810
    Unique Artists:    20907
    Unique Tags:      100784
    Total Tags:      7178442

Filtering:

    Some minor filtering has been applied to the tag data. Last.fm
    occasionally reports tags with counts of zero or less. These tags
    have been removed.

    Artists with no tags have not been included in this data set.
    Of the nearly quarter million artists that were inspected, 20,907
    artists had 1 or more tags.

Files:

    ArtistTags.dat  - the tag data
    README.txt      - this file
    artists.txt     - artists ordered by tag count
    tags.txt        - tags ordered by tag count

License:

    The data in LastFM-ArtistTags2007 is distributed with permission of
    Last.fm.  The data is made available for non-commercial use only under
    the Creative Commons Attribution-NonCommercial-ShareAlike UK License.
    Those interested in using the data or web services in a commercial
    context should contact partners at last dot fm. For more information
    see http://www.audioscrobbler.net/data/

Acknowledgements:

    Thanks to Last.fm for providing access to this tag data via their
    web services.

Contact:

    This data was collected and filtered by Paul Lamere of The Echo Nest.
    Send questions or comments to Paul.Lamere@gmail.com
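The README describes a simple line-oriented format, so reading it takes only a few lines of code. Below is a minimal Python sketch, assuming that `<sep>` is the literal separator string (as the example lines suggest), that the file decodes as UTF-8, and that an artist with an empty MusicBrainz id can be keyed by name instead. The helpers `parse_lines`, `tags_by_artist`, `statistics` and `load` are hypothetical names for illustration, not part of the dataset distribution. Whether the README's "Total Tags" figure is the sum of raw counts is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple

SEP = "<sep>"  # assumption: the separator is this literal string, as in the README examples


class TagRecord(NamedTuple):
    """One parsed line of ArtistTags.dat."""
    mbid: str
    artist: str
    tag: str
    count: int


def parse_lines(lines: Iterable[str]) -> Iterator[TagRecord]:
    """Yield a TagRecord per well-formed line, dropping counts of zero or less."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        parts = line.split(SEP)
        if len(parts) != 4:
            continue  # skip malformed lines instead of guessing at their layout
        mbid, artist, tag, raw = parts
        try:
            count = int(raw)
        except ValueError:
            continue
        if count > 0:  # same rule as the README's Filtering section
            yield TagRecord(mbid, artist, tag, count)


def artist_key(rec: TagRecord) -> str:
    """Key by MusicBrainz id, falling back to the name if the id is empty."""
    return rec.mbid or rec.artist


def tags_by_artist(records: Iterable[TagRecord]) -> Dict[str, Counter]:
    """Group raw tag counts under each artist key."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        table[artist_key(rec)][rec.tag] += rec.count
    return dict(table)


def statistics(records: Iterable[TagRecord]) -> Dict[str, int]:
    """Figures analogous to the README's Data Statistics section."""
    lines, total = 0, 0
    artists, tags = set(), set()
    for rec in records:
        lines += 1
        artists.add(artist_key(rec))
        tags.add(rec.tag)
        total += rec.count  # assumption: "Total Tags" is the sum of raw counts
    return {"lines": lines, "artists": len(artists), "tags": len(tags), "total": total}


def load(path: str = "ArtistTags.dat") -> Dict[str, Counter]:
    """Read the whole file (assumed UTF-8) into an artist -> tag Counter table."""
    with open(path, encoding="utf-8") as f:
        return tags_by_artist(parse_lines(f))


if __name__ == "__main__":
    # The Deerhoof lines from the README's Example section.
    deerhoof = "11eabe0c-2638-4808-92f9-1dbd9c453429"
    sample = [
        f"{deerhoof}<sep>Deerhoof<sep>american<sep>14",
        f"{deerhoof}<sep>Deerhoof<sep>animals<sep>5",
        f"{deerhoof}<sep>Deerhoof<sep>art punk<sep>21",
        f"{deerhoof}<sep>Deerhoof<sep>art rock<sep>18",
        f"{deerhoof}<sep>Deerhoof<sep>atmospheric<sep>4",
        f"{deerhoof}<sep>Deerhoof<sep>avantgarde<sep>3",
    ]
    records = list(parse_lines(sample))
    print(tags_by_artist(records)[deerhoof].most_common(2))  # [('art punk', 21), ('art rock', 18)]
    print(statistics(records))  # {'lines': 6, 'artists': 1, 'tags': 6, 'total': 65}
```

Keeping the raw counts in a `Counter` per artist leaves any normalization (for example, dividing by an artist's largest count) as an explicit later step, which fits a data set whose point is to provide un-normalized counts.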


Is that a million songs in your pocket, or are you just glad to see me?

Yesterday, Steve Jobs reminded us that it was less than 10 years ago that Apple announced the first iPod, which could put a thousand songs in your pocket. With the emergence of cloud-based music services like Spotify and Rhapsody, we can now have a virtually endless supply of music in our pocket. The ‘bottomless iPod’ will have as big an effect on how we listen to music as the original iPod had back in 2001. But with millions of songs to choose from, we will need help finding music that we want to hear. Shuffle play won’t work when we have a million songs to choose from. We will need new tools that help us manage our listening experience. I’m convinced that one of these tools will be intelligent automatic playlisting.

This weekend at the Music Hack Day London, The Echo Nest is releasing the first version of our new Playlisting API. The Playlisting API lets developers construct playlists based on a flexible set of artist/song selection and sorting rules. The Echo Nest has deep data about millions of artists and songs. We know how popular Lady Gaga is, we know the tempo of every one of her songs, we know other artists that sound similar to her, we know where she’s from, and we know what words people use to describe her music (‘dance pop’, ‘club’, ‘party music’, ‘female’, ‘diva’). With the Playlisting API we can use this data to select music and arrange it in all sorts of flexible ways – from very simple Pandora-radio-style playlists of similar-sounding songs to elaborate playlists drawing on a wide range of parameters. Here are some examples of the types of playlists you can construct with the API:

  • Similar artist radio – generate a playlist of songs by similar artists
  • Jogging playlist – generate a playlist of 80s power pop with a tempo between 120 and 130 BPM, but never ever play Bon Jovi
  • London Music Hack Day Playlist – generate a playlist of electronic and techno music by unknown artists near London, ordering the tracks by tempo from slow to fast
  • Tomorrow’s top 40 – play the hottest songs by pop artists with low familiarity that are starting to get hottt
  • Heavy Metal Radio – A DMCA-Compliant radio stream of nothing but heavy metal
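The Playlisting API itself runs on The Echo Nest's servers, and its actual parameters are not described in this post. As a rough illustration of what "selection and sorting rules" mean, here is a local Python sketch modeled on the jogging example. It keeps songs that carry every required tag, fall inside a tempo window, and are not by a banned artist, then orders them from slow to fast. The `Song` record, `build_playlist` function and all sample data are hypothetical stand-ins, not Echo Nest types or results.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Song:
    """Hypothetical in-memory song record (not an Echo Nest data type)."""
    title: str
    artist: str
    tempo: float           # beats per minute
    tags: FrozenSet[str]   # descriptive terms, e.g. "80s", "power pop"


def build_playlist(
    songs: Iterable[Song],
    required_tags: AbstractSet[str],
    min_tempo: float,
    max_tempo: float,
    banned_artists: AbstractSet[str] = frozenset(),
    limit: Optional[int] = None,
) -> List[Song]:
    """Apply selection rules, then sort the survivors from slow to fast."""
    banned = {name.lower() for name in banned_artists}
    selected = [
        s for s in songs
        if required_tags <= s.tags               # carries every required tag
        and min_tempo <= s.tempo <= max_tempo    # inclusive tempo window
        and s.artist.lower() not in banned       # hard "never play" rule
    ]
    selected.sort(key=lambda s: s.tempo)         # sorting rule: slow to fast
    return selected[:limit]                      # limit=None keeps everything


if __name__ == "__main__":
    # Placeholder pool; titles, artists and tempos are made up.
    pop = frozenset({"80s", "power pop"})
    pool = [
        Song("Track 1", "Band A", 128.0, pop),
        Song("Track 2", "Band B", 122.0, pop),
        Song("Track 3", "Band C", 125.0, pop),
        Song("Track 4", "Band A", 140.0, pop),
        Song("Track 5", "Band D", 124.0, frozenset({"90s", "grunge"})),
    ]
    jogging = build_playlist(pool, {"80s", "power pop"}, 120, 130, banned_artists={"Band C"})
    print([s.title for s in jogging])  # ['Track 2', 'Track 1']
```

In this sketch the post's "never ever play Bon Jovi" rule would be `banned_artists={"Bon Jovi"}`, and the London playlist would swap in a different tag set while reusing the same slow-to-fast sort. Artist matching is a plain case-insensitive string comparison, which is a simplification.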

We also provide a dynamic playlisting API that allows for the creation of playlists that adapt to the listener’s skipping and rating behavior.

I’m about to jump on a plane for Music Hack Day London, where we will be demonstrating this new API and some cool apps that have already been built on it. I’m hoping to see a few apps emerge from this Music Hack Day that use the new API. More info about the APIs and how you can use them to do all sorts of fun things will be forthcoming. If you’re motivated, you can dive into the APIs right now.


Upbeat and Quirky, With a Bit of a Build: Interpretive Repertoires in Creative Music Search

Charlie Inskip, Andy MacFarlane and Pauline Rafferty

ABSTRACT Pre-existing commercial music is widely used to accompany moving images in films, TV commercials and computer games. This process is known as music synchronisation. Professionals are employed by rights holders and film makers to perform creative music searches on large catalogues to find appropriate pieces of music for synchronisation. This paper discusses a Discourse Analysis of thirty interview texts related to the process. Coded examples are presented and discussed. Four interpretive repertoires are identified: the Musical Repertoire, the Soundtrack Repertoire, the Business Repertoire and the Cultural Repertoire. These ways of talking about music are adopted by all of the community regardless of their interest as Music Owner or Music User.

Music is shown to have multi-variate and sometimes conflicting meanings within this community which are dynamic and negotiated. This is related to a theoretical feedback model of communication and meaning making which proposes that Owners and Users employ their own and shared ways of talking and thinking about music and its context to determine musical meaning. The value to the music information retrieval community is to inform system design from a user information needs perspective.


What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas

Peter Grosche, Meinard Müller and Craig Stuart Sapp

ABSTRACT – The automated extraction of tempo and beat information from music recordings is a challenging task. Especially in the case of expressive performances, current beat tracking approaches still have significant problems accurately capturing local tempo deviations and beat positions. In this paper, we introduce a novel evaluation framework for detecting critical passages in a piece of music that are prone to tracking errors. Our idea is to look for consistencies in the beat tracking results over multiple performances of the same underlying piece. As another contribution, we further classify the critical passages by specifying musical properties of certain beats that frequently evoke tracking errors. Finally, considering three conceptually different beat tracking procedures, we conduct a case study on the basis of a challenging test set that consists of a variety of piano performances of Chopin Mazurkas. Our experimental results not only make the limitations of state-of-the-art beat trackers explicit but also deepen the understanding of the underlying music material.

