Posts Tagged python
Finding duplicate songs in your music collection with Echoprint
Posted by Paul in code, Music, The Echo Nest on June 25, 2011
This week, The Echo Nest released Echoprint – an open source music fingerprinting and identification system. A fingerprinting system like Echoprint recognizes music based only upon what the music sounds like. It doesn’t matter what bit rate, codec or compression rate was used (up to a point) to create a music file, nor does it matter what sloppy metadata has been attached to it. If the music sounds the same, the music fingerprinter will recognize that. There are a whole bunch of really interesting apps that can be created using a music fingerprinter. Among my favorite iPhone apps are Shazam and Soundhound – two fantastic over-the-air music recognition apps that let you hold your phone up to the radio and will tell you in just a few seconds what song is playing. It is no surprise that these apps are top sellers in the iTunes app store. They are the closest thing to magic I’ve seen on my iPhone.
In addition to the super sexy applications like Shazam, music identification systems are also used for more mundane things like copyright enforcement (helping sites like Youtube keep copyright violations out of the intertubes), metadata cleanup (attaching the proper artist, album and track name to every track in a music collection), and scan & match like Apple’s soon to be released iCloud music service that uses music identification to avoid lengthy and unnecessary music uploads. One popular use of music identification systems is to de-duplicate a music collection. Programs like tuneup will help you find and eliminate duplicate tracks in your music collection.
This week I wanted to play around with the new Echoprint system, so I decided I’d write a program that finds and reports duplicate tracks in my music collection. Note: if you are looking to de-duplicate your music collection, but you are not a programmer, this post is *not* for you; go and get tuneup or some other de-duplicator. The primary purpose of this post is to show how Echoprint works, not to replace a commercial system.
How Echoprint works
Echoprint, like many music identification services, is a multi-step process: code generation, ingestion and lookup. In the code generation step, musical features are extracted from audio and encoded into a string of text. In the ingestion step, codes for all songs in a collection are generated and added to a searchable database. In the lookup step, the codegen string is generated for an unknown bit of audio and is used as a fuzzy query to the database of previously ingested codes. If a suitably high-scoring match is found, the info on the matching track is returned. The devil is in the details. Generating a short, high-level representation of audio that is suitable for searching and insensitive to encodings, bit rate, noise and other transformations is a challenge. Similarly challenging is representing a code in a way that allows for high speed querying and allows for imperfect matching of noisy codes.
Echoprint consists of two main components: echoprint-codegen and echoprint-server.
Code Generation
echoprint-codegen is responsible for taking a bit of audio and turning it into an echoprint code. You can grab the source from github and build the binary for your local platform. The binary will take an audio file as input and output a block of JSON that contains song metadata (that was found in the ID3 tags in the audio) along with a code string. Here’s an example:
plamere$ echoprint-codegen test/unison.mp3 0 10
[
{"metadata":{"artist":"Bjork",
"release":"Vespertine",
"title":"Unison",
"genre":"",
"bitrate":128,"sample_rate":44100, "duration":405,
"filename":"test/unison.mp3",
"samples_decoded":110296,
"given_duration":10, "start_offset":1,
"version":4.11,
"codegen_time":0.024046,
"decode_time":0.641916},
"code_count":174,
"code":"eJyFk0uyJSEIBbcEyEeWAwj7X8JzfDvKnuTAJIojWACwGB4QeM\
HWCw0vLHlB8IWeF6hf4PNC2QunX3inWvDCO9WsF7heGHrhvYV3qvPEu-\
87s9ELLi_8J9VzknReEH1h-BOKRULBwyZiEulgQZZr5a6OS8tqCo00cd\
p86ymhoxZrbtQdgUxQvX5sIlF_2gUGQUDbM_ZoC28DDkpKNCHVkKCgpd\
OHf-wweX9adQycnWtUoDjABumQwbJOXSZNur08Ew4ra8lxnMNuveIem6\
LVLQKsIRLAe4gbj5Uxl96RpdOQ_Noz7f5pObz3_WqvEytYVsa6P707Jz\
j4Oa7BVgpbKX5tS_qntcB9G--1tc7ZDU1HamuDI6q07vNpQTFx22avyR",
"tag":0}
]
In this example, I’m only fingerprinting the first 10 seconds of the song to conserve space. The code string is just a base64 encoding of a zlib compression of the original code string, which is a hex encoded series of ASCII numbers. A full version of this code is what is indexed by the lookup server for fingerprint queries. Codegen is quite fast. It scans audio at roughly 250x real time per processor after decoding and resampling to 11025 Hz. This means a full song can be scanned in less than 0.5s on an average computer, and an amount of audio suitable for querying (30s) can be scanned in less than 0.04s. Decoding from MP3 will be the bottleneck for most implementations. Decoders like mpg123 or ffmpeg can decode 30s of mp3 audio to 11025 Hz PCM in under 0.10s.
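The layering described above (base64 around zlib around a hex string) can be unpacked with Python's standard library. This is a minimal sketch, not the codegen's own code: it assumes the URL-safe base64 alphabet (the sample above contains "-" and "_"), and because the sample code string in this post is truncated, it round-trips a made-up payload instead of decoding the real one. The names pack_code and unpack_code are hypothetical.

```python
import base64
import zlib


def pack_code(raw: str) -> str:
    """Compress with zlib, then encode as URL-safe base64 (assumed alphabet)."""
    compressed = zlib.compress(raw.encode("ascii"))
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def unpack_code(code: str) -> str:
    """Reverse of pack_code: base64-decode, then zlib-decompress."""
    # Restore any '=' padding that may have been stripped from the string.
    padded = code + "=" * (-len(code) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("ascii")


# Hypothetical hex-encoded payload, for illustration only.
raw = "0001a0002b0003c"
packed = pack_code(raw)
restored = unpack_code(packed)
```

Running unpack_code on a full, untruncated code string should give back the underlying hex string, provided the alphabet assumption holds.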
The Echoprint Server
The Echoprint server is responsible for maintaining an index of fingerprints of (potentially) millions of tracks and serving up queries. The lookup server uses the popular Apache Solr as the search engine. When a query arrives, the codes that have high overlap with the query code are retrieved using Solr. The lookup server then filters through these candidates and scores them based on a number of factors such as the number of codeword matches, the order and timing of codes and so on. If the best matching code has a high enough score, it is considered a hit and the ID and any associated metadata is returned.
To run a server, first you ingest and index full length codes for each audio track of interest into the server index. To perform a lookup, you use echoprint-codegen to generate a code for a subset of the file (typically 30 seconds will do) and issue that as a query to the server.
The Echo Nest hosts a lookup server, so for many use cases you won’t need to run your own lookup server. Instead, you can make queries to the Echo Nest via the song/identify call. (We also expect that many others may run public echoprint servers as well.)
Creating a de-duplicator
With that quick introduction on how Echoprint works let’s look at how we could create a de-duplicator. The core logic is extremely simple:
create an empty echoprint-server
foreach mp3 in my-music-collection:
    code = echoprint-codegen(mp3)            // generate the code
    result = echoprint-server.query(code)    // look it up
    if result:                               // did we find a match?
        print 'duplicate for', mp3, 'is', result
    else:                                    // no, so ingest the code
        echoprint-server.ingest(mp3, code)
We create an empty fingerprint database. For each song in the music collection we generate an Echoprint code and query the server for a match. If we find one, then the mp3 is a duplicate and we report it. Otherwise, it is a new track, so we ingest the code for the new track into the echoprint server. Rinse. Repeat.
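To make the loop concrete, here is a self-contained sketch that runs without any server. Everything in it is a stand-in: ToyIndex is a hypothetical in-memory index, fingerprints are plain sets of made-up integers, and the overlap score with a 0.6 threshold is a simplification chosen for illustration, not Echoprint's actual scoring.

```python
from typing import Dict, Optional, Set


class ToyIndex:
    """In-memory stand-in for a fingerprint server (not the Echoprint API)."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.tracks: Dict[str, Set[int]] = {}

    def query(self, codes: Set[int]) -> Optional[str]:
        """Return the best-matching track if its overlap clears the threshold."""
        best_name, best_score = None, 0.0
        for name, known in self.tracks.items():
            score = len(codes & known) / max(len(codes), 1)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None

    def ingest(self, name: str, codes: Set[int]) -> None:
        self.tracks[name] = codes


def find_duplicates(collection: Dict[str, Set[int]]) -> Dict[str, str]:
    """Map each duplicate file to the earlier file it matched."""
    index = ToyIndex()
    duplicates = {}
    for name, codes in collection.items():
        match = index.query(codes)      # look it up
        if match is not None:           # found a match: report it
            duplicates[name] = match
        else:                           # new track: ingest it
            index.ingest(name, codes)
    return duplicates


# Made-up code sets; "Nightminds" shares most of its codes with "Night Minds".
collection = {
    "Night Minds.mp3": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "Katie.mp3": {20, 21, 22, 23, 24, 25},
    "Nightminds.mp3": {1, 2, 3, 4, 5, 6, 7, 8, 30, 31},
}
duplicates = find_duplicates(collection)
```

The first copy of a song encountered is always treated as the original, which is why the order of the walk through the collection matters.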
I’ve written a python program dedup.py to do just this. Being a cautious sort, I don’t have it actually delete duplicates, but instead, I have it just generate a report of duplicates so I can decide which one I want to keep. The program also keeps track of its state so you can re-run it whenever you add new music to your collection.
Here’s an example of running the program:
% python dedup.py ~/Music/iTunes
1 1 /Users/plamere/Music/misc/ABBA/Dancing Queen.mp3
( lines omitted...)
173 41 /Users/plamere/Music/misc/Missy Higgins - Katie.mp3
174 42 /Users/plamere/Music/misc/Missy Higgins - Night Minds.mp3
175 43 /Users/plamere/Music/misc/Missy Higgins - Nightminds.mp3
duplicate /Users/plamere/Music/misc/Missy Higgins - Nightminds.mp3
/Users/plamere/Music/misc/Missy Higgins - Night Minds.mp3
176 44 /Users/plamere/Music/misc/Missy Higgins - This Is How It Goes.mp3
dedup.py prints out each mp3 as it processes it and reports each duplicate as it finds one. It also collects a duplicate report in a file in pblml format like so:
duplicate <sep> iTunes Music/Bjork/Greatest Hits/Pagan Poetry.mp3 <sep> original <sep> misc/Bjork Radio/Bjork - Pagan Poetry.mp3
duplicate <sep> iTunes Music/Bjork/Medulla/Desired Constellation.mp3 <sep> original <sep> misc/Bjork Radio/Bjork - Desired Constellation.mp3
duplicate <sep> iTunes Music/Bjork/Selmasongs/I've Seen It All.mp3 <sep> original <sep> misc/Bjork Radio/Bjork - I've Seen It All.mp3
Again, dedup.py doesn’t actually delete any duplicates, it will just give you this nifty report of duplicates in your collection.
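If you want to act on the report, it is easy to read back in. The sketch below is a guess at the layout based only on the sample above: it assumes one record per line, with alternating keys and values separated by <sep>. The function name parse_report is hypothetical and not part of dedup.py.

```python
from typing import Dict, List


def parse_report(text: str) -> List[Dict[str, str]]:
    """Parse lines of the form 'key <sep> value <sep> key <sep> value'."""
    records = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("<sep>")]
        if len(fields) < 2 or len(fields) % 2:
            continue  # skip blank or malformed lines
        # Even positions are keys, odd positions are their values.
        records.append(dict(zip(fields[0::2], fields[1::2])))
    return records


sample = (
    "duplicate <sep> iTunes Music/Bjork/Greatest Hits/Pagan Poetry.mp3"
    " <sep> original <sep> misc/Bjork Radio/Bjork - Pagan Poetry.mp3\n"
    "\n"
)
records = parse_report(sample)
```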
Trying it out
If you want to give dedup.py a try, follow these steps:
- Download, build and install echoprint-codegen
- Download, build, install and run the echoprint-server
- Get dedup.py.
- Edit line 10 in dedup.py to set the sys.path to point at the echoprint-server API directory
- Edit line 13 in dedup.py to set the _codegen_path to point at your echoprint-codegen executable
% python dedup.py ~/Music
This will find all of the dups and write them to the dedup.dat file. It takes about 1 second per song. To restart (this will delete your fingerprint database) run:
% python dedup.py --restart
Note that you can actually run the dedup process without running your own echoprint-server (saving you the trouble of installing Apache Solr, Tokyo Cabinet and Tokyo Tyrant). The downside is that you won’t have any persistent server, which means that you’ll not be able to incrementally de-dup your collection – you’ll need to do it all in one pass. To use the local mode, just add local=True to the fp.py calls. The index is then kept in memory; no Solr or Tokyo Tyrant is needed.
Wrapping up
dedup.py is just one little example of the kind of application that developers will be able to create using Echoprint. I expect to see a whole lot more in the next few months. Before Echoprint, song identification was out of the reach of the typical music application developer; it was just too expensive. Now with Echoprint, anyone can incorporate music identification technology into their apps. The result will be fewer headaches for developers and much better music applications for everyone.
Echo Nest Remix at the Boston Python Meetup Group
Posted by Paul in events, remix, The Echo Nest on July 15, 2010
Next week I’ll be giving a talk about remixing music with Echo Nest remix at the Boston Python Meetup Group. If you are in the Boston / Cambridge area next week, be sure to come on by and say ‘hi’. Info and RSVP for the talk are here: The Boston Python Meetup Group on Meetup.com
Here’s the abstract for the talk:
Paul Lamere will tell us about Echo Nest remix. Remix is an open source Python library for remixing music. With remix you can use Python to rearrange a track, combine it with others, beat/pitch shift it etc. – essentially it lets you treat a song like silly putty.
The Swinger is an interesting example of what it can do that made the rounds of the blogosphere: it morphs songs to give them a swing rhythm.
For more details about the type of music remixing you can do with remix, feel free to read: http://musicmachinery…
Python and Music at PyCon 2010
Posted by Paul in code, Music, The Echo Nest on February 15, 2010
If you are lucky enough to be heading to PyCon this week and are interested in hacking on music, there are two talks that you should check out:
DJing in Python: Audio processing fundamentals – In this talk Ed Abrams talks about his experiences building a real-time audio mixing application in Python. I caught a dry-run of this talk at the local Python SIG – lots of info packed into this 30 minute talk. One of the big takeaways from this talk is Ed’s evaluation of a number of Pythonic audio processing libraries. Sunday 01:15pm, Centennial I
Remixing Music Pythonically – This is a talk by Echo Nest friend and über-developer Adam Lindsay. In this talk Adam talks about the Echo Nest remix library. Adam, a frequent contributor to remix, will offer details on the concise expressiveness offered when editing multimedia driven by content-based features, and some insights on what Pythonic magic did and didn’t work in the development of the modules. Audio and video examples of the fun-yet-odd outputs that are possible will be shown. Sunday 01:55pm, Centennial I
The schedulers at PyCon have done a really cool thing and have put the talks back to back in the same room. Also, keep your eye out for the Hacking on Music OpenSpace.
The Echo Nest gets ready for Boston Music Hack Day
Posted by Paul in code, java, Music, The Echo Nest, web services on November 19, 2009
We’ve been extremely busy this week at the Echo Nest getting ready for the Boston Music Hack Day. Not only have we been figuring out menus, panel room assignments, and dealing with a waitlist, we’ve also been releasing a set of new API features. Here’s a quick rundown of what we’ve done:
- get_images – a frequent request from developers – we now have an API method that will let you get images for an artist. Note that we are releasing this method as a sneak preview for the hack day – we have images for over 60 thousand artists, but we will be aggressively adding more images over the next few weeks (60 thousand artists is a lot of artists, but we’d like to have lots more). We’ll also be expanding our sources of images to include many more sources. The results of the get_images are already good. 95% of the time you’ll get images. Over the next few weeks, the results will get even better.
- get_biographies – another frequent request from developers – we now have a get_biographies API method that will return a set of artist biographies for any artist. We currently have biographies for about a quarter million artists – and just as with get_images – we are working hard to expand the breadth and depth of this coverage. Nevertheless, with coverage for a quarter million artists, 99.99% of the time when you ask for a biography we’ll have it.
- get_similar – we’ve expanded the number of similar artists you can get back from get_similar from 15 to 100. This gives you lots more info for building playlisting and music discovery apps.
- buckets – one issue that our developers have had was that to fill out info on an artist often took a number of calls to the Echo Nest – one to get similars, one to get audio, one for video, familiarity, hotttnesss etc. To fill out an artist page it could take half a dozen calls. To reduce the number of calls needed to get artist information we’ve added a ‘bucket’ parameter to the search_artist, the get_similar and the get_profile calls. The bucket parameter allows you to specify which additional artist info should be returned in the call. You can specify ‘audio,’ ‘biographies,’ ‘blogs,’ ‘familiarity,’ ‘hotttnesss,’ ‘news,’ ‘reviews,’ ‘urls,’ ‘images’ or ‘video’ and whenever you get artist data back you’ll get the specified info included. For example, the call:
http://developer.echonest.com/api/get_profile
    ?api_key=EHY4JJEGIOFA1RCJP
    &id=music://id.echonest.com/~/AR/ARH6W4X1187B99274F
    &version=3
    &bucket=familiarity
    &bucket=hotttnesss
will return an artist block that looks like this:
<artist>
    <name>Radiohead</name>
    <id>music://id.echonest.com/~/AR/ARH6W4X1187B99274F</id>
    <familiarity>0.899230928024</familiarity>
    <hotttnesss>0.847409181874</hotttnesss>
</artist>
There’s another new feature that we are starting to roll out. It’s called Echo Source – it allows the developer to get content (such as images, audio, video etc.) based upon license info. Echo Source is a big deal and deserves a whole post – but that’s going to have to wait until after Music Hack Day. Suffice it to say that with Echo Source you’ll have a new level of control over what content the Echo Nest API returns.
We’ve updated our Java and Python libraries to support the new calls. So grab yourself an API key and start writing some music apps.
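If you'd rather build the bucket call by hand, the only slightly unusual part is that the bucket parameter repeats. Here is a rough sketch using just the Python standard library. It builds the URL and parses the sample artist block shown above without touching the network; the helper names profile_url and parse_artist are made up for this example and are not part of our Python library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://developer.echonest.com/api/get_profile"


def profile_url(api_key, artist_id, buckets):
    """Build a get_profile URL with one 'bucket' parameter per requested item."""
    params = [("api_key", api_key), ("id", artist_id), ("version", "3")]
    params += [("bucket", bucket) for bucket in buckets]
    return BASE_URL + "?" + urlencode(params)


def parse_artist(xml_text):
    """Pull the name and the two bucket values out of an <artist> block."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "familiarity": float(root.findtext("familiarity")),
        "hotttnesss": float(root.findtext("hotttnesss")),
    }


url = profile_url(
    "EHY4JJEGIOFA1RCJP",
    "music://id.echonest.com/~/AR/ARH6W4X1187B99274F",
    ["familiarity", "hotttnesss"],
)
artist = parse_artist(
    "<artist><name>Radiohead</name>"
    "<id>music://id.echonest.com/~/AR/ARH6W4X1187B99274F</id>"
    "<familiarity>0.899230928024</familiarity>"
    "<hotttnesss>0.847409181874</hotttnesss></artist>"
)
```

Passing the parameters as a list of pairs rather than a dict is what lets the same key appear more than once.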
Artist radio in 10 lines of code
Posted by Paul in code, fun, Music, playlist, The Echo Nest, web services on July 16, 2009
Last week we released Pyechonest, a Python library for the Echo Nest API. Pyechonest gives the Python programmer access to the entire Echo Nest API including artist and track level methods. Now after 9 years working at Sun Microsystems, I am a diehard Java programmer, but I must say that I really enjoy the nimbleness and expressiveness of Python. It’s fun to write little Python programs that do the exact same thing as big Java programs. For example, I wrote an artist radio program in Python that, given a seed artist, generates a playlist of tracks by wandering around the artists in the neighborhood of the seed artists and gathering audio tracks. With Pyechonest, the core logic is 10 lines of code:
def wander(band, max=10):
    played = []
    while max:
        if band.audio():
            audio = random.choice(band.audio())
            if audio['url'] not in played:
                play(audio)
                played.append(audio['url'])
                max -= 1
        band = random.choice(band.similar())
(You can see/grab the full code with all the boiler plate in the SVN repository)
This method takes a seed artist (band), selects a random track from the set of audio that The Echo Nest has found on the web for that artist, and plays it if we haven’t already played it. Then we select a near neighbor to the current artist and do it all again until we’ve played the desired number of songs.
For such a simple bit of code, the playlists generated are surprisingly good. Here are a few examples:
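If you don't have an API key handy, you can still try the random walk on toy data. The sketch below is a self-contained variation, not the Pyechonest version: the AUDIO and SIMILAR dictionaries are made-up stand-ins for what the API returns, it collects URLs instead of playing them, and I've added a max_steps limit (my own addition) so the walk cannot loop forever when it runs out of unplayed tracks.

```python
import random

# Hypothetical data standing in for API results.
AUDIO = {"A": ["a1", "a2"], "B": ["b1"], "C": []}
SIMILAR = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}


def wander(seed, audio, similar, count=3, max_steps=1000, rng=None):
    """Random-walk the artist neighborhood, collecting unplayed track URLs."""
    rng = rng or random.Random()
    played = []
    band = seed
    for _ in range(max_steps):
        if len(played) >= count:
            break
        tracks = audio.get(band, [])
        if tracks:
            url = rng.choice(tracks)
            if url not in played:
                played.append(url)
        neighbors = similar.get(band, [])
        if not neighbors:
            break  # dead end: nowhere left to wander
        band = rng.choice(neighbors)
    return played


playlist = wander("A", AUDIO, SIMILAR, count=3, rng=random.Random(0))
```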
Seed Artist: Led Zeppelin:
- You Shook Me by Led Zeppelin via licorice-pizza
- Suicide by Thin Lizzy via dmg541
- I Ain’t The One by Lynyrd Skynyrd via artdecade
- Fortunate Son by Creedence Clearwater Revival via onesweetsong
- Susie-Q by Dale Hawkins via boogiewoogieflu
(I think the Dale Hawkins version of Susie-Q after CCR’s Fortunate Son is just brilliant)
Seed Artist: The Decemberists:
- The Wanting Comes In Waves/Repaid by The Decemberists via londononburgeoningmetropolis
- Amazing Grace by Sufjan Stevens via itallstarted
- Baby’s Romance by Chris Garneau via slowcoustic
- Saint Simon by The Shins via pastaprima
- Made Up Love Song #43 by Guillemots via merryswankster
(Note that audio for these examples is audio found on the web – and just like anything on the web the audio could go away at any time)
I think these artist-radio style playlists rival just about anything you can find on current Internet radio sites – which ain’t too bad for 10 lines of code.
Where’s the Pow?
This morning, while eating my Father’s day bagel, I got to play some more with the video aspects of the Echo Nest remix API. The video remix is pretty slick. You use all of the tools that you use in the audio remix, except that the object you are manipulating has a video component as well. This makes it easy to take an audio remix and turn it into a video remix. For instance, here’s the remix code to create a remix that includes the first beat of every bar:
audiofile = audio.LocalAudioFile(input_filename)
collect = audio.AudioQuantumList()
for bar in audiofile.analysis.bars:
    collect.append(bar.children()[0])
out = audio.getpieces(audiofile, collect)
out.encode(output_filename)
To turn this into a video remix, just change the code to:
av = video.loadav(input_filename)
collect = audio.AudioQuantumList()
for bar in av.audio.analysis.bars:
    collect.append(bar.children()[0])
out = video.getpieces(av, collect)
out.save(output_filename)
The code is nearly identical, differing in loading and saving, while the core remix logic stays the same.
To make a remix of a YouTube video, you need to save a local copy of the video. I’ve been using KeepVid to save a local flv (flash video format) copy of any YouTube video.
Today I played with the track ‘Boom Boom Pow’ by the Black Eyed Peas. It’s a fun song for remix because it has a very strong beat, and already has a remix feel to it. And since the song is about digital transformation, it seems to be a good target for remix experiments. (and just maybe they won’t mind the liberties I’ve taken with their song).
Here’s the original (click through to YouTube to watch it since embedding is not allowed):
Just Boom
The first remix is to only include the first beat of every measure. The code is this:
for bar in av.audio.analysis.bars:
    collect.append(bar.children()[0])
Just Pow
Change the beat included from beat zero to beat three, and we get something that sounds very different:
Pow Boom Boom
Here’s a version with the beats reversed. The core logic for this transformation is one line of code:
av.audio.analysis.beats.reverse()
The 5/4 Version
Here’s a version that’s in 5/4 – to make this remix I duplicated the first beat and swapped beats 2 and 3. This is my favorite of the bunch.
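The beat shuffling in these remixes is just list manipulation, so you can try the logic on plain Python lists before pointing it at a video. In this sketch the beat labels are made up, and five_four is only one reading of the recipe above (repeat beat 1, then swap beats 2 and 3); neither helper is part of the remix API.

```python
def first_beats(bars, index=0):
    """Pick one beat per bar, skipping bars too short to have that beat."""
    return [bar[index] for bar in bars if len(bar) > index]


def five_four(beats):
    """Repeat the first beat, then swap beats 2 and 3 of the bar."""
    if len(beats) < 3:
        return list(beats)  # too short to rearrange; pass through unchanged
    first, second, third, *rest = beats
    return [first, first, third, second, *rest]


# Made-up beat labels: two full 4/4 bars and one short bar.
bars = [["b1", "b2", "b3", "b4"], ["b5", "b6", "b7", "b8"], ["b9", "b10"]]

boom = first_beats(bars, 0)   # "Just Boom": beat zero of every bar
pow_ = first_beats(bars, 3)   # "Just Pow": beat three of every bar
remixed = [five_four(bar) for bar in bars]
```

Skipping short bars in first_beats avoids an index error on a bar that has fewer beats than the index you ask for.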
These transformations are of the simplest variety, taking just a couple of minutes to code and try out. I’m sure some budding computational remixologist could do some really interesting things with this API.
Note that the latest video support is not in the main branch of remix. If you want to try some of this out you’ll need to check out the bl-video branch from the svn repository. But this is guaranteed to be rolled into the main branch before the upcoming Music Hackday. Update: the latest video support is now part of the main branch. If you want to try it out, check it out from the trunk of the SVN repository. So download the code, grab your API key and start remixing.
Update: As Brian pointed out in the comments there was some blocking on the remix renders. This has been fixed, so if you grab the latest code, the video output quality is as good as the input.
The Echo Nest remix 1.0 is released!
Posted by Paul in code, fun, Music, remix, The Echo Nest, web services on May 12, 2009
Version 1.0 of the Echo Nest remix has been released. Echo Nest Remix is an open source SDK for Python that lets you write programs that manipulate music. For example, here’s a python function that will take all the beats of a song, and reverse their order:
def reverse(inputFilename, outputFilename):
    audioFile = audio.LocalAudioFile(inputFilename)
    chunks = audioFile.analysis.beats
    chunks.reverse()
    reversedAudio = audio.getpieces(audioFile, chunks)
    reversedAudio.encode(outputFilename)
When you apply this to a song by The Beatles you get something that sounds like this:
which is surprisingly recognizable, musical – and yet different from the original.
Quite a few web apps have been written that use remix. One of my favorites is DonkDJ, which will ‘put a donk’ on any song. Here’s an example: Hung Up by Madonna (with a Donk on it):
This is my jam lets you create mini-mixes to share with people.
And where would the web be without the ability to add more cowbell to any song?
There’s lots of good documentation already for remix. Adam Lindsay has created a most excellent overview and tutorial for remix. There’s API documentation and there’s documentation for the underlying Echo Nest web services that perform the audio analysis. And of course, the source is available too.
So, if you are looking for that fun summer coding project, or if you need an excuse to learn Python, or perhaps you are a budding computational remixologist, download remix, grab an API key from the Echo Nest and start writing some remix code.
Here’s one more example of the fun stuff you can do with remix. Guess the song, and guess the manipulation:
The Echo Nest Remix SDK
Posted by Paul in fun, Music, The Echo Nest on February 28, 2009
One of the joys of working at the Echo Nest is the communal music playlist. Anyone can add, rearrange or delete music from the queue. Of course, if you need to bail out (like when that Cyndi Lauper track is sending you over the edge) you can always put on your headphones and tune out the mix. The other day, George Harrison’s “Here Comes the Sun” started playing, but this was a new version – with a funky drum beat, that I had never heard before – perhaps this was a lost track from the Beatles’ Love? Nope, turns out it was just Ben, one of the Echo Nest developers, playing around with The Echo Nest Remix SDK.
The Echo Nest Remix SDK is an open source Python library that lets you manipulate music and video. It sits on top of the Echo Nest Analyze API, hides all of the messy details of sending audio back to the Echo Nest, and parsing the XML response, while still giving you access to the full power of the API.
remix is one of The Echo Nest’s secret weapons – it gives you the ability to analyze and manipulate music – and not just audio manipulations such as filtering or equalizing, but the ability to remix based on the hierarchical structure of a song. remix sits on top of a very deep analysis of the music that teases out all sorts of information about a track. There’s high level information such as the key, tempo, time signature, mode (major or minor) and overall loudness. There’s also information about the song structure. A song is broken down into sections (think verse, chorus, bridge, solo), bars, beats, tatums (the smallest perceptual metrical unit of the song) and segments (short, uniform sound entities). remix gives you access to all of this information.
I must admit that I’ve been a bit reluctant to use remix – mainly because after 9 years at Sun Microsystems I’m a hard core Java programmer (the main reason I went to Sun in the first place was because I liked Java so much). Every time I start to use Python I get frustrated because it takes me 10 times longer than it would in Java. I have to look everything up. How do I concatenate strings? How do I find the length of a list? How do I walk a directory tree? I can code so much faster in Java. But … if there was ever a reason for me to learn Python it is this remix SDK. It is just so much fun – and it lets you do some of the most incredible things. For example, if you want to add a cowbell to every beat in a song, you can use remix to get the list of all of the beats (and associated confidences) in a song, and simply overlap a cowbell strike at each of the time offsets.
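The cowbell idea boils down to mixing a short sound into the track at each beat's time offset. Here is a toy sketch of that overlay on plain lists of samples. It doesn't use remix at all: the overlay function, the sample values and the 8 Hz sample rate are all made up to keep the example small.

```python
def overlay(track, hit, offsets, sample_rate):
    """Return a copy of track with hit mixed in at each time offset (seconds)."""
    mixed = list(track)
    for seconds in offsets:
        start = int(round(seconds * sample_rate))
        for i, value in enumerate(hit):
            if start + i >= len(mixed):
                break  # the hit runs past the end of the track
            mixed[start + i] += value
    return mixed


track = [0.0] * 10          # ten samples of silence
hit = [1.0, 0.5]            # a two-sample "cowbell"
mixed = overlay(track, hit, offsets=[0.0, 0.5], sample_rate=8)
```

With real audio you would get the offsets from the beat list that the analysis returns and would probably scale each strike to avoid clipping.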
So here’s my first bit of Python code using remix. I grabbed one of the code samples that’s included in the distribution, had the aforementioned Ben spend two minutes walking me through the subtleties of Audio Quantum and I was good to go. My first bit of code just takes a song and swaps beat two and beat three of all measures that have at least 3 beats.
def swap_beat_2_and_3(inputFile, outputFile):
    audiofile = audio.LocalAudioFile(inputFile)
    bars = audiofile.analysis.bars
    collect = audio.AudioQuantumList()
    for bar in bars:
        beats = bar.children()
        if (len(beats) >= 3):
            (beats[1], beats[2]) = (beats[2], beats[1])
        for beat in beats:
            collect.append(beat)
    out = audio.getpieces(audiofile, collect)
    out.encode(outputFile)
The code analyzes the input, iterates through the bars and, if a bar has at least three beats, swaps beats two and three. It then encodes and writes out a new audio file. (I must admit, even as a hard core Java programmer, the ability to swap things with (a,b) = (b,a) is pretty awesome.) The resulting audio is surprisingly musical. Here’s the result as applied to Maynard Ferguson’s “Birdland”:
This is just great programming fun. I think I’ll be spending my spare coding time learning more Python so I can explore all of the things one can do with remix.

