Slides for my Data Mining Music talk
Posted by Paul in events, The Echo Nest on March 15, 2012
I recently gave a talk on Data Mining Music at SXSW. It was a standing room only session, with an enthusiastic audience that asked great questions. It was a really fun time for me. I’ve posted the slides to Slideshare, but be warned that there are no speaker notes so it may not always be clear what any particular slide is about. There was lots of music in the talk, but unfortunately, it is not in the Slideshare PDF. The links below should flesh out most of the details and have some audio examples.
Related Links:
- Have artist names been getting longer?
- The Passion Index – Find the bands that have the most passionate fans
- Six Degrees of Black Sabbath – Using artist relationship data to build a Six Degrees of Kevin Bacon for Music
- Frog-based playlisting – Building advanced playlists by finding paths through the artist space
- The Click Track Detector – Finding drummers that use a click track
- Looking for the Slow Build – Finding songs that have a gradual build
- Bohemian Rhapsichord – Turning a popular song into a musical instrument, with data.
- Midem Music Machine – Making a beautiful visualization of music
- The Swinger – Making any song swing
Thanks to everyone who attended.
Data Mining Music at SXSW
If you happen to be in Austin this week for SXSW consider attending my talk called Data Mining Music. It is all about the fun things you can discover about music when you have data about millions of songs and artists.
The talk is on Sunday, March 11 at 5:00PM in the Rio Grande room of the Hilton Garden Inn. All the details are here: Data Mining Music
Boil the Frog – the unreleased Spotify Version
Update – You are probably looking for this web-based version of Boil The Frog and the blog post about it.
The rest of this article is about the unreleased Spotify Version of Boil the Frog.

I’m at Music Apps Hack Weekend doing my favorite thing: hacking on music. I’ve just finished my hack called Boil the Frog. Boil the Frog is a Spotify App that will create playlists that gradually take you from one music style to another. It is like the proverbial story of the frog in the pot of water. If you heat the water gradually, the frog won’t notice and will happily sit in the pot until it becomes frog stew. With Boil the Frog you can do the same thing musically. Create a playlist that gradually takes your pre-teen from Miley Cyrus to Miles Davis, or perhaps more perversely the Kenny G fan to Cannibal Corpse.
To build the app I built an artist similarity graph of 100,000 of the most popular artists. I use The Echo Nest artist similarity to connect each artist to its four nearest neighbors. To find the path between any two artists I use a bidirectional Dijkstra shortest path algorithm. Most paths can be computed in less than 100ms.
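Here is a rough sketch of that idea in Python. It is not the app's actual code: the similarity scores are made up, the edge cost of `1 - similarity` is an assumption, and the graph is treated as undirected so the backward search can reuse the same adjacency lists. The function names `build_graph` and `bidirectional_dijkstra` are hypothetical.

```python
import heapq

def build_graph(similar, k=4):
    """similar maps artist -> [(other_artist, similarity), ...], best first.
    Each artist is linked to its top k neighbors; edges are undirected (assumption)."""
    graph = {}
    for artist, ranked in similar.items():
        graph.setdefault(artist, {})
        for other, sim in ranked[:k]:
            cost = 1.0 - sim  # assumed cost: more similar means cheaper to traverse
            graph.setdefault(other, {})
            graph[artist][other] = min(cost, graph[artist].get(other, cost))
            graph[other][artist] = min(cost, graph[other].get(artist, cost))
    return graph

def bidirectional_dijkstra(graph, source, target):
    """Return the cheapest path from source to target as a list, or None."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return [source]
    inf = float("inf")
    dist = [{source: 0.0}, {target: 0.0}]    # side 0 = forward, side 1 = backward
    prev = [{source: None}, {target: None}]
    done = [set(), set()]
    heaps = [[(0.0, source)], [(0.0, target)]]
    best, meet = inf, None
    while heaps[0] and heaps[1]:
        # No undiscovered path can beat the best one found so far.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in done[side]:
            continue  # stale heap entry
        done[side].add(u)
        for v, w in graph[u].items():
            if d + w < dist[side].get(v, inf):
                dist[side][v] = d + w
                prev[side][v] = u
                heapq.heappush(heaps[side], (d + w, v))
            # If the other search has reached v, the two halves form a candidate path.
            if v in dist[1 - side]:
                total = dist[side][v] + dist[1 - side][v]
                if total < best:
                    best, meet = total, v
    if meet is None:
        return None
    path, node = [], meet
    while node is not None:          # walk back to the source
        path.append(node)
        node = prev[0][node]
    path.reverse()
    node = prev[1][meet]
    while node is not None:          # walk forward to the target
        path.append(node)
        node = prev[1][node]
    return path

# Toy data with invented similarity scores, for illustration only.
similar = {
    "Miley Cyrus": [("Kelly Clarkson", 0.9), ("Miles Davis", 0.1)],
    "Kelly Clarkson": [("Norah Jones", 0.8)],
    "Norah Jones": [("Diana Krall", 0.9)],
    "Diana Krall": [("Miles Davis", 0.7)],
}
graph = build_graph(similar)
playlist = bidirectional_dijkstra(graph, "Miley Cyrus", "Miles Davis")
print(" -> ".join(playlist))
```

In the toy data the direct Miley Cyrus to Miles Davis edge exists but is expensive, so the search prefers the longer chain of closely related artists, which is the gradual transition the playlist is after.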
The Spotify Apps API is the perfect hacking platform. You can build a Spotify app that has full access to the vast Spotify music catalog and artwork, along with access to the listener’s catalog. Since the Spotify Apps run in an embedded browser all of your web app programming skills apply. You can use jQuery, make calls to JSON APIs, use HTML 5 canvas. It is all there. Spotify has done a really good job putting together this platform. The only downside is that, unlike the web, it is hard to actually release Spotify apps, but the Spotify team is working to make this easier. I’d love to release Boil the Frog because it is really fun to make playlists that bring you from one music style to another. It is interesting to see what musical neighborhoods you wander through on your way. For instance, I made a Kenny G to Cannibal Corpse playlist. To get there, the playlist brought me from easy listening, to movie soundtracks and then through video game soundtracks to get to the heavy metal world. Cool stuff. If you want to see a playlist between two artists let me know in the comments and I’ll create and share the playlist with you.
I made a video of Boil the Frog in action. Check it out:
[youtube http://youtu.be/Nj6JAxm9aPE]
Update: I’ve just pushed the client code out to github: https://github.com/plamere/boilthefrog
Hackathons are not nonsense
Dave Winer says that Hackathons are nonsense. Specifically he says:
Hackathons are how marketing guys wish software were made.
However, to make good software, requires lots of thought, trial and error, evaluation, iteration, trying the ideas out on other users, learning, thinking, more trial and error, and on and on. At some point you say it ain’t perfect, but it’s useful, so let’s ship. That process, if the software is to be any good, doesn’t happen in 24 hours. Sometimes it takes years, if the idea is new enough.
Dave says that software is hard and you can’t expect to build shippable software in a day. That’s certainly true, and if the goal of a hackathon were to get a bunch of developers together to build and ship commercial software in a day, I’d agree with him. But that’s not the goal of any of the hackathons I’ve attended.
I’ve participated in and/or helped organize perhaps a dozen Music Hack Days. At a Music Hack Day, people who are interested in music and technology get together for a weekend to learn about music tech and to build something with it. The goal isn’t to ship a software product, it is to scratch that personal itch to do something cool with music. The people who come to a Music Hack Day are often not in the music tech space, but are interested in learning about all the music APIs and tech available. They come to learn and then use what they’ve learned to build something. At the most recent Music Hack Day in San Francisco, 200 hackers built 60 hacks including new musical instruments, new music discovery tools, social music apps and music games.
Music Hack Days are not nonsense. They are incredibly creative weekends that have resulted in 1,000 or more really awesome music hacks. Consider the hackathon to be the Haiku of programming. Instead of 17 syllables in 3 lines, a hacker has 24 hours. (Maybe we should call them Haikuthons;) I think the 24 hour constraint contributes to the creativity of the event.
Here are some of my favorite hacks built at recent Music Hack Days. Plenty of whimsy but no nonsense here:
- Drinkify – Answers the question “I’m listening to X, what should I drink?”
- Invisible Instruments – Just what it says, musical instruments that you can’t see
- Bohemian Rhapsichord – Turns Queen’s Opus into a musical instrument
- Musaic – Discover music through photomosaics
- MIDEM Music Machine – a beautiful visualization of a song
- Tourrent Plans – Plan your tour based on where all the torrent downloaders are
- Stringer – a virtual string instrument
- The Swinger – Makes any song swing
Billboard wins!
Yep, the numbers are in. Out of 13 Grammy awards, Billboard picked 7 correctly, while my web crawling approach picked 6. Congrats to the Billboard editorial team for winning (this round!). Let me know where to send the milkshakes!
Here are the details. All the raw data is at Paul vs. Billboard.
| Category | Paul’s prediction | Billboard’s prediction | Actual Grammy | Who was right? |
| Album Of The Year | Adele | Adele | Adele | Both |
| Record Of The Year | Adele | Adele | Adele | Both |
| Song of the Year | Adele | Bruno Mars | Adele | Paul |
| Best New Artist | Bon Iver | The Band Perry | Bon Iver | Paul |
| Best Pop Solo Performance | Adele | Lady Gaga | Adele | Paul |
| Best Pop Duo/Group Performance | Maroon 5 and Christina Aguilera | Tony Bennett | Tony Bennett | Billboard |
| Best R&B Album | Kelly Price | Chris Brown | Chris Brown | Billboard |
| Best Country Album | Jason Aldean | Taylor Swift | Lady Antebellum | Neither |
| Best Dance/Electronica Album | Cut/Copy | Skrillex | Skrillex | Billboard |
| Best Rock Album | Red Hot Chili Peppers | Foo Fighters | Foo Fighters | Billboard |
| Best Alternative Music Album | Bon Iver | Bon Iver | Bon Iver | Both |
| Best Latin Pop, Rock or Urban Album | Calle 13 | Calle 13 | Maná | Neither |
| Best Rap Album | Kanye West & Jay-Z | Nicki Minaj | Kanye West | Neither |
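
The 7 to 6 score can be checked directly from the "Who was right?" column. A small sketch, with the thirteen outcomes copied from the table in order:

```python
from collections import Counter

# "Who was right?" column of the table, top to bottom.
outcomes = [
    "Both", "Both", "Paul", "Paul", "Paul", "Billboard", "Billboard",
    "Neither", "Billboard", "Billboard", "Both", "Neither", "Neither",
]

tally = Counter(outcomes)
paul_correct = tally["Paul"] + tally["Both"]            # a shared pick counts for each side
billboard_correct = tally["Billboard"] + tally["Both"]

print(f"Paul: {paul_correct}, Billboard: {billboard_correct}, categories: {len(outcomes)}")
```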
Paul vs. Billboard
Posted by Paul in code, data, fun, The Echo Nest on February 12, 2012
Another weekend, another Music Hack Day. This weekend I’m at Tokbox headquarters in San Francisco at the 3rd annual Music Hack Day San Francisco, where 200 music hackers are building the future of music.
For my hack, I thought I would try to predict who would win the Grammy awards (the annual music awards presented by The Recording Academy), which are being held this evening. To do this, I used the Echo Nest APIs to gather lots of news and blog posts for each nominated artist. I then peered into the articles looking for mentions of the Grammy nominated items. I tallied up the mentions and combined this with the overall artist hotttnesss to give me a ranked order of each nominated item, which I could then use to create my prediction.
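A minimal sketch of that ranking step follows. The post does not say how mentions and hotttnesss were combined, so the weighted sum here is an assumption, as are the function names, the placeholder artists, the article text and the hotttnesss values. In the real hack the articles and hotttnesss came from the Echo Nest APIs; here they are plain Python data.

```python
def count_mentions(articles, phrase):
    """Count case-insensitive occurrences of phrase across a list of article texts."""
    phrase = phrase.lower()
    return sum(article.lower().count(phrase) for article in articles)

def rank_nominees(nominees, articles_by_artist, hotttnesss, weight=0.5):
    """nominees is a list of (artist, nominated_item) pairs.
    Returns (artist, item, score) tuples, best first.
    The score is an assumed blend: weight * normalized mentions + (1 - weight) * hotttnesss."""
    counts = {
        (artist, item): count_mentions(articles_by_artist.get(artist, []), item)
        for artist, item in nominees
    }
    top = max(counts.values(), default=0) or 1  # avoid dividing by zero
    scored = []
    for artist, item in nominees:
        mention_score = counts[(artist, item)] / top
        score = weight * mention_score + (1 - weight) * hotttnesss.get(artist, 0.0)
        scored.append((score, artist, item))
    scored.sort(reverse=True)
    return [(artist, item, round(score, 3)) for score, artist, item in scored]

# Placeholder data, invented for illustration.
nominees = [("Artist A", "Album One"), ("Artist B", "Album Two")]
articles_by_artist = {
    "Artist A": ["Album One is the favorite this year.", "Critics love album one."],
    "Artist B": ["Album Two might still win."],
}
hotttnesss = {"Artist A": 0.6, "Artist B": 0.9}

ranking = rank_nominees(nominees, articles_by_artist, hotttnesss)
for artist, item, score in ranking:
    print(artist, item, score)
```

With these numbers Artist A comes out on top: twice as many mentions outweigh Artist B's higher hotttnesss. Changing `weight` shifts that balance.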
Since Billboard has also made some Grammy predictions, I thought it’d be interesting to do a post-facto comparison on how well each of us predicts the winners – thus the hack title ‘Paul vs. Billboard’.
The hack is online here: Paul vs. Billboard
Be sure to check out all of the other music hacks being created this weekend:
Building a Seatwave + Echo Nest App
Posted by Paul in code, data, Music, The Echo Nest on February 8, 2012
This weekend at Music Hack Day SF, Seatwave is launching their Ticketing and Event API. This API will make it easy for developers to add event discovery and ticket-buying functionality to their apps. At the Echo Nest we’ve incorporated Seatwave artist IDs into our Rosetta ID mapping layer making it possible to use Seatwave IDs directly with the Echo Nest API. This makes it easier for you to use the Seatwave and the Echo Nest APIs together. For instance, you can call the Seatwave API, get artist event IDs in response and use those IDs with the Echo Nest API to get more context about the artist.
For example, we can make a call to the Seatwave API to get the set of Featured Contest with an API call:
The results include blocks of events like this:
{
    "CategoryId": 12,
    "Currency": "GBP",
    "Id": 934,
    "ImageURL": "http://cdn2.seatwave.com/filestore/season/image/thestoneroses_934_1_1_20111018165906.jpg",
    "MinPrice": 95,
    "Name": "The Stone Roses",
    "SwURL": "http://www.seatwave.com/the-stone-roses-tickets/season",
    "TicketCount": 1810
},
{
    "CategoryId": 10,
    "Currency": "GBP",
    "Id": 702,
    "ImageURL": "http://cdn2.seatwave.com/filestore/season/image/redhotchilipeppers_702_1_1_20110617124457.jpg",
    "MinPrice": 45,
    "Name": "Red Hot Chili Peppers",
    "SwURL": "http://www.seatwave.com/red-hot-chilli-peppers-tickets/season",
    "TicketCount": 1134
},
We see events for the Stone Roses and for RHCP. The Seatwave ID for RHCP is 702. We can use this ID directly in Echo Nest calls. For instance, to get lots of Echo Nest info on the RHCP using the Seatwave ID, we can make an artist/profile call like so:
To show off the integration of Seatwave and Echo Nest, I’ve built a little web app that shows a list of top Seatwave concerts (generated via the Seatwave API). For each artist, the app shows the number of tickets available, the artist’s biography, along with a play button that will let you listen to a sample of the artist (via 7Digital).
The application is live here: Listen to Top Seatwave Artists. The code is on github: plamere/SWDemo
The Seatwave API is quite easy to work with. They support JSON, JSONP, XML and SOAP(bleh). Lots of good data, very nice artist images, generous affiliate program, easy to understand TOS. Highly recommended. See the Seatwave page in The Echo Nest Developer Center for more info on the Seatwave / Echo Nest integration.
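The hand-off from Seatwave event data to an Echo Nest lookup can be sketched as follows. This is illustrative only and makes several assumptions that are not confirmed by the post: that the Rosetta ID takes the form `seatwave:artist:<id>`, that the artist/profile endpoint lives at the URL shown, and that `biographies` and `hotttnesss` are valid bucket names. The helper names are hypothetical, and the code only builds the request URLs without calling either service.

```python
import json
from urllib.parse import urlencode

# Trimmed copy of the two event blocks shown earlier in the post.
EVENTS_JSON = """
[
  {"CategoryId": 12, "Currency": "GBP", "Id": 934, "MinPrice": 95,
   "Name": "The Stone Roses", "TicketCount": 1810},
  {"CategoryId": 10, "Currency": "GBP", "Id": 702, "MinPrice": 45,
   "Name": "Red Hot Chili Peppers", "TicketCount": 1134}
]
"""

# Assumed endpoint for the Echo Nest artist/profile call.
ECHO_NEST_PROFILE = "http://developer.echonest.com/api/v4/artist/profile"

def rosetta_id(seatwave_id):
    """Wrap a Seatwave ID in the assumed Rosetta form catalog:type:id."""
    return f"seatwave:artist:{seatwave_id}"

def profile_url(seatwave_id, api_key, buckets=("biographies", "hotttnesss")):
    """Build (but do not send) an artist/profile request for a Seatwave ID."""
    params = [("api_key", api_key), ("id", rosetta_id(seatwave_id)), ("format", "json")]
    params += [("bucket", bucket) for bucket in buckets]  # one bucket parameter per item
    return ECHO_NEST_PROFILE + "?" + urlencode(params)

events = json.loads(EVENTS_JSON)
urls = {event["Name"]: profile_url(event["Id"], "YOUR_API_KEY") for event in events}

for name, url in urls.items():
    print(name, url)
```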
The Midem Music Machine
Just a quick post before it is demo time. This weekend at MIDEM Hack Day, I teamed up with the famous Mr. Doob to build a music hack. We created the Midem Music Machine. It creates a beautiful visualization of music using The Echo Nest analyzer and Three.js. Here’s a pic:
As you can see, our hack was inspired by the Animusic folks. Working with Mr. Doob was awesome. He did just amazing stuff.
You can see the Midem Music Machine online here: Midem Music Machine. You’ll need a browser that supports WebGL like Chrome.
The clean desk award
We’ve doubled our floor space here at the Echo Nest. I now have an office with a door and a window that opens. Look at that desk. I’ll be winning the clean desk award every day for the next week at least!
Who is the A$%#hole?
In his blog post Can we kill the music business too? James from songspin.fm has the magic formula to kill the major labels. He says:
In a nutshell, to kill the major label run music industry, startups will need to:
- find great music from people who aren’t assholes
- let people do cool things with that music
- let users share what they create
- profit!
(Note that in that last quote, the first ‘sue’ link points to Grooveshark)
We are assigned a predetermined amount of weekly uploads to the system and get a small extra bonus if we manage to go above that (not easy).The assignments are assumed as direct orders from the top to the bottom, we don’t just volunteer to “enhance” the Grooveshark database.
All search results are monitored and when something is tagged as “not available”, it get’s queued up to our lists for upload. You have to visualize the database in two general sections: “known” stuff and “undiscovered/indie/underground”. The “known” stuff is taken care internally by uploads. Only for the “undiscovered” stuff are the users involved as explained in some posts above. Practically speaking, there is not much need for users to upload a major label album since we already take care of this on a daily basis.
Are the above legal, or ethical? Of course not. Don’t reply to give me a lecture. I know. But if the labels and their lawyers can’t figure out how to stop it, then I don’t feel bad for having a job. It’s tough times.
Why am I disclosing all this? Well, I have been here a while and I don’t like the attitude that the administration has acquired against the artists. They are the enemy. They are the threat. The things that are said internally about them would make you very very angry. Interns are promised getting a foot in the music industry, only to hear these people cursing and bad mouthing the whole industry all day long, to the point where you wonder what would happen if Grooveshark get’s hacked by Anonymous one day and all the emails leak on some torrent or something.
James may be right – that a big part of the future of music is letting developers do cool things with music, but holding up Grooveshark as an example of a music startup is a mistake. What Grooveshark is doing isn’t cool. It isn’t something that developers should emulate. James called those that sue Grooveshark assholes, but from my vantage point he got it exactly ass-backwards.







