I’ve been pretty busy figuring out the lay of the land at the new job, so I haven’t had too much time for recreational programming. However, last night, while my lovely wife was watching Dr. House demonstrate his excellent interpersonal skills, I got a chance to write a little bit of code to generate an artist graph using the Echo Nest Developer API.
The idea is to generate a graph that shows the artist similarity space in a way that encourages exploration. To do this, I simply use the Echo Nest get_similar call to walk the artist graph. Instead of getting bogged down in some graphics library to create the visualization, I just output DOT commands and render the whole thing using Graphviz. Graphviz does all the hard work of figuring out how to lay out the graph. Here’s a tiny example of some Graphviz output:
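The Java code in this post prints bare edge and node statements; Graphviz needs them wrapped in a `digraph { ... }` block before it will render anything. Below is a minimal sketch of writing a complete DOT file. The class name, the `{id, label}` array layout and the quote-escaping helper are my own illustrative assumptions, not part of the Echo Nest library. Escaping matters because an artist name containing a double quote would otherwise break the DOT syntax.

```java
public class DotSketch {
    // Wrap a value in double quotes, escaping backslashes and embedded quotes
    // so an artist name containing '"' cannot break the DOT syntax.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build a complete DOT digraph.
    // nodes: {id, label} pairs; edges: {fromId, toId} pairs (hypothetical layout).
    static String toDot(String[][] nodes, String[][] edges) {
        StringBuilder out = new StringBuilder("digraph artists {\n");
        for (String[] n : nodes) {
            out.append("  ").append(quote(n[0]))
               .append(" [label=").append(quote(n[1])).append("];\n");
        }
        for (String[] e : edges) {
            out.append("  ").append(quote(e[0]))
               .append(" -> ").append(quote(e[1])).append(";\n");
        }
        return out.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Made-up ids, for illustration only.
        String[][] nodes = {{"a1", "Led Zeppelin"}, {"a2", "Wolfmother"}};
        String[][] edges = {{"a1", "a2"}};
        System.out.print(toDot(nodes, edges));
    }
}
```

Redirect the output to a file and render it with Graphviz, e.g. `dot -Tpng artists.dot -o artists.png`.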
One of the problems with making these sorts of graphs is that they can get extremely complicated very quickly. Even just a few steps away from the seed artist in the crawl of the artist graph, there can be hundreds of artists and thousands of connections. Without some care, the graph quickly turns into an unreadable tangle. However, since we want to use these graphs for exploring the artist space, we can make a simplification that eliminates much of the complexity. When exploring, people tend to start from a known artist and then proceed to lesser-known artists. If we make our graph work the same way, we eliminate a large number of extraneous connections. Instead of connecting all artists that we encounter in our crawl of the artist graph, we only connect new artists to more popular artists that are already in the graph. Since every edge then points from a more familiar artist to a strictly less familiar one, no path can ever loop back on itself. This gives us an easy-to-manage directed acyclic graph that flows from very familiar artists to unknown artists.
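To make the acyclicity argument concrete, here is a small sketch that orients similarity links by familiarity and then checks the result with Kahn's algorithm (a standard topological-sort test for cycles). The familiarity scores are made up, and treating similarity as symmetric pairs is a simplifying assumption; the crawl itself follows each artist's own similar-artist list. Three mutually similar artists form a cycle as undirected links, but once every edge must point toward a strictly less familiar artist, the cycle disappears.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FamiliarityDag {
    // Orient each similar pair from the more familiar artist to the less
    // familiar one; equal familiarity gets no edge, mirroring a strict "<" test.
    static Map<String, List<String>> orient(Map<String, Double> fam, String[][] pairs) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String artist : fam.keySet()) {
            adj.put(artist, new ArrayList<>());
        }
        for (String[] p : pairs) {
            double a = fam.get(p[0]), b = fam.get(p[1]);
            if (a > b) adj.get(p[0]).add(p[1]);
            else if (b > a) adj.get(p[1]).add(p[0]);
        }
        return adj;
    }

    // Kahn's algorithm: a directed graph is acyclic iff repeatedly removing
    // nodes with in-degree 0 eventually removes every node.
    static boolean isAcyclic(Map<String, List<String>> adj) {
        Map<String, Integer> indeg = new HashMap<>();
        for (String v : adj.keySet()) indeg.put(v, 0);
        for (List<String> targets : adj.values()) {
            for (String t : targets) indeg.merge(t, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indeg.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        int removed = 0;
        while (!ready.isEmpty()) {
            String v = ready.poll();
            removed++;
            for (String t : adj.getOrDefault(v, List.of())) {
                if (indeg.merge(t, -1, Integer::sum) == 0) ready.add(t);
            }
        }
        return removed == indeg.size();
    }

    public static void main(String[] args) {
        // Made-up familiarity scores, for illustration only.
        Map<String, Double> fam = Map.of("Led Zeppelin", 0.9, "Wolfmother", 0.6, "Black Mountain", 0.4);
        // As undirected links these three pairs form a triangle (a cycle).
        String[][] pairs = {{"Led Zeppelin", "Wolfmother"}, {"Wolfmother", "Black Mountain"},
                            {"Black Mountain", "Led Zeppelin"}};
        Map<String, List<String>> dag = orient(fam, pairs);
        System.out.println(dag + " acyclic=" + isAcyclic(dag));
    }
}
```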
The pseudocode to do this is very simple:
    add a seed artist to the work queue
    while the work queue is not empty
        curArtist <= the next artist from the queue
        for each artist similar to curArtist
            if similar artist is less familiar than curArtist
                plot a link from curArtist to similar artist
                add similar artist to the work queue (if not already added)
The real Java code is not much more complicated:
    while (workQueue.size() > 0) {
        Artist artist = workQueue.remove(0);  // take the oldest entry (breadth-first)
        List<Scored<Artist>> simArtists = echoNest.getSimilarArtists(artist, 0, 6);
        float familiarity = echoNest.getFamiliarity(artist);
        for (Scored<Artist> scoredArtist : simArtists) {
            Artist similarArtist = scoredArtist.getItem();
            float simFamiliarity = echoNest.getFamiliarity(similarArtist);
            // only draw edges toward less familiar artists; this keeps the graph acyclic
            if (simFamiliarity < familiarity) {
                out.printf("\"%s\" -> \"%s\";\n", artist.getId(), similarArtist.getId());
                // queue and label each artist only the first time it is reached
                if (!plottedSet.contains(similarArtist)) {
                    workQueue.add(similarArtist);
                    plottedSet.add(similarArtist);
                    out.printf("\"%s\" [label=\"%s\"]\n", similarArtist.getId(), similarArtist.getName());
                }
            }
        }
    }
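Since the Echo Nest client library isn't released yet, here is a self-contained sketch of the same crawl that can be run anywhere. The two maps are hypothetical in-memory stand-ins: `similar.get(a)` plays the role of `getSimilarArtists(a, 0, 6)` and `familiarity.get(a)` the role of `getFamiliarity(a)`. Artist names double as node ids, so there are no separate label statements, and the seed is marked as plotted up front.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ArtistCrawlSketch {
    // Breadth-first crawl from the seed, emitting DOT edges only toward
    // strictly less familiar artists. Both maps are hypothetical stand-ins
    // for the Echo Nest similarity and familiarity calls.
    static String crawl(String seed, Map<String, List<String>> similar,
                        Map<String, Double> familiarity) {
        StringBuilder out = new StringBuilder("digraph artists {\n");
        Deque<String> workQueue = new ArrayDeque<>();
        Set<String> plotted = new HashSet<>();
        workQueue.add(seed);
        plotted.add(seed);
        while (!workQueue.isEmpty()) {
            String artist = workQueue.poll();  // FIFO, like workQueue.remove(0)
            double fam = familiarity.get(artist);
            for (String sim : similar.getOrDefault(artist, List.of())) {
                if (familiarity.get(sim) < fam) {
                    out.append("  \"").append(artist).append("\" -> \"")
                       .append(sim).append("\";\n");
                    // Set.add returns false if already present, so each artist is queued once
                    if (plotted.add(sim)) {
                        workQueue.add(sim);
                    }
                }
            }
        }
        return out.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Made-up data, for illustration only.
        Map<String, Double> familiarity = Map.of(
            "Led Zeppelin", 0.9, "Deep Purple", 0.7, "Wolfmother", 0.6, "Black Mountain", 0.4);
        Map<String, List<String>> similar = Map.of(
            "Led Zeppelin", List.of("Wolfmother", "Black Mountain", "Deep Purple"),
            "Wolfmother", List.of("Black Mountain", "Led Zeppelin"),
            "Black Mountain", List.of("Wolfmother"),
            "Deep Purple", List.of("Led Zeppelin", "Wolfmother"));
        System.out.print(crawl("Led Zeppelin", similar, familiarity));
    }
}
```

With this toy data the crawl emits five edges (Led Zeppelin to Wolfmother, Black Mountain and Deep Purple; Wolfmother to Black Mountain; Deep Purple to Wolfmother). The "Wolfmother -> Led Zeppelin" link is never drawn because it would point back up the familiarity scale.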
This yields some fun graphs. Here’s a detail from a graph created using Led Zeppelin as the seed artist:
And the full graph in all its glory is here:
I can think of all sorts of things to add to this artist graph. We could size the nodes based upon the familiarity of the artist. We could color the artists based upon how ‘hot’ the artist is. We could replace Graphviz with a real graphing library like prefuse and make the whole graph interactive, so you could actively explore the artist space: click on a node, read reviews about the artist, listen to their music, watch their videos.
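The first two ideas need nothing beyond standard DOT node attributes. Here is a hypothetical sketch that assumes both familiarity and hotness scores lie in [0, 1]; the font-size range and the blue-to-red hue ramp are arbitrary choices. Graphviz accepts a color as three numbers "H S V", each in [0, 1], and a larger `fontsize` makes the default ellipse grow with its label.

```java
import java.util.Locale;

public class NodeStyleSketch {
    static double clamp(double x) {
        return Math.max(0, Math.min(1, x));
    }

    // Hypothetical styling: familiarity and hotness are assumed to be in [0, 1].
    static String nodeStatement(String id, String name, double familiarity, double hotness) {
        double f = clamp(familiarity), h = clamp(hotness);
        double fontSize = 10 + 20 * f;  // 10pt for unknowns up to 30pt for household names
        double hue = 0.66 * (1 - h);    // HSV hue: 0.66 is blue (cold), 0.0 is red (hot)
        // Locale.ROOT keeps '.' as the decimal separator, as DOT expects
        return String.format(Locale.ROOT,
            "\"%s\" [label=\"%s\", fontsize=%.1f, style=filled, fillcolor=\"%.3f 0.600 1.000\"];",
            id, name, fontSize, hue);
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only.
        System.out.println(nodeStatement("a1", "Led Zeppelin", 0.9, 0.5));
    }
}
```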
Astute readers may have noticed that I’m making calls using an Echo Nest library. That’s one of the things I’ve been working on in the last week: building a Java client library for the Echo Nest developer API. I’ll be releasing it soon, once I figure out the best way to release an open source client library here at The Echo Nest. I hope to get something released by the end of this week. If you are interested in a sneak preview of the Java client library, let me know.
#1 by Stephen Green on February 26, 2009 - 11:30 am
Looks like your lines got truncated in the Java source, but I’m glad that you’re already up to Artist and Scored at the new gig :-)
#2 by plamere on February 26, 2009 - 11:33 am
@stephen – if I don’t start a week typing “public class Artist {” it is not a good week. And it is very nice to have a Scored that doesn’t have all that DocumentVector cruft that snuck in there, dirtying it up.
#3 by Sten Anderson on February 26, 2009 - 12:17 pm
This looks really neat, Paul. I eagerly await the Java API…
#4 by Jean-Francois Im on February 27, 2009 - 11:39 am
What metric is used as a similarity metric? I’m curious because the upper branch clearly lists a lot of pseudonyms from Konami’s series of music videogames (http://en.wikipedia.org/wiki/List_of_Bemani_musicians).