The Led Zeppelin Graph

I’ve been pretty busy figuring out the lay of the land at the new job, so I haven’t had too much time for recreational programming.  However, last night, while my lovely wife was watching Dr. House demonstrate his excellent interpersonal skills,  I got a chance to write a little bit of code to generate an artist graph using the Echo Nest Developer API.

The idea is to generate a graph that shows the artist similarity space in a way that encourages exploration. To do this, I simply use the Echo Nest get_similar call to walk the artist graph. Instead of getting bogged down in some graphics library to create the visualization, I just output DOT commands and render the whole thing using graphviz. Graphviz does all the hard work of figuring out how to lay out the graph. Here’s a tiny example of some graphviz output:

artist-tree

One of the problems with making these sorts of graphs is that they can get extremely complicated very quickly. Even just a few steps away from the seed artist, the crawl of the artist graph can turn up hundreds of artists and thousands of connections. Without some care, the graph quickly turns into an unreadable tangle. However, since we want to use these graphs for exploring the artist space, we can make a simplification that eliminates much of the complexity. When exploring, people tend to start from a known artist and then proceed to lesser-known artists. If we make our graph work the same way, we eliminate a large number of extraneous connections. Instead of connecting all the artists that we encounter in our crawl, we only connect new artists to more popular artists that are already in the graph. Because every link runs from a more familiar artist to a strictly less familiar one, no cycle can form, so this gives us an easy-to-manage directed acyclic graph that flows from very familiar artists to unknown artists.

The pseudocode to do this is very simple:

  add a seed artist to the work queue
  while the work queue is not empty
      curArtist = the next artist from the queue
      for each artist similar to curArtist
          if similar artist is less familiar than curArtist
               plot link to similar artist
               if similar artist has not been seen before
                    add similar artist to the work queue

The real Java code is not much more complicated:

// Crawl outward from the seed artist, which is already in workQueue
while (!workQueue.isEmpty()) {
  Artist artist = workQueue.remove(0);
  List<Scored<Artist>> simArtists = echoNest.getSimilarArtists(artist, 0, 6);
  float familiarity = echoNest.getFamiliarity(artist);
  for (Scored<Artist> scoredArtist : simArtists) {
    Artist similarArtist = scoredArtist.getItem();
    float simFamiliarity = echoNest.getFamiliarity(similarArtist);
    // Only link from a more familiar artist to a less familiar one
    if (simFamiliarity < familiarity) {
      out.printf("\"%s\" -> \"%s\";\n", artist.getId(), similarArtist.getId());
      // Queue and label each artist only the first time it is seen
      if (!plottedSet.contains(similarArtist)) {
        workQueue.add(similarArtist);
        plottedSet.add(similarArtist);
        out.printf("\"%s\" [label=\"%s\"];\n", similarArtist.getId(), similarArtist.getName());
      }
    }
  }
}
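
For anyone who wants to try the crawl without API access, here is a minimal self-contained sketch of the same idea. It is not the Echo Nest client: the class name ArtistCrawlSketch, the crawl method, and the in-memory maps of similar artists and familiarity scores are all hypothetical stand-ins, and the artist names and scores in main are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for the crawl above: similarity and familiarity come
// from in-memory maps instead of Echo Nest API calls.
class ArtistCrawlSketch {

    // Returns DOT text for the graph reachable from the seed artist.
    static String crawl(String seed,
                        Map<String, List<String>> similar,
                        Map<String, Double> familiarity) {
        StringBuilder dot = new StringBuilder("digraph artists {\n");
        Queue<String> workQueue = new ArrayDeque<>();
        Set<String> plotted = new HashSet<>();
        workQueue.add(seed);
        plotted.add(seed);
        while (!workQueue.isEmpty()) {
            String artist = workQueue.remove();
            double fam = familiarity.getOrDefault(artist, 0.0);
            for (String sim : similar.getOrDefault(artist, List.of())) {
                double simFam = familiarity.getOrDefault(sim, 0.0);
                // Only link from a more familiar artist to a less familiar one.
                if (simFam < fam) {
                    dot.append(String.format("  \"%s\" -> \"%s\";\n", artist, sim));
                    // Set.add returns true only the first time an artist is seen.
                    if (plotted.add(sim)) {
                        workQueue.add(sim);
                    }
                }
            }
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Made-up artists and scores, purely for illustration.
        Map<String, List<String>> similar = Map.of(
            "Seed", List.of("A", "B"),
            "A", List.of("Seed", "C"));
        Map<String, Double> familiarity = Map.of(
            "Seed", 0.9, "A", 0.8, "B", 0.95, "C", 0.6);
        System.out.print(crawl("Seed", similar, familiarity));
    }
}
```

With the sample data, B is skipped because it is more familiar than the seed, and the link from A back to the seed is dropped for the same reason. Saving the output to a file such as artists.dot and running `dot -Tpng artists.dot -o artists.png` should render it with graphviz.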

This yields some fun graphs. Here’s a detail from a graph created using Led Zeppelin as the seed artist:

Detail of the Led Zeppelin artist graph

And the full graph in all its glory is here:

Full plot (click to see it full size)

I can think of all sorts of things to add to this artist graph. We could size the nodes based upon the familiarity of the artist. We could color the artists based upon how ‘hot’ the artist is. We could replace graphviz with a real graphing library like prefuse and make the whole graph interactive, so you could actively explore the artist space, click on a node, read reviews about the artist, listen to their music, watch their videos.
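
The first two ideas can be tried without leaving graphviz, since DOT nodes accept attributes such as width, style and fillcolor. Here is a rough sketch of what that might look like. The class NodeStyleSketch and its styledNode method are hypothetical, the size and color mappings are arbitrary choices, and I am assuming both familiarity and hotness arrive as values between 0 and 1.

```java
import java.util.Locale;

// Hypothetical helper: builds one DOT node statement whose size tracks
// familiarity and whose fill color tracks hotness.
class NodeStyleSketch {

    static String styledNode(String id, String name,
                             double familiarity, double hotness) {
        // Assumption: both scores are in [0, 1]; clamp anything outside.
        double f = Math.max(0.0, Math.min(1.0, familiarity));
        double h = Math.max(0.0, Math.min(1.0, hotness));
        // Node width in inches: 0.5 for unknown artists, 2.0 for household names.
        double width = 0.5 + 1.5 * f;
        // Graphviz accepts "H S V" colors; hue 0.667 is blue, 0.0 is red.
        double hue = 0.667 * (1.0 - h);
        // Escape backslashes and quotes so the name is a valid DOT string.
        String label = name.replace("\\", "\\\\").replace("\"", "\\\"");
        return String.format(Locale.ROOT,
            "\"%s\" [label=\"%s\", width=%.2f, style=filled, fillcolor=\"%.3f 0.600 1.000\"];",
            id, label, width, hue);
    }

    public static void main(String[] args) {
        // Made-up ID and scores, purely for illustration.
        System.out.println(styledNode("AR1", "Some Artist", 0.9, 0.4));
    }
}
```

Swapping this in for the plain label line in the crawl would give big red nodes for familiar, hot artists and small blue ones for obscure, cold artists.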

Astute readers may have noticed that I’m making calls using an EchoNest library.  That’s one of the things I’ve been working on in the last week – building a Java client library for the EchoNest developer API.   I’ll be releasing this soon, once I figure out the best way to release an open source client library here at The Echo Nest.  I should hopefully get something released by the end of this week.    If you are interested in a sneak preview of the Java client library, let me know.

  1. #1 by Stephen Green on February 26, 2009 - 11:30 am

    Looks like your lines got truncated in the Java source, but I’m glad that you’re already up to Artist and Scored at the new gig :-)

  2. #2 by plamere on February 26, 2009 - 11:33 am

    @stephen – if I don’t start a week typing “public class Artist {” it is not a good week. And it is very nice to have a Scored that doesn’t have all that DocumentVector cruft that snuck in there, dirtying it up.

  3. #3 by Sten Anderson on February 26, 2009 - 12:17 pm

    This looks really neat, Paul. I eagerly await the Java API…

  4. #4 by Jean-Francois Im on February 27, 2009 - 11:39 am

    What metric is used as a similarity metric? I’m curious because the upper branch clearly lists a lot of pseudonyms from Konami’s series of music videogames (http://en.wikipedia.org/wiki/List_of_Bemani_musicians).
