To make the connections between the artists, I rely on the relation data from MusicBrainz. MusicBrainz has lots of deep data about how various artists are connected. For instance, there are about 130,000 artist-to-artist connections, of types such as:
- member of band
- is person
- personal relationship
- involved with
- supporting musician
- vocal supporting musician
- instrumental supporting musician
So from this data we know that George Harrison and Paul McCartney are related because each has a ‘member of band’ relation to The Beatles. In addition to the artist-to-artist data, MusicBrainz has artist-track relations (Eric Clapton played on ‘While My Guitar Gently Weeps’), artist-album relations (Brian Eno produced U2’s Joshua Tree), and track-track relations (Girl Talk samples ‘Rock You Like A Hurricane’ by the Scorpions for the track ‘Girl Talk Is Here’). All told, there are about 130 different types of relations that can connect two artists.
Not all of these relationships are equally important. Two artists who are members of the same band have a much stronger relationship than an artist who covers another artist's song. To accommodate this, I assign weights to the different types of relationships; this was perhaps the most tedious and subjective part of building this app.
Once I had all the different types of relations, I created a directed graph connecting all of the artists based upon these weighted relationships. The resulting graph has 220K artists connected by over a million edges. Finding a path between a pair of artists is a simple matter of finding the shortest weighted path through the graph.
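To make the idea concrete, here is a minimal sketch of building a weighted directed graph with networkx and asking for the shortest weighted path. It is not the app's actual code: the `RELATION_COST` values, the `build_graph` helper, and the choice to add every relation in both directions are assumptions for illustration, and the Eric Clapton edge simplifies an artist-track relation into an artist-artist one.

```python
import networkx as nx

# Hypothetical costs: a lower cost means a stronger relationship.
# The real weights used by the app are not given in this post.
RELATION_COST = {
    "member of band": 1.0,
    "instrumental supporting musician": 3.0,
}

# Toy (source, target, relation type) triples based on the examples above.
relations = [
    ("George Harrison", "The Beatles", "member of band"),
    ("Paul McCartney", "The Beatles", "member of band"),
    ("Eric Clapton", "The Beatles", "instrumental supporting musician"),
]


def build_graph(relations, costs):
    """Build a directed graph whose edge weights come from the relation type."""
    graph = nx.DiGraph()
    for source, target, rel_type in relations:
        cost = costs[rel_type]
        # Assumption: a relation can be walked in either direction,
        # so one edge is added each way.
        graph.add_edge(source, target, weight=cost, relation=rel_type)
        graph.add_edge(target, source, weight=cost, relation=rel_type)
    return graph


graph = build_graph(relations, RELATION_COST)

# With a weight attribute given, networkx uses Dijkstra's algorithm by default.
path = nx.shortest_path(graph, "Eric Clapton", "Paul McCartney", weight="weight")
cost = nx.shortest_path_length(graph, "Eric Clapton", "Paul McCartney", weight="weight")
print(" -> ".join(path), cost)
```

Treating weights as costs (strong relationships get small numbers) is what lets a shortest-path search prefer band membership over weaker links such as covers.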
We can learn a little bit about music by looking at some of the properties of the graph. First of all, the average distance between any two artists chosen at random is 7. Here are some of the most connected artists, along with their number of connections:
Here we see some of the anomalies in the connection data: any classical performer who performs a piece by Mozart is connected to Mozart, hence the high connectivity counts for classical composers. A more interesting metric is the ‘betweenness centrality’: artists that occur on many shortest paths between other artists have higher betweenness than those that do not. Artists with high betweenness centrality are the connecting fibers of the music space. Here are the top connecting artists:
I had never heard of Pigface before I started this project, and was doubtful that they could really be such a connecting node in the world of music, but a look at their Wikipedia page makes it instantly clear why they are such a central node: they’ve had well over a hundred members in the band over their history. Black Sabbath, while not at the top of the list, is still extremely well connected.
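The graph properties discussed above (connection counts, average distance, and betweenness centrality) can each be computed with a single networkx call. The following is a small sketch on a toy graph, not the real MusicBrainz data; the edge costs are made up, and measuring average distance in hops rather than weighted cost is an assumption, since the post does not say which was used.

```python
import networkx as nx

# Toy relations with hypothetical costs; each is added in both directions.
edges = [
    ("George Harrison", "The Beatles", 1.0),
    ("Paul McCartney", "The Beatles", 1.0),
    ("Eric Clapton", "The Beatles", 3.0),
]
graph = nx.DiGraph()
for a, b, cost in edges:
    graph.add_edge(a, b, weight=cost)
    graph.add_edge(b, a, weight=cost)

# Connection count per node: outgoing edges only, so that a relation
# stored in both directions is counted once.
connections = dict(graph.out_degree())

# Normalized betweenness: how much of the shortest weighted path traffic
# between other nodes passes through each node.
betweenness = nx.betweenness_centrality(graph, weight="weight", normalized=True)

# Mean hop count over all ordered pairs of nodes.
# This call requires a strongly connected directed graph.
average_hops = nx.average_shortest_path_length(graph)

# Rank nodes from most to least central.
ranked = sorted(betweenness, key=betweenness.get, reverse=True)
print(ranked[0], connections[ranked[0]], average_hops)
```

In this toy graph the band is the only node that sits between other nodes, which mirrors on a tiny scale why a band with many members ends up as a central connector. On a graph with 220K nodes, exact betweenness is expensive; `nx.betweenness_centrality` accepts a `k` parameter to estimate it from a sample of source nodes.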
I wrote the app in Python, relying on networkx for the graph building and path finding. The system performs well, even surviving an appearance on the front page of Reddit. It was a fun app to write, and I enjoy seeing all the interesting pathways people have found through the artist space.