My hack at last week’s Music Hack Day San Francisco was Six Degrees of Black Sabbath – a web app that lets you find connections between artists based on a wide range of artist relations. It is like The Oracle of Bacon for music.
To make the connections between the artists I rely on the relation data from MusicBrainz. MusicBrainz has lots of deep data about how various artists are connected. For instance there are about 130,000 artist-to-artist connections – connections such as:
- member of band
- is person
- personal relationship
- involved with
- supporting musician
- vocal supporting musician
- instrumental supporting musician
So from this data we know that George Harrison and Paul McCartney are related because each has a ‘member of band’ relation to The Beatles. In addition to the artist-to-artist data, MusicBrainz has artist-track relations (Eric Clapton played on ‘While My Guitar Gently Weeps’), artist-album relations (Brian Eno produced U2’s The Joshua Tree), and track-track relations (Girl Talk samples ‘Rock You Like A Hurricane’ by the Scorpions for the track ‘Girl Talk Is Here’). All told there are about 130 different types of relations that can connect two artists.
Not all of these relationships are equally important. Two artists who are members of the same band have a much stronger relationship than an artist who covers another artist’s song. To accommodate this I assign weights to the different types of relationships – this was perhaps the most tedious and subjective part of building this app.
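As a rough illustration of the weighting idea, relation types can be mapped to edge costs, where a lower cost means a stronger relationship. The relation names below come from the text above, but the numeric values, the default, and the `relation_cost` helper are all hypothetical; the weights the app actually uses are not listed here.

```python
# Hypothetical costs per relation type: lower cost = stronger relationship.
# These numbers are illustrative only, not the weights used by the app.
RELATION_COST = {
    "member of band": 1.0,
    "personal relationship": 2.0,
    "supporting musician": 3.0,
    "cover": 10.0,
}

DEFAULT_COST = 5.0  # assumed fallback for relation types not listed above


def relation_cost(relation_type: str) -> float:
    """Return the edge cost for a relation type, falling back to a default."""
    return RELATION_COST.get(relation_type, DEFAULT_COST)


# A shared band membership is a cheaper (stronger) link than a cover.
print(relation_cost("member of band") < relation_cost("cover"))
```

Treating weights as costs means a shortest-path search naturally prefers strong relationships over weak ones.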
Once I had weights for all the different types of relations, I created a directed graph connecting all of the artists based upon these weighted relationships. The resulting graph has 220K artists connected by over a million edges. Finding a path between a pair of artists is then a simple matter of finding the shortest weighted path through the graph.
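A minimal sketch of this step, using networkx on a toy graph. The artist names and costs are placeholders, and adding every relation in both directions is an assumption made for the example rather than a description of how the app builds its graph.

```python
import networkx as nx

# Toy relations: (source, target, hypothetical cost). Lower cost = stronger link.
relations = [
    ("Artist A", "Band B", 1.0),     # e.g. a 'member of band' relation
    ("Artist C", "Band B", 1.0),     # another member of the same band
    ("Artist A", "Artist C", 10.0),  # e.g. a weak relation such as a cover
]

graph = nx.DiGraph()
for source, target, cost in relations:
    # Assumption: store each relation in both directions so that paths can
    # be followed from either artist.
    graph.add_edge(source, target, weight=cost)
    graph.add_edge(target, source, weight=cost)

# Dijkstra-style search over the 'weight' attribute of each edge.
path = nx.shortest_path(graph, "Artist A", "Artist C", weight="weight")
cost = nx.shortest_path_length(graph, "Artist A", "Artist C", weight="weight")
print(path, cost)
```

In this toy graph the two-step route through the band (total cost 2.0) beats the direct but weak one-step link (cost 10.0), so the best weighted path is not always the one with the fewest steps.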
We can learn a little bit about music by looking at some of the properties of the graph. First of all, the average distance in the graph between any two artists in the graph chosen at random is 7. Here are some of the most connected artists, along with their number of connections:
- 5372 Various Artists
- 1604 Wolfgang Amadeus Mozart
- 1275 Johann Sebastian Bach
- 905 Ludwig van Beethoven
- 696 Linda Ronstadt
- 611 Diana Ross
- 560 [traditional]
- 538 Antonio Vivaldi
- 534 Jay-Z
- 528 Georg Friedrich Händel
- 494 Giuseppe Verdi
- 491 Johannes Brahms
- 490 Bob Dylan
- 465 The Beatles
- 442 Aaron Neville
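Connection counts like these can be computed by tallying how many edges touch each artist. The sketch below uses only the standard library and made-up placeholder data; it is not the code that produced the list above.

```python
from collections import Counter

# Toy connections between placeholder artists (not real data).
connections = [
    ("Composer X", "Performer 1"),
    ("Composer X", "Performer 2"),
    ("Composer X", "Performer 3"),
    ("Performer 1", "Performer 2"),
]

connection_counts = Counter()
for a, b in connections:
    # Each connection counts once for both artists it touches.
    connection_counts[a] += 1
    connection_counts[b] += 1

# most_common returns (artist, count) pairs, highest count first.
for artist, count in connection_counts.most_common(3):
    print(count, artist)
```

The placeholder composer ends up on top simply because every performer links to it, which is the same effect that pushes classical composers up the list.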
Here we see some of the anomalies in the connection data – any classical performer who performs a piece by Mozart is connected to Mozart – thus the high connectivity counts for classical composers. A more interesting metric is the ‘betweenness centrality’ – artists that occur on many shortest paths between other artists have higher betweenness than those that do not. Artists with high betweenness centrality are the connecting fibers of the music space. Here are the top connecting artists:
- 565 Pigface
- 312 Various Artists
- 135 Mick Harris
- 122 Black Sabbath
- 120 The The
- 115 Youth
- 93 Bill Laswell
- 79 J.G. Thirlwell
- 74 Painkiller
- 72 F.M. Einheit
- 71 Napalm Death
- 63 Paul McCartney
- 63 Flea
- 60 Material
- 60 Andrew Lloyd Webber
- 57 Luciano Pavarotti
- 57 Raimonds Macats
- 56 Ginger Baker
- 56 Mike Patton
- 54 Johnny Marr
- 54 Paul Raven
- 53 Brian Eno
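To make the metric concrete, here is a small sketch using the `betweenness_centrality` function from networkx on a toy graph. The node names are placeholders, the graph is undirected and unweighted for simplicity, and the scores in the list above were not necessarily computed with these settings.

```python
import networkx as nx

# Toy graph: two tight groups joined only through "Connector".
graph = nx.Graph()
graph.add_edges_from([
    ("A1", "A2"), ("A1", "Connector"), ("A2", "Connector"),
    ("B1", "B2"), ("B1", "Connector"), ("B2", "Connector"),
])

# normalized=False keeps raw scores: for each node, the sum over pairs of
# other nodes of the fraction of their shortest paths passing through it.
centrality = nx.betweenness_centrality(graph, normalized=False)

ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
for artist, score in ranked:
    print(score, artist)
```

Every shortest path between the A group and the B group has to pass through the connector, so it gets the highest score while the other nodes score zero. A band with a very large and varied membership plays the same role in the real graph.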
I had never heard of Pigface before I started this project – and was doubtful that they could really be such a connecting node in the world of music – but a look at their Wikipedia page makes it instantly clear why they are such a central node – they’ve had well over a hundred members in the band over their history. Black Sabbath, while not at the top of the list, is still extremely well connected.
I wrote the app in Python, relying on networkx for the graph building and path finding. The system performs well, even surviving an appearance on the front page of Reddit. It was a fun app to write – and I enjoy seeing all the interesting pathways people have found through the artist space.
#1 by Mo on May 21, 2010 - 10:00 am
Thanks for this fun tool!
A friend of mine found this apparent oddity, which may be because we misunderstand the process, or there may be some other explanation. See what you think:
Mika to Mike Batt, 26 steps.
Mika to The Wombles, 19 steps — of which Mike Batt is the 18th.
#2 by Paul on May 21, 2010 - 12:21 pm
The paths are weighted based on the quality of the relationships. This means that a shorter path may be worse than a longer path. But the example that you give puzzles me, so maybe there’s a bug. I shall take a look. Thanks!
#3 by Matt on May 21, 2010 - 1:44 pm
This is nifty!
The one oddity I’ve found is that it doesn’t have a way of distinguishing between the band “Rush” and “The Rush”
So, starting at Rush gives no-path
Whereas starting with Geddy Lee gives a path
#4 by Paul on May 21, 2010 - 3:31 pm
Hmmm … looks like a bug. For ambiguous artists it should choose the more popular artist, but it doesn’t look like it is doing that. Will check.
#5 by Paul on May 21, 2010 - 4:38 pm
Fixed this bug. Thanks! Paul
#6 by Bob Harvey on May 21, 2010 - 6:22 pm
This tool has been the subject of a thread at uk.rec.sheds (thread title: “Motorhead to Chris de Burgh in 8 steps”).
I described it as a game of Mornington Crescent for Musicos.
Thanks for all the fun. I’ve learned a lot, not least that there is someone called Glenn Miller that I was not expecting…
My most unexpected was Charlotte Church to Carlos Santana in 2-and-a-half.
Well done again.
#7 by Phil on May 23, 2010 - 6:59 am
Fun tool to play around with
Noticed some glitches that I guess may be with the data from MusicBrainz.
For example, on the page for Bob James, it says he composed the track Clap Your Hands by A Tribe Called Quest, which I don’t think is quite right; I think they sampled him. Then it gets the sampled tracks the wrong way round: created the track Eple which provides samples for the track You’re as Right as Rain by Röyksopp – it’s the other way round.
Actually, I just checked the MusicBrainz site and they have the tracks the right way round, so I guess it’s your app :-(
Anyway, thanks for sharing
#8 by Bill Mayan on May 23, 2010 - 10:56 pm
Nicely done, a fun tool. Black Sabbath and Jay-Z?
#9 by Alex Dupuy on May 25, 2010 - 11:37 am
> We can learn a little bit about music by looking at some of
> the properties of the graph. First of all, the average
> distance in the graph between any two artists in the graph
> chosen at random is 7.
I just did a search (Wire to Black 47) which gave me “no path”. I’m curious whether you gathered any statistics on the number of non-trivial isolated enclaves (groups of artists connected to at least one other artist, but not connected to the largest group, which we could define as “connected to Pigface”).
Also, it seems to me that using “Various Artists” or “[traditional]” as one of the steps is a bit of a cheat. While two artists both appearing on a particular VA album is a reasonable step, the fact that two artists both appear on (different) albums by the artist “Various Artists” really doesn’t mean anything (much).
#10 by Paul Babiak on May 28, 2010 - 11:00 pm
Had a lot of fun with this site.
But I found some very convoluted paths for Aynsley Dunbar to various members of the Mothers of Invention, of which he was also a member. But there were so many incarnations of the Mothers, I suppose it’s difficult to get them all.
#11 by Anthony on June 1, 2010 - 5:08 am
this is fun. most steps I’ve found so far was 20, between Kristina Olsen and The Waifs, who I have seen perform consecutively on the same stage. which is kind of fun.
I’ve found a little anomaly for you. Doc Neeson who is/was the lead singer of the Aussie band The Angels, is almost certainly not connected to the girl group Angels this search found – unless he’s had some cosmetic and other surgery recently!
#12 by Anthony on June 1, 2010 - 5:14 am
Here’s another odd one:
Guy Sebastian covered three Otis Redding songs on his Memphis album, but you’ve got nine steps between them.
#13 by media on June 9, 2010 - 2:58 pm
Stayin’ alive, stayin’ alive.
Ah, ha, ha, ha, stayin’ alive, stayin’ alive.
Ah, ha, ha, ha, stayin’ alive.
#14 by dj empirical on June 13, 2010 - 4:55 pm
looks like it’s having trouble finding Jandek.
#15 by tasso on June 20, 2010 - 11:18 am
Hi, very cool application! How did you store the graph? Did you consider using Neo4j (http://neo4j.org/)?
#16 by letterstothelabel on June 20, 2010 - 10:06 pm
I liked this better the first time when it was called:
#17 by Samir on July 9, 2010 - 9:21 pm
How can I contribute by sending you information about some artists?