Archive for category fun

The Sinister Index

Like many, I like to eat Cheetos when I’m relaxing and browsing the web, especially when I’m looking for new music. The problem is that Cheetos leave a nasty residue on the fingers, which gets transferred to the keyboard rather quickly. To avoid this problem I like to use my keyboard one-handed (I know what you are thinking, but really, it’s the Cheetos). Which is why one of my favorite bands is Weezer. I can type ‘weezer’ with my left hand, leaving my right hand free for Cheetos and leaving my keyboard clean. Still, I was in the mood for music by some other bands, so I thought it would be interesting to find all of the bands that can be typed using just my left hand. I wrote a Python script, ran it on a list of about 800 thousand artist names, and came up with a rather large list of sinister band names. Here are the longest:

der weg des wassers
everette red bear
sweet ever after
state far better
cassettezzzzzzzz
barbara decesare
westgate street
streetbeat crew
street bastards
reve de cabaret
rebecca everett
cabezas de cera
barbara taggart
warsaw was raw
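
The script itself isn't shown in the post. A minimal sketch of such a check, assuming a QWERTY layout and treating the digits 1 to 6 as left-hand keys (an assumption that matches 'eve 6' and '311' appearing in the lists), might look like this. The names `LEFT_KEYS`, `is_sinister`, `is_dexterous` and `longest_first` are made up for illustration:

```python
# Keys reachable with the left hand on a QWERTY keyboard (assumed split).
LEFT_KEYS = set("qwertasdfgzxcvb123456")
# Characters either hand can produce without leaving home position.
NEUTRAL = set(" ")


def is_sinister(name):
    """Return True if every character of name can be typed left-handed."""
    chars = [c for c in name.lower() if c not in NEUTRAL]
    return bool(chars) and all(c in LEFT_KEYS for c in chars)


def is_dexterous(name):
    """Return True if no character of name needs the left hand."""
    chars = [c for c in name.lower() if c not in NEUTRAL]
    return bool(chars) and not any(c in LEFT_KEYS for c in chars)


def longest_first(names, predicate):
    """Filter names with predicate and sort the longest names first."""
    return sorted((n for n in names if predicate(n)), key=len, reverse=True)
```

Calling `longest_first(artist_names, is_sinister)` on a list of names would produce a ranking in the same spirit as the lists in this post.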

Restricting the search to just the more popular artists I find this list of popular sinister artists:

wet wet wet
savatage
bee gees
seabear
garbage
cascada
carcass
caesars
weezer
feeder
vader
texas
sweet
stars
seeed
eve 6
dredg
creed
vast
sade
free
bebe
abba
xtc
war
rza
eve
era
d12
atb
afx
abc
311
112
bt

To be evenhanded, I offer this list of dexterous artists:

phillip moll
phillip hill
yumiko ohno
yoon il-loh
uh uh loony
polmo polpo
pinko pinko
opi yum yum
oli oli oli
oh no oh my
nylon union
nylon pylon
monki monki
homo homini
yuko kouno
yuho yokoi

And a list of popular dexterous artists:

yoko ono
moloko
pulp
pink
mylo
mono
koop
mum
iio
him
l7

There seem to be many more sinister artists than dexterous artists. I suspect  that this is because many artists now recognize the Cheetos issue and  are selecting sinister names.  Since identifying sinister artists is  becoming such a big issue in music search, we will likely be offering a sinister index as part of  The Echo Nest web services. The sinister index is a number between zero and one that indicates how easy it is to type the artist name with your left hand.  Weezer has a sinister index of 1 while Yoko Ono has a sinister index of zero. Look for it soon.
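
The post doesn't define how the index would be computed. One plausible reading, offered only as a sketch, is the fraction of typed characters that fall under the left hand. The key split and the function name `sinister_index` are assumptions:

```python
LEFT_KEYS = set("qwertasdfgzxcvb123456")  # assumed left-hand keys on QWERTY


def sinister_index(name):
    """Fraction of non-space characters typed with the left hand (0.0 to 1.0)."""
    chars = [c for c in name.lower() if not c.isspace()]
    if not chars:
        return 0.0
    left = sum(1 for c in chars if c in LEFT_KEYS)
    return left / len(chars)
```

Under this definition `sinister_index("Weezer")` is 1.0 and `sinister_index("Yoko Ono")` is 0.0, matching the two values quoted above. Anything in between is a guess at what a real index would report.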

The Coolness Index

Some artists just are not cool – your mom likes ABBA, so there’s no way you are going to listen to them, even if you think Mamma Mia is rather catchy. Likewise, you may think High School Musical’s ‘Bop to the Top’ is mucho gusto, but you don’t want anyone to know it. Coolness is hard to quantify, ephemeral and transient (and of course, very subjective); some artists like Miles Davis and the Velvet Underground will always be totally cool – while some fade in and out of coolness (Elvis, Stevie Wonder, Neil Diamond, Sting), and some artists – well, it is hard to tell if they were ever cool (Miley Cyrus, Creed, and Nickelback come to mind).

Imagine if there was an objective measure for coolness  – a number that could be attached to each artist that indicated how ‘cool’ the artist was.   We’d be able to do all sorts of interesting things with such a ‘coolness index’.  We could make a ‘music makeover’ playlist that would take you from Miley to Miles in 12 songs  (consider it a 12-step  taste recovery program) or we could create a music rehab playlist that takes you  from Amy Winehouse to Kate Nash.  But of course, the concept of cool  is too hard to nail down.  Is Johnny Cash cool? Michael Jackson? Prince?  Context, demographics, locale  all play a role.

It may be too hard to tell whether an artist is cool, but we have all sorts of ways to tell that an artist is definitely not cool. For instance, if lots of listeners really don’t want people to know that they are listening to a particular artist, then that artist is probably not too cool. Luckily, there’s an interesting source for just this kind of data. Recently, the researchers at Last.fm published a list of the ‘most unwanted scrobbles’. This is a list of tracks that were most frequently deleted by the Last.fm community from their scrobbles in the last month. These are the tracks that Last.fm listeners didn’t want people to know that they listened to. Here’s the first page of the most unwanted scrobbles:

[Image: first page of the most unwanted scrobbles]

Kudos to Last.fm for publishing this data. It’s a great source for the uncool. Collecting all the artists from the pages, we can build a list of artists that have frequently had their scrobbles deleted:

Lady GaGa
Britney Spears
Katy Perry
Rihanna
Paramore
Coldplay
Taylor Swift
Beyoncé
Avril Lavigne
Marc Seales, composer. New Stories. Ernie Watts, saxophone
Alexander Rybak
Black Eyed Peas
Kings of Leon
Muse
My Chemical Romance
Linkin Park
Korn
Miley Cyrus
Jason Mraz
Metro Station
Leona Lewis
Green Day
Evanescence
Amy Winehouse
Oasis
Nelly Furtado

This list rings true as a set of ‘uncool’ artists (with the exception of Marc Seales, who happens to have a piece of music, called ‘Highway Blues’, that can be found in the ‘Sample Music’ folder on most Windows XP computers, and is likely frequently scrobbled because of this). Ideally this list should be normalized for popularity – naturally, artists that have more listeners will be scrobbled more and consequently be deleted more too – but there’s not enough data in this list to normalize properly, so we’ll make do with an unnormalized list. I find it interesting how many female acts are on the list. Is it not cool to listen to female artists?

Another approach to find the uncool  is to look for artists that have been tagged as ‘guilty pleasure’ on sites like Last.fm.  For these artists,  by applying the ‘guilty pleasure’ tag people are identifying artists that they are embarrassed to be listening to.  Here’s a list of the top 100 popular artists that have been frequently tagged with ‘guilty pleasure’ – for this list I’m normalizing the data so popularity doesn’t factor into the list order:

Katy Perry
Ashlee Simpson
Spice Girls
Lindsay Lohan
Mandy Moore
Jessica Simpson
Backstreet Boys
Hilary Duff
Metro Station
Britney Spears
Justin Timberlake
Taylor Swift
Rihanna
The Pussycat Dolls
Kelly Clarkson
Christina Aguilera
Fall Out Boy
Take That
Avril Lavigne
Ricky Martin
Girls Aloud
Fergie
Neil Diamond
McFly
Robyn
The Veronicas
Ace of Base
ABBA
Céline Dion
Chris Brown
All Time Low
Kanye West
Gwen Stefani
Good Charlotte
P!nk
Usher
blink-182
R. Kelly
Nelly Furtado
The Get Up Kids
Madonna
Timbaland
Beyonce
New Found Glory
Natasha Bedingfield
Akon
Jem
Ciara
Robbie Williams
Paramore
The Wallflowers
Michelle Branch
Taking Back Sunday
Creed
Savage Garden
The All-American Rejects
Simple Plan
Shania Twain
Sugababes
Tegan and Sara
Everclear
Sugarcult
The Starting Line
Brand New
Destiny’s Child
Cyndi Lauper
Mariah Carey
Westlife
Maroon 5
Melanie C
Jennifer Lopez
Michael Jackson
Kelis
Tears for Fears
Alkaline Trio
Dashboard Confessional
Vanessa Carlton
Lily Allen
Bowling for Soup
Jet
50 Cent
Trivium
Cher
Eve 6
Sean Paul
Kylie Minogue
Howie Day
Sophie Ellis-Bextor
My Chemical Romance
Third Eye Blind
Saves the Day
Bryan Adams
Blondie
Boston
John Mellencamp
Simply Red
Whitney Houston
The Corrs
The Calling
Motion City Soundtrack
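
The normalization behind this list isn't spelled out. A simple way to keep popularity out of the ordering, sketched here with made-up counts, is to rank by the share of an artist's tags that are 'guilty pleasure' rather than by the raw count. The function name and the numbers are illustrative only:

```python
def guilty_pleasure_ranking(tag_counts):
    """Rank artists by the fraction of their tags that are 'guilty pleasure'.

    tag_counts maps artist -> (guilty_pleasure_tags, total_tags).
    """
    scores = {
        artist: guilty / total
        for artist, (guilty, total) in tag_counts.items()
        if total > 0
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up counts: the bigger artist has more raw tags but a smaller share.
example = {
    "Big Pop Star": (500, 100000),
    "Small Pop Act": (50, 1000),
}
ranking = guilty_pleasure_ranking(example)
```

With these toy numbers the smaller act ranks first, even though the bigger one has ten times as many 'guilty pleasure' tags in absolute terms.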

There’s overlap between the two lists:  Avril, Britney, Katy, Nelly, Taylor, Rihanna, along with the Disney crowd. Again, there seems to be an anti-female coolness bias on the list. It is hard to be cool and female.

The ‘most unwanted scrobbles’ and the ‘guilty+pleasure’ approach to the coolness index only get us so far. They can help us identify music that people are embarrassed to admit that they enjoy.  But they only give us one end of the coolness spectrum.  We can find what is not cool, but we can’t find out what is cool.  We have in effect an ‘Uncoolness Index’.  Still, knowing which artists are uncool can be helpful for all sorts of things.   If we are building a playlist for that party, we can turn on the uncool filter to make sure that Ricky Martin or Robbie Williams won’t sneak into the mix.  Likewise, if we are building a recommender, we can use the Uncoolness index to decide how cool the user is and recommend music that’s slightly less uncool than what they are used to listening to.

Next steps are to figure out how to learn not just what is uncool, but also what is cool, so we can build the true ‘coolness index’ and be able to tell how cool any artist is.  I think that is going to be a harder problem, but I have some ideas …

The Dissociated Mixes

Check out Adam Lindsay’s latest post on Dissociated Mixes. He’s got a pretty good collection of automatically shuffled songs that sound interesting and eerily different from the original.  One example is this remixed audio/video of Beck’s Record Club cover of “Waiting for my Man” by The Velvet Underground and Nico:


Music HackDay is coming …

If you live within a couple hundred miles of London, and you read this blog, then there’s no reason why you shouldn’t be planning on going to Music Hackday being held on July 11th and 12th at the Guardian offices in London.   This is a great opportunity to connect with other developers that are creating next generation music applications, web sites, and gadgets.  In addition to the developers,  API providers will be showing off their wares (and some will even be unveiling new APIs).  Companies include 7digital, Gigulate, Last.fm, People’s Music Store, Songkick, Soundcloud and The Echo Nest.    Recently added to the agenda are workshops by  Tinker.it and RjDj.

The Echo Nest will be there, represented by Adam Lindsay. He’ll guide you through using our various APIs, including our artist recommendation APIs and our music analysis and remix APIs. Oh, and the developer that creates the coolest thing that uses the Echo Nest API will go home with a big, fat (i.e. 32 GB) iPod touch.

Looking at the attendee list,  the Music Hackday looks to be a who’s who in music tech –  not only will it be a day of hacking, but it’s a great place to  get to meet all of the folks that are creating the next generation of music apps.  It looks like spaces are filling up quickly, so if you haven’t already registered, don’t dally, or you may miss out.

Where’s the Pow?

This morning, while eating my Father’s day bagel, I got to play some more with the video aspects of the Echo Nest remix API.  The video remix is pretty slick.  You use all of the tools that you use in the audio remix, except that the object you are manipulating has a video component as well.    This makes it easy to take an audio remix and turn it into a video remix.  For instance, here’s the remix code to create a remix that includes the first beat of every bar:

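 import echonest.audio as audio  # module path from the remix examples

 # Load the track; its analysis (bars, beats) comes along with it
 audiofile = audio.LocalAudioFile(input_filename)
 collect = audio.AudioQuantumList()
 for bar in audiofile.analysis.bars:
     # The children of a bar are its beats; keep only the first one
     collect.append(bar.children()[0])
 # Stitch the selected beats together and write the result
 out = audio.getpieces(audiofile, collect)
 out.encode(output_filename)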

To turn this into a video remix, just change the code to:

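 import echonest.audio as audio
 import echonest.video as video  # module paths from the remix examples

 # Load the audio and video together
 av = video.loadav(input_filename)
 collect = audio.AudioQuantumList()
 for bar in av.audio.analysis.bars:
     collect.append(bar.children()[0])
 # Cut the video along the same beat boundaries
 out = video.getpieces(av, collect)
 out.save(output_filename)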

The code is nearly identical, differing in loading and saving, while the core remix logic stays the same.

To make a remix of a YouTube video, you need to save a local copy of the video. I’ve been using KeepVid to save a local FLV (Flash video format) copy of any YouTube video.

Today I played with the track ‘Boom Boom Pow’ by the Black Eyed Peas.  It’s a fun song for remix because it has a very strong beat, and already has a remix feel to it.  And since the song is about digital transformation, it seems to be a good target for remix experiments.  (and just maybe they won’t mind the liberties I’ve taken with their song).

Here’s the original (click through to YouTube to watch it since embedding is not allowed):

Just Boom

The first remix is to only include the first beat of every measure.   The code is this:

 for bar in av.audio.analysis.bars:
     collect.append(bar.children()[0])

Just Pow

Change the beat included from beat zero to beat three, and we get something that sounds very different:

Pow Boom Boom

Here’s a version with the beats reversed.  The core logic for this transformation is one line of code:

av.audio.analysis.beats.reverse()

The 5/4 Version

Here’s a version that’s in 5/4 – to make this remix I duplicated the first beat and swapped beats 2 and 3.  This is my favorite of the bunch.
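
The code for this one isn't shown. As a sketch of the reordering on plain Python lists (standing in for the beat objects that `bar.children()` returns), and assuming 'beats 2 and 3' are counted from one, each four-beat bar becomes five beats like this. The helper names `five_four` and `remix` are hypothetical:

```python
def five_four(bar):
    """Turn a 4-beat bar into a 5-beat bar.

    The first beat is played twice and the second and third beats
    (counting from one) trade places.
    """
    if len(bar) != 4:
        return list(bar)  # leave irregular bars untouched
    b1, b2, b3, b4 = bar
    return [b1, b1, b3, b2, b4]


def remix(bars):
    """Flatten the transformed bars into a single beat sequence."""
    collect = []
    for bar in bars:
        collect.extend(five_four(bar))
    return collect
```

In the real API the resulting sequence would be collected into an `AudioQuantumList` and rendered the same way as in the earlier examples.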

These transformations are of the simplest variety, taking just a couple of minutes to code and try out.   I’m sure some budding computational remixologist could do some really interesting things with this API.

Note that the latest video support is not in the main branch of remix.  If you want to try some of this out you’ll need to check out the bl-video branch from the svn repository.     But this is guaranteed to be rolled into the main branch before the upcoming Music Hackday. Update: the latest video support is now part of the main branch.  If you want to try it out, check it out from the trunk of the SVN repository. So download the code, grab your API key and start remixing.

Update: As Brian pointed out in the comments there was some blocking on the remix renders. This has been fixed, so if you grab the latest code, the video output quality is as good as the input.

More confusing than Memento

Ben Lacker, one of our leading computational remixologists here at the Echo Nest, has been improving the video remix capabilities of the Echo Nest remix API. On Friday, he remixed this mind blower. It’s Coldplay’s music video for ‘The Scientist’ – beat reversed, which means that the song is played in reverse order beat by beat (but each beat is still played in forward order). Since Coldplay’s video is already shot in reverse order, the resulting video has a story that unfolds in proper chronological order, but where every second of video runs backwards, while the music unfolds in reverse chronological order while every beat runs forward. I get a little bit of a stomachache watching this video.
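
To make the distinction concrete, here is a small sketch on plain lists, where each inner list stands in for the samples (or frames) of one beat. Reversing by beat flips the order of the beats while leaving each beat's contents playing forward. The function names are illustrative and this is not the code Ben committed:

```python
def beat_reverse(beats):
    """Reverse the order of the beats; each beat still plays forward."""
    return [list(beat) for beat in reversed(beats)]


def full_reverse(beats):
    """Reverse everything, so each beat also plays backwards."""
    return [list(reversed(beat)) for beat in reversed(beats)]


song = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # three beats of three samples each
```

Here `beat_reverse(song)` starts with the last beat but still counts 7, 8, 9 within it, while `full_reverse(song)` is what simply playing the file backwards would give.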

Ben has committed the code for this remix to the Echo Nest remix code samples so feel free to check it out and hack on it.    I hope to see some more interesting music and video remixes coming out of the upcoming Music Hackday.

The Passion Index

One of the ways that Music 2.0 has changed how we think about music is that there is so much interesting data available about how people are listening to music. Sites like Last.fm automatically track all sorts of interesting data that just was not available before. Forty years ago, a music label like Capitol would know how many copies the album Abbey Road sold in the U.S., but the label wouldn’t know how many times people actually listened to the album. Today, however, our iPods and desktop music players keep careful track of how many times we play each song, album and artist – giving us a whole new way to look at artist popularity. It’s not just sales figures anymore, it’s how often people are actually listening to an artist. If you go to Last.fm you can see that The Beatles have over 1.75 million listeners and 168 million plays. It makes it easy for us to see how popular the Beatles are compared to another band (The Monkees, for instance, have 2.5 million plays and 285K listeners).

With all of this new data available, there are some new ways we can look at artists. Instead of just looking at artists in terms of popularity and sales rank, I think it is interesting to see which artists generate the most passionate listeners. These are artists that dominate the playlists of their fans. I think this ‘passion index’ may be an interesting metric to use to help people explore and discover music. Artists that attract passionate fans may be longer lived and worth a listener’s investment in time and money.

How can we calculate a passion index? There are probably a number of indicators: the number of edits to the band’s Wikipedia page, the average distance a fan travels to attend a show by the artist, the number of fan sites for an artist. All of these may be a bit difficult to collect, especially for a large set of artists. One simple passion metric is just the average number of artist plays per listener. Presumably if an artist’s listeners are playing an artist’s songs more than average, they are more passionate about the artist. One thing that I like about this approach to the passion index is that it is extremely easy to calculate – just divide the total artist plays by the total number of artist listeners and you have the passion index. Yes, there are many confounding factors – for instance, artists with longer songs are penalized – still, I think it is a pretty good measure.
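
As a quick sketch of the calculation: the function name is mine, and the integer division is an assumption inferred from the whole-number values in the tables in this post.

```python
def passion_index(plays, listeners):
    """Average number of plays per listener, truncated to a whole number."""
    if listeners <= 0:
        raise ValueError("listeners must be positive")
    return plays // listeners


# Play and listener counts quoted in this post.
beatles = passion_index(167765187, 1748159)
radiohead = passion_index(170106143, 2140659)
```

With these counts The Beatles come out at 95 and Radiohead at 79, which agrees with the tables.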

I calculated the passion index for a large collection of artists. I started with about a million artists (it is really nice to have all this data at the Echo Nest ;), and filtered these down to the 50K most popular artists. I plotted the number of artist plays vs. the number of artist listeners for each of the 50K artists. The plot shows that most artists fall into the central band (normal passion), but some (the green points) are high passion artists and some (the blue points) are low passion artists.

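[Figure: artist plays vs. artist listeners for the 50K most popular artists, with high-passion artists in green and low-passion artists in blue]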

For the 50K artists, the average number of plays per listener for an artist is just 11 (with a standard deviation of about 11.5). Considering that there are a substantial number of artists in my iTunes collection that I’ve played only once, this seems pretty reasonable.

So who are the artists with the highest passion index?   Here are the top ten:

Passion Listeners Plays Artist
332 4065 1352719 上海アリス幻樂団
292 10374 3032373 Belo
245 3147 773959 Petos
241 2829 683191 Reilukerho
208 4887 1020538 Sound Horizon
190 24422 4652968 동방신기
185 9133 1691866 岡崎律子
175 9171 1611106 Kollegah
173 17279 3004410 Super Junior
170 62592 10662940 Böhse Onkelz

I didn’t recognize any of these artists (and I’m not even sure if 上海アリス幻樂団 is really an artist – according to the Japanese Wikipedia it is a fan club in Japan to produce a music game coterie – whatever that means). Belo is a Brazilian pop artist who does indeed seem to have some rather passionate fans.

It is not surprising that it is hard for popular artists to rank at the very top of the  passion index.  Popular artists are exposed to many, many listeners which can easily reduce the passion index.    Here are the top passion-ranked artists drawn from the top-1000 most popular artists:

Passion Listeners Plays Artist
115 527653 60978053 In Flames
95 1748159 167765187 The Beatles
79 2140659 170106143 Radiohead
78 282308 22071498 Die Ärzte
75 269052 20293399 Mindless Self Indulgence
75 691100 52217023 Nightwish
74 332658 24645786 Porcupine Tree
74 1056834 79135038 Nine Inch Nails
72 384574 27901385 Opeth
70 601587 42563097 Rise Against
69 357317 24911669 Sonata Arctica
69 1364096 95399150 Metallica
66 460518 30625121 Children of Bodom
66 619396 41440369 Paramore
65 504464 33271871 Dream Theater
65 1391809 90888046 Pink Floyd
64 540184 34635084 Brand New
62 862468 54094977 Iron Maiden
62 1681914 105935202 Muse
61 381942 23478290 Beirut

I find it interesting to see all of the heavy metal bands in the top 20. Metal fans are indeed true fans.

Going to the other end of passion, here are the popular artists that have the least passionate fans:

Passion Listeners Plays Artist
6 270692 1767977 Julie London
6 284087 1964292 Smoke City
6 294100 1784358 Dinah Washington
6 295200 1799303 The Bangles
6 295990 1832771 Donna Summer
6 306018 1905285 Bonnie Tyler
6 307407 2123599 Buffalo Springfield
6 311543 2085085 Franz Schubert
6 312078 1909769 The Hollies
6 313732 2190008 Tom Jones
6 325454 2025366 Eric Prydz
6 331837 2259892 Sarah Vaughan
6 332072 2016898 Soft Cell
6 407622 2622570 Steppenwolf
5 275770 1605268 Diana Ross
5 281037 1615125 Isaac Hayes
5 282095 1685959 The Isley Brothers
5 283467 1666824 Survivor
5 311867 1694947 Peggy Lee
5 333437 1925611 Wham!
5 388183 2244878 Kool & The Gang

I guess people are not too passionate about Soft Cell.

Here’s a passion chart for the top 100 most popular artists. Even the artists at the bottom of this chart are way above average on the passion index.

Passion Listeners Plays Artist
95 1748159 167765187 The Beatles
79 2140659 170106143 Radiohead
74 1056834 79135038 Nine Inch Nails
69 1364096 95399150 Metallica
65 1391809 90888046 Pink Floyd
62 1681914 105935202 Muse
61 1397442 85685015 System of a Down
61 1403951 86849524 Linkin Park
60 1346298 81762621 Death Cab for Cutie
57 1060269 61127025 Fall Out Boy
56 1155877 65324424 Arctic Monkeys
55 1897332 104932225 Red Hot Chili Peppers
54 950416 52019102 My Chemical Romance
50 1131952 56622835 blink-182
49 2313815 115653456 Coldplay
48 964970 47102550 Sigur Rós
48 1108397 53260614 Modest Mouse
48 1350931 65865988 Placebo
47 1129004 53771343 Jack Johnson
44 1297020 57111763 Led Zeppelin
43 1011131 43930085 Kings of Leon
42 947904 39970477 Marilyn Manson
42 1065375 45459226 Britney Spears
42 1246213 52656343 Incubus
42 1256717 53610102 Bob Dylan
41 1527721 62654675 Green Day
41 1881718 78473290 The Killers
40 1023666 41288978 Queens of the Stone Age
40 1057539 42472755 Kanye West
40 1108044 44845176 Interpol
40 1247838 49914554 Depeche Mode
40 1318140 53594021 Bloc Party
39 1266502 49492511 The White Stripes
38 1048025 40174997 Evanescence
38 1091324 42195854 Pearl Jam
38 1734180 67541885 Nirvana
37 978342 36561552 The Kooks
37 1097968 41046538 The Shins
37 1114190 42051787 The Offspring
37 1379096 51313607 The Cure
37 1566660 58923515 Foo Fighters
36 1326946 48738588 The Smashing Pumpkins
35 1091278 39194471 Björk
35 1271334 45619688 The Strokes
34 955876 33376744 Jimmy Eat World
34 1251461 42949597 Daft Punk
33 989230 33257150 Pixies
33 1012060 34225186 Eminem
33 1051836 35529878 Avril Lavigne
33 1110087 36785736 Johnny Cash
33 1121138 37645208 AC/DC
33 1161536 38615571 Air
32 961327 31286528 The Prodigy
32 1038491 33270172 Amy Winehouse
32 1410438 45614720 David Bowie
32 1641475 52612972 Oasis
32 1693023 54971351 U2
31 1258854 39598249 Madonna
31 1622198 51669720 Queen
30 1032223 31750683 Portishead
30 1178755 35600916 Rage Against the Machine
30 1249417 38284572 The Doors
30 1393406 42717325 Beck
29 1030982 30044419 Yeah Yeah Yeahs
29 1187160 34712193 Massive Attack
29 1348662 39131095 Weezer
29 1361510 39753640 Snow Patrol
28 985715 28485679 The Postal Service
28 1045205 30105531 The Clash
28 1305984 37807059 Guns N’ Roses
28 1532003 43998517 Franz Ferdinand
27 1000950 27262441 Nickelback
27 1395278 37856776 Gorillaz
26 1503035 40161219 The Rolling Stones
25 1345571 33741254 R.E.M.
24 1311410 32588864 Moby
23 973319 22962953 Audioslave
23 976745 22557111 3 Doors Down
23 1123549 26696878 Keane
22 998933 21995497 Justin Timberlake
22 1025990 23145062 Rihanna
22 1109529 24687603 Maroon 5
22 1120968 24796436 Jimi Hendrix
22 1160410 26641513 [unknown]
21 1151225 25081110 The Who
20 1057288 22084785 The Chemical Brothers
20 1105159 22925198 Kaiser Chiefs
20 1117306 22390847 Nelly Furtado
20 1201937 25019675 Aerosmith
20 1253613 25582503 Blur
19 968885 19219364 Simon & Garfunkel
19 974687 18528890 Christina Aguilera
19 1025305 20157209 The Cranberries
19 1144816 22252304 Michael Jackson
16 996649 16234996 Black Eyed Peas
16 1019886 16618386 Eric Clapton
15 980141 15317182 The Police
15 981451 15289554 Dido
14 973520 13781896 Elton John
13 949742 12624027 The Verve

I think it would be really interesting to incorporate the passion index into a recommender, so instead of just recommending artists that are similar to artists that a listener already likes, filter the similar artists with  a passion filter and offer up the artists that listeners are most passionate about. I think these recommendations would be more valuable to the listener.
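
A sketch of what such a filter might look like, assuming the recommender already returns a list of similar artists and that a passion index is available for each. The function name, the threshold parameter, and the idea of a hard cutoff are all assumptions. The passion values come from the tables above, while the list of similar artists is hypothetical:

```python
def passion_filter(similar_artists, passion, min_passion=0, top_n=5):
    """Keep similar artists at or above min_passion, most passionate first.

    similar_artists: artist names from a similarity-based recommender.
    passion: dict mapping artist name -> passion index.
    """
    kept = [a for a in similar_artists if passion.get(a, 0) >= min_passion]
    kept.sort(key=lambda a: passion.get(a, 0), reverse=True)
    return kept[:top_n]


passion = {"Radiohead": 79, "Muse": 62, "Coldplay": 49, "Keane": 23}
similar = ["Keane", "Coldplay", "Muse", "Radiohead"]  # hypothetical similarity output
picks = passion_filter(similar, passion, min_passion=40, top_n=3)
```

A softer variant could blend similarity and passion into one score instead of cutting off at a threshold. Which works better for listeners is an open question.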

Music Hackday is coming


Open your calendars  and reserve  July 11th and 12th  for Music Hackday for  24 plus hours of solid music hacking in the heart of London.  Music Hackday is a chance for developers to get together and share ideas and code while building a music application using the music APIs from companies like Last.fm, 7digital, Gigulate, People’s Music Store, SongKick, SoundCloud and The Echo Nest.     This looks to be a really fun event.

Remix 1.1 is released

Version 1.1 of the Echo Nest remix has been released.  Adam Lindsay, in his Remix Overview describes it thus:

Remix is a sophisticated tool to allow you to quickly, expressively, and intuitively chop up existing audio content and create new content based on the old. It allows you to reach inside the music, and let the music’s own musical qualities be your — or your computer’s — guide in finding something new in the old. By using Remix’s knowledge of a given song’s structure, you can render the familiar strange, or the strange slightly more familiar-sounding. You can create countless parameterized variations of a given song — or one of near-limitless length — that respect or desecrate the original, or land on any of countless steps in between.

This release has concentrated on making Remix easier to install. We now have install instructions for Linux, Mac and Windows. We also now use the FFmpeg encoder/decoder instead of mad and lame. This has a number of advantages: it makes it easier to install, it supports a larger number of file formats, and perhaps most importantly, it is the same decoder that the Echo Nest Analyze uses. This ensures that audio segment boundaries fall exactly on zero-crossings.

Remix is really fun to play with, and the results are always interesting and sometimes even musical.  Here’s an example of a song released in the last year (can you guess it?) that has been remixed to include only the first beat of each measure.

Building a music map

I like maps, especially maps that show music spaces – in fact I like them so much I have one framed, hanging in my kitchen. I’d like to create a map for all of music. Like any good map, this map should work at multiple levels; it should help you understand the global structure of the music space, while allowing you to dive in and see fine detailed structure as well. Just as Google Maps can show you that Africa is south of Europe and moments later that Stark St. intersects with Reservoir St. in Nashua, NH, a good music map should be able to show you at a glance how Jazz, Blues and Rock relate to each other, while moments later letting you find an unknown 80s hair metal band that sounds similar to Bon Jovi.

My goal is to build a map of the artist space, one that allows you to explore the music space at a global level, to understand how different music styles relate, but that also allows you to zoom in and explore the finer structure of the artist space.

I’m going to base the music map on the artist similarity data collected from the Echo Nest artist similarity web service. This web service lets you get the 15 most similar artists for any artist. Using this web service I collected the artist similarity info for about 70K artists, along with each artist’s familiarity and hotness.
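
The Beatles-K1 and Beatles-K2 neighborhoods used below can be collected with a breadth-first crawl. This sketch does not call the real web service; `get_similar` is a hypothetical stand-in backed by a toy dictionary, and `neighborhood` is an illustrative name:

```python
from collections import deque

# Toy stand-in for the similarity service: artist -> most similar artists.
TOY_SIMILARS = {
    "The Beatles": ["John Lennon", "The Rolling Stones"],
    "John Lennon": ["The Beatles", "Paul McCartney"],
    "The Rolling Stones": ["The Who", "The Beatles"],
    "Paul McCartney": ["Wings"],
}


def get_similar(artist):
    """Hypothetical lookup; a real crawler would query the web service here."""
    return TOY_SIMILARS.get(artist, [])


def neighborhood(seed, k):
    """Return (artists, edges) within k similarity hops of seed."""
    seen = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        artist, depth = queue.popleft()
        if depth == k:
            continue  # do not expand artists on the outer ring
        for other in get_similar(artist):
            edges.append((artist, other))
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return seen, edges
```

The edges are directed (artist to similar artist), which matters because the similarity lists are not symmetric.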

Some Explorations

It would be silly to start trying to visualize 70K artists right away – the 250K artist-to-artist links would overwhelm just about any graph layout algorithm. The graph would look like this. So I started small, with just the near neighbors of The Beatles (Beatles-K1). For my first experiment, I graphed the nearest neighbors of The Beatles. This plot shows how the 15 near neighbors of The Beatles all connect to each other.

[Figure: The Beatles and their 15 nearest neighbors, with all interconnections]

In the graph, artist names are sized proportional to the familiarity of the artist. The Beatles are bigger than The Rutles because they are more familiar. I think the graph is pretty interesting, showing how all of the similar artists of the Beatles relate to each other. However, the graph is also really scary because it shows 64 interconnections for these 16 artists. This graph shows just the outgoing links for the Beatles; if we include the incoming links to the Beatles (the artist similarity function is asymmetric, so outgoing similarities and incoming similarities are not the same), it becomes a real mess:

[Figure: the same graph with incoming links to The Beatles included]

If you extend this graph one more level – to include the friends of the friends of The Beatles (Beatles-K2), the graph becomes unusable.  Here’s a detail, click to see  the whole mess.  It is only 116 artists with 665 edges, but already you can see that it is not going to be usable.

[Figure: detail of the Beatles-K2 graph]

Eliminating the edges

Clearly the approach of drawing all of the artist connections is not going to scale beyond a few dozen artists. One approach is to just throw away all of the edges. Instead of showing a graph representation, use an embedding algorithm like MDS or t-SNE to position the artists in the space. These algorithms lay out items by attempting to minimize the energy in the layout. It’s as if all of the similar items are connected by invisible springs which will push all of the artists into positions that minimize the overall tension in the springs. The result should show that similar artists are near each other, and dissimilar artists are far away. Here’s a detail of an example, the friends of the friends of the Beatles plot. (Click on it to see the full plot.)
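
The spring analogy can be made concrete with a tiny stress-minimizing layout. This is not MDS or t-SNE proper, just a plain iterative sketch in the same spirit. The target distances, starting positions, step size and function names are all made up for illustration:

```python
import math


def spring_layout(points, targets, step=0.05, iterations=500):
    """Nudge 2-D points until pairwise distances approach the targets.

    points: dict name -> [x, y] starting positions (modified in place).
    targets: dict (name_a, name_b) -> desired distance.
    """
    for _ in range(iterations):
        for (a, b), want in targets.items():
            ax, ay = points[a]
            bx, by = points[b]
            dx, dy = ax - bx, ay - by
            dist = math.hypot(dx, dy) or 1e-9
            # Positive when the spring is stretched, negative when compressed.
            pull = step * (dist - want) / dist
            points[a][0] -= pull * dx
            points[a][1] -= pull * dy
            points[b][0] += pull * dx
            points[b][1] += pull * dy
    return points


def distance(points, a, b):
    """Euclidean distance between two named points."""
    return math.hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])


# Two similar artists (small target distance) and one dissimilar artist.
targets = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
layout = spring_layout({"A": [0.0, 0.0], "B": [3.0, 0.0], "C": [0.5, 0.5]}, targets)
```

After the springs settle, the similar pair A and B should sit closer together than either does to C. A real embedding of thousands of artists would use a library implementation rather than this pairwise loop.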

[Figure: detail of the embedded Beatles-K2 plot, drawn without edges]

I find this type of visualization to be quite unsatisfying. Without any edges in the graph I find it hard to see any structure. I think I would find this graph hard to use for exploration. (Although it is fun to see the clustering of bands like The Animals, The Turtles, The Byrds, The Kinks and The Monkees.)

Drawing some of the edges

We can’t draw all of the edges, because the graph just gets too dense, but if we don’t draw any edges, the map loses too much structure, making it less useful for exploration. So let’s see if we can draw only some of the edges – this should bring back some of the structure without overwhelming us with connections. The tricky question is “Which edges should I draw?” The obvious choice would be to attach each artist to the artist that it is most similar to. When I apply this to the Beatles-K2 neighborhood we get something like this:

[Figure: Beatles-K2 with each artist linked to its most similar artist]

This clearly helps quite a bit. We no longer have the bowl of spaghetti, while we can still see some structure. We can even see some clustering that makes sense (Led Zeppelin is clustered with Jimi Hendrix and the Rolling Stones while Air Supply is closer to the Bee Gees). But there are some problems with this graph. First, it is not completely connected: there are 14 separate clusters varying from a size of 1 to a size of 57. This disconnection is not really acceptable. Second, there are a number of non-intuitive flows from familiar to less familiar artists. It just seems wrong that bands like the Moody Blues, Supertramp and ELO are connected to the rest of the music world via Electric Light Orchestra II (shudder).
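
A sketch of this edge-selection rule, together with the cluster count that exposes its weakness. The similarity lists here are toy data and the function names are illustrative. Each artist's list is assumed to be ordered from most to least similar:

```python
def nearest_neighbor_edges(similars):
    """Link every artist to the first (most similar) artist in its list."""
    return [(artist, others[0]) for artist, others in similars.items() if others]


def count_clusters(artists, edges):
    """Count connected components, treating edges as undirected."""
    parent = {a: a for a in artists}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(a) for a in parent})


# Toy data: two pairs that only point at each other stay disconnected.
similars = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["D", "A"],
    "D": ["C", "B"],
}
edges = nearest_neighbor_edges(similars)
clusters = count_clusters(similars.keys(), edges)
```

Even though every artist here has a link to the other pair in its full similarity list, keeping only the top link leaves two separate clusters.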

To deal with the ELO II problem I tried a different strategy.  Instead of attaching an artist to its most similar artist,  I attach it to the most similar artist that also has the closest, but greater familiarity.  This should prevent us from attaching the Moody Blues to the world via ELO II, since ELO II is of much less familiarity than the Moody Blues.   Here’s the plot:

[Figure: Beatles-K2 with each artist linked to a similar artist of closest, but greater, familiarity]

Now we are getting somewhere. I like this graph quite a bit. It has a nice left to right flow from popular to less popular, we are not overwhelmed with edges, and ELO II is in its proper subservient place. The one problem with the graph is that it is still disjoint. We have 5 clusters of artists. There’s no way to get to ABBA from the Beatles even though we know that ABBA is a near neighbor to the Beatles. This is a direct product of how we chose the edges. Since we are only using some of the edges in the graph, there’s a chance that some subgraphs will be disjoint. When I look at a larger neighborhood (Beatles-K3), the graph becomes even more disjoint, with a hundred separate clusters. We want to be able to build a graph that is not disjoint at all, so we need a new way to select edges.
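
One reading of the familiarity rule, sketched on toy data: among an artist's similar artists, consider only those that are more familiar, and pick the one whose familiarity is closest. That interpretation, the familiarity numbers, and the function name are assumptions. The sketch also shows where the disjoint clusters come from, since an artist with no more-familiar neighbor gets no edge at all:

```python
def pick_parent(artist, similars, familiarity):
    """Return the more-familiar similar artist closest in familiarity, or None."""
    mine = familiarity[artist]
    candidates = [s for s in similars.get(artist, []) if familiarity[s] > mine]
    if not candidates:
        return None  # nothing more familiar nearby: this artist roots a cluster
    return min(candidates, key=lambda s: familiarity[s] - mine)


# Made-up familiarity scores between 0 and 1.
familiarity = {"The Beatles": 0.99, "The Moody Blues": 0.80, "ELO": 0.85, "ELO II": 0.40}
similars = {
    "The Moody Blues": ["ELO II", "ELO", "The Beatles"],
    "ELO II": ["ELO", "The Moody Blues"],
    "The Beatles": ["ELO"],
}
```

With these toy values the Moody Blues attach to ELO rather than ELO II, and the most familiar artist in the set ends up with no parent.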

Minimum Spanning Tree
One approach to making sure that the entire graph is connected is to generate the minimum spanning tree for the graph. The minimum spanning tree of a graph connects all of the nodes using the fewest possible edges, choosing those edges so that their total weight is as small as possible. If we start with a connected graph, the minimum spanning tree is guaranteed to be connected too. This will eliminate our disjoint clusters. For this next graph, I built the minimum spanning tree of the Beatles-K2 graph.

beatles.2.out.minspan

As predicted, we no longer have separate clusters within the graph. We can find a path between any two artists in the graph. This is a big win: we should be able to scale this approach up to an even larger number of artists without ever having to worry about disjoint clusters. The whole world of music is connected in a single graph. However, there’s something a bit unsatisfying about this graph. The Beatles are connected to only two other artists: John Lennon & The Plastic Ono Band and The Swinging Blue Jeans. I’ve never heard of the Swinging Blue Jeans. I’m sure they sound a lot like the Beatles, but I’m also sure that most Beatles fans would not tie the two bands together so closely. Our graph topology needs to be sensitive to this. One approach is to weight the edges of the graph differently. Instead of weighting them by similarity, the edges can be weighted by the difference in familiarity between two artists. The Beatles and Rolling Stones have nearly identical familiarities so the weight between them would be close to zero, while The Beatles and the Swinging Blue Jeans have very different familiarities, so the weight on the edge between them would be very high. Since the minimum spanning tree is trying to reduce the overall weight of the edges in the graph, it will choose low weight edges before it chooses high weight edges. The result is that we will still end up with a single graph, with none of the disjoint clusters, but artists will be connected to other artists of similar familiarity when possible. Let’s try it out:

beatles.2.minspan.fam

Now we see that popular bands are more likely to be connected to other popular bands, and the Beatles are no longer directly connected to “The Swinging Blue Jeans”. I’m pretty happy with this method of building the graph. We are not overwhelmed by edges, we don’t get a whole forest of disjoint clusters, and the connections between artists make sense.
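For the curious, the familiarity-weighted spanning tree can be sketched in a few lines of Python. This version uses Kruskal’s algorithm, which is my choice for the illustration; the actual graphs were built with Jung, and the familiarity scores and the similarity edges below are invented.

```python
def minimum_spanning_tree(nodes, edges, weight):
    """Kruskal's algorithm: take the lightest edges that don't form a cycle.

    nodes: iterable of artist names
    edges: iterable of (artist_a, artist_b) pairs
    weight: function (artist_a, artist_b) -> number
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for a, b in sorted(edges, key=lambda e: weight(*e)):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # skip edges that would close a cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Hypothetical familiarity scores and similarity edges for this example.
toy_familiarity = {
    "The Beatles": 0.95,
    "The Rolling Stones": 0.93,
    "John Lennon & The Plastic Ono Band": 0.60,
    "The Swinging Blue Jeans": 0.20,
}
toy_edges = [
    ("The Beatles", "The Swinging Blue Jeans"),
    ("The Beatles", "The Rolling Stones"),
    ("The Beatles", "John Lennon & The Plastic Ono Band"),
    ("John Lennon & The Plastic Ono Band", "The Swinging Blue Jeans"),
]


def familiarity_gap(a, b):
    """Edge weight: the difference in familiarity between two artists."""
    return abs(toy_familiarity[a] - toy_familiarity[b])


toy_tree = minimum_spanning_tree(toy_familiarity, toy_edges, familiarity_gap)
```

With these made-up numbers the tree keeps three edges and drops the direct Beatles to Swinging Blue Jeans edge, since that edge has the largest familiarity gap and everything is already connected by the time the algorithm gets to it.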

Of course we can build the graph by starting from different artists. This gives us a deep view into that particular type of music.  For instance, here’s a graph that starts from Miles Davis:

miles-graph

Here’s a near neighbor graph starting from Metallica:

metallica-graph-small

And here’s one showing the near neighbors to Johann Sebastian Bach:

bach-graph

This graphing technique works pretty well, so let’s try a larger set of artists. Here I’m plotting the 2,000 most popular artists. Now, unlike the Beatles neighborhood, this set of artists is not guaranteed to be connected, so we may have some disjoint clusters in the plot. That is expected and reasonable. The image of the resulting plot is rather large (about 16MB) so here’s a small detail; click on the image to see the whole thing. I’ve also created a PDF version which may be easier to browse through.

general.2k.detail

I’m pretty pleased with how these graphs have turned out. We’ve taken a very complex space and created a visualization that shows some of the higher level structure of the space (jazz artists are far away from the thrash artists) as well as some of the finer details – the female bubblegum pop artists are all near each other. The technique should scale up to even larger sets of artists. Memory and compute time become the limiting factors, not graph complexity. Still, the graphs aren’t perfect – seemingly inconsequential artists sometimes appear as gateways into whole subgenres. A bit more work is needed to figure out a better ordering for nodes in the graph.

Some things I’d like to try,  when I have a bit of spare time:

  • Create graphs with 20K artists (needs lots of memory and CPU)
  • Try to use top terms or tags of nearby artists to give labels to clusters of artists – so we can find the Baroque composers or the hair metal bands
  • Color the nodes in a meaningful way
  • Create dynamic versions of the graph to use them for music exploration. For instance, when you click on an artist you should be able to hear the artist and read what people are saying about them.

To create these graphs I used some pretty nifty tools:

  • The Echo Nest developer web services – I used these to get the artist similarity, familiarity and hotness data.  The artist similarity data that you get from the Echo Nest is really nice.  Since it doesn’t rely directly on collaborative filtering approaches it avoids the problems I’ve seen with data from other sources of artist similarity. In particular,  the Echo Nest similarity data is not plagued by hubs (for some music services,  a popular band like Coldplay may have hundreds or thousands of near neighbors due to a popularity bias inherent in CF style recommendation).  Note that I work at the Echo Nest. But don’t be fooled into thinking I like the Echo Nest artist similarity data because I work there. It really is the other way around.  I decided to go and work at the Echo Nest because I like their data so much.
  • Graphviz – a tool for rendering graphs
  • Jung – a Java library for manipulating graphs
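If you want to try rendering something similar yourself, Graphviz reads a simple text format called DOT. Here is a small Python sketch that writes a list of edges as an undirected DOT graph. The function name and the file names are made up for the example, and the `rankdir=LR` setting is just my guess at one way to get a left to right layout, not necessarily what was used for the plots in this post.

```python
def to_dot(edges, name="artists"):
    """Return a DOT description of an undirected graph.

    edges: iterable of (artist_a, artist_b) pairs
    """
    def quote(label):
        # Inside a DOT quoted string only the double quote needs escaping.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"graph {name} {{"]
    lines.append("  rankdir=LR;")  # assumption: lay the graph out left to right
    for a, b in edges:
        lines.append(f"  {quote(a)} -- {quote(b)};")
    lines.append("}")
    return "\n".join(lines)


example_dot = to_dot([
    ("The Beatles", "The Rolling Stones"),
    ("The Rolling Stones", "Led Zeppelin"),
])
```

Saving the result to a file such as `artists.dot` and running `dot -Tpng artists.dot -o artists.png` should produce an image of the graph.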

If you have any ideas about graphing artists, or if you’d like to see a neighborhood of a particular artist, please let me know.
