Archive for category data

Draw a picture of your musical taste

Posted by Paul in data, fun, Music, research, visualization on September 26, 2009

Tristan F's Musical taste

Tristan F from the BBC posted this hand drawing of his musical taste to Flickr. As he says in the photo comments: “I had to stop at some point so it’s not comprehensive. But it’s all about connections.”

I find that these types of drawings yield really interesting insights into the listener and into music in general. For instance, Tristan has a line connecting Sufjan Stevens to Bill Frisell. I’m still pondering that connection. As I prepare for my upcoming ISMIR tutorial on Using Visualizations for Discovering Music, I’d like to collect a few more personal visualizations of music taste. If you feel so inclined, draw a picture that represents your music taste, post it to Flickr and tag it with ‘MyMusicTaste’. I’ll post a follow-up, and particularly interesting ones will appear in the tutorial.

Herd it on Facebook

Posted by Paul in data, Music, music information retrieval, research on September 25, 2009

UCSD researcher Gert Lanckriet announced today that “Herd It”, the game-with-a-purpose (GWAP) for collecting audio annotations, has been officially launched on Facebook. It follows in the footsteps of other GWAPs such as Major Miner, Tag-a-tune and the Listen Game.

On the music-ir mailing list Gert explains ‘Herd it’: “The scientific goal of this experiment is to investigate whether a human computation game integrated with the Facebook social network will allow the collection of high quality tags for audio clips. The quality of the tags will be tested by using them to train an automatic music tagging system (based on statistical models). Its predictive accuracy will be compared to a system trained on high quality tags collected through controlled human surveys (such as, e.g., the CAL500 data set). The central question we want to answer is whether the “game tags” can train an auto-tagging system as (or more) accurately than “survey tags” and, if yes, under what conditions (amount of tags needed, etc.). The results will be reported once enough data has been collected.”

I’ve played a few rounds of the game and enjoyed myself. I recognized all of the music that they played (it seemed to be drawn from top-100 artists like Nirvana, Led Zeppelin, Mariah Carey and John Lennon). The timed rounds made the game move quickly. I did miss the feeling of close collaboration that I get from some other GWAPs, where I have to guess how my partner will describe a song. Despite this, I found the games to be fun, and I could easily see spending a few hours trying to get a top score. The team at UCSD has clearly put lots of time into making the games highly interactive: animations, sound and transparent game play all add to the gaming experience. One glitch: even though I was logged into Facebook, the Herd It game didn’t seem to know who I was and just called me ‘Herd It’. So my awesome high score is anonymous.

Here are some screenshots from the game. For this round, I had to choose the most prominent sound (this was for the song ‘Heart of Gold’). I chose slide guitar, but most people chose acoustic guitar (what do they know!).

For this round, I had to choose the genre for a song. Easy enough.

For this round, I had to position a song on a Thayer mood model scale.

Here’s the game kick-off screen. As you can see, I’m “Herd It” and not Paul.

I hope the Herd It game attracts lots of attention. It could be a great source of music metadata.

cal, facebook, games, gwap, ucsd

Berlin Music Hackday presentation videos

Posted by Paul in code, data, events, fun, web services on September 20, 2009

There are a bunch of videos of presentations and demos from the Music Hackday Berlin at http://qik.com/digitalwaveriding

berlin, hackday, Music

Hacking on the Echo Nest at the Berlin Music Hackday

Posted by Paul in code, data, events, remix, The Echo Nest, web services on September 16, 2009

The Berlin Music Hackday is nearly upon us. Ben Lacker (a.k.a. DJ API) will be representing the Echo Nest at this wonderful event. If you want to maximize your hacking time during the hackday there are a few things that you can do in advance to get ready to hack on the Echo Nest APIs:

Get an Echo Nest API Key – If you are going to be using the API, you need to get a key. You can get one for free from developer.echonest.com.
Read the API overview – The overview gives you a good idea of the capabilities of the API. If you are thinking of writing a remix application, be sure to read Adam Lindsay’s wonderful remix tutorial.
Pick a client library – There are a number of client libraries for The Echo Nest – select one for your language of choice and install it.
Think of a great application – easier said than done. If you are looking for some inspiration, check out these examples: morecowbell, donkdj, Music Explorer FX, and Where’s the Pow?. You’ll find more examples in the Echo Nest gallery of Showcase Apps. If you are stuck for an idea, ask me (paul@echonest.com) or Ben; we have a list of application ideas that we think would be fun to write.

At the end of the hackday, Ben will choose the Most Awesome Echo Nest Hackday Application. The developer of this application will go home with a shiny new iPod touch. If you want your application to catch Ben’s eye, write an Echo Nest application that makes someone say “whoa! how did you do that!”, with extra points if it’s an application with high viral potential. Check out the list of hacks created at the London Music Hackday for inspiration.

berlin, hackday

Music Hack Day Berlin

Posted by Paul in code, data, Music, The Echo Nest on September 1, 2009

On the heels of the very successful London Music Hackday comes the Berlin Music Hackday, which will be held on September 18, 19 and 20 at the very cool Radialsystem V in Berlin, Germany.

Site of the Berlin Music Hackday

The hackday is totally free for participants but is limited to 150 participants. (If this is organized like the London hackday, be prepared to describe how you hack hardware, software or music if you want to attend; not just anyone can fill one of the 150 slots.)

The London hackday was such a great event, I’m glad to see that it is being repeated in different parts of the world. Look for more Music Hackdays coming to a city near you.

Hacking music at the London Music Hackday

hack, hackday, Music, The Echo Nest

The Stairway Detector

Posted by Paul in data, fun, Music, playlist, The Echo Nest, web services on August 17, 2009

Last night I was watching the pilot for Glee (a snarky TV version of High School Musical) with my 3 teenage daughters. I was surprised to hear the soundtrack filled with songs by the band Journey, songs that brought me back to my own high school years. The thing that I like most about Journey is that many of their songs have a slow and gradual build-up over the course of the whole song, as in this song, Lovin’, Touchin’, Squeezin’:

A number of my favorite songs have this slow build-up. The canonical example is Zep’s ‘Stairway to Heaven’: it starts with a slow acoustic guitar and over the course of 8 minutes builds to a metal frenzy. I thought it would be fun to see if I could write a bit of software that could find the songs that have the same arc as ‘Stairway to Heaven’ or “Lovin’, Touchin’, Squeezin’”, songs that have this slow build. With this ‘stairway detector’ I could build playlists filled with the songs that fire me up.

The obvious place to start is to look at how the loudness of a song changes over time. To do this I used the Echo Nest developer API to extract the loudness as a function of time for Journey’s Lovin’, Touchin’, Squeezin’:

In this plot the light green curve is the loudness, while the blue line is a windowed average of the loudness. The plot shows a nice rise in the volume over the course of the song, in contrast to a song like the Beatles’ ‘Ticket to Ride’, which doesn’t have this upward slope:

From these two examples, it is pretty clear that we can build our stairway detector just by looking at the average slope of the volume. The higher the slope, the bigger the build. I suspect that there are lots of ways to find the average slope of a bumpy line, but I always like to try the simplest thing that could possibly work first, and for me the simplest thing was to divide the average loudness of the first half of the song by the average loudness of the second half. (Loudness is measured in negative decibels, so a song that gets louder has a first-half average further below zero, and its ratio comes out above 1.) For example, with the Journey song the average loudness of the first half is -24.37 dB and the average of the second half is -15.86 dB. This gives us a ratio of 1.54, while ‘Ticket to Ride’ gets a ratio of 1.06. Here’s the Journey song with averages shown:

Here are a few more songs that fit the ‘slow build’ profile:

‘Stairway to Heaven’ has a score of 1.6, so it has a bigger build than the Journey song.

Simon and Garfunkel’s ‘Bridge Over Troubled Water’ has an even bigger build, with a score of 1.7.

‘Also sprach Zarathustra’ has a more modest score of 1.56.
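To make the ratio concrete, here is a minimal Python sketch of the score. It assumes the loudness curve is already available as a plain list of dB values (fetching it from the Echo Nest API is not shown), and `stairway_score` is a hypothetical name. Because the values are negative, the first-half average is divided by the second-half average; that ordering is what reproduces the Journey figure of -24.37 / -15.86 ≈ 1.54.

```python
def stairway_score(loudness_db):
    """Mean loudness of the first half divided by mean loudness of the second half.

    Assumes loudness values are negative dB, so a song that gets louder has a
    first-half mean further below zero and a score above 1; a song that winds
    down scores below 1.
    """
    if len(loudness_db) < 2:
        raise ValueError("need at least two loudness samples")
    mid = len(loudness_db) // 2
    first, second = loudness_db[:mid], loudness_db[mid:]
    first_mean = sum(first) / len(first)
    second_mean = sum(second) / len(second)
    if first_mean >= 0 or second_mean >= 0:
        raise ValueError("expected negative dB loudness values")
    return first_mean / second_mean


# A curve whose halves average -24.37 dB and -15.86 dB, as in the Journey example.
journey_like = [-24.37] * 10 + [-15.86] * 10
print(round(stairway_score(journey_like), 2))  # 1.54
```

Sorting a collection by this score from highest to lowest surfaces the big crescendos; sorting from lowest up surfaces the songs that wind down.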

With this newfound metric I analyzed a few thousand of the tracks in my personal collection to find the songs with the biggest crescendos. The biggest of all was this song by Muse, with a whopping score of 3.07:

Another find is Arcade Fire’s “My Body is a Cage” with a score of 2.32.

The metric isn’t perfect. For instance, I would have expected The Postal Service’s ‘Natural Anthem’ to have a high score because it has such a great build-up, but it only gets a score of 1.19. Looking at the plot we can see why:

After the initial build-up, there’s a drop in energy for the last quarter of the song, so even though the song has a sustained crescendo for 3 minutes, it doesn’t get a high score.

Of course, we can also use this ratio to find tracks that go the other way: songs that gradually wind down and get a ratio below 1. These seem to occur less frequently than songs that build up. One example is Neutral Milk Hotel’s ‘Two-Headed Boy’:

Despite the fact that I’m using a very naive metric to find the loudness slope, this stairway detector is pretty effective at finding songs that have that slow build. It’s another tool that I can use to help build interesting playlists. This is one of the really cool things about how the Echo Nest approaches music playlisting: by having an understanding of what the music actually sounds like, we can build much more interesting playlists than Genius-style playlists that only take artist co-occurrence into account.
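The loudness plots in this post pair each raw curve with a windowed average. The post doesn't say which window was used, so this is a minimal sketch assuming a centered moving average over a list of dB samples; `windowed_average` and its `half_width` parameter are hypothetical names.

```python
def windowed_average(values, half_width):
    """Centered moving average of a loudness curve.

    Each output point is the mean of the input points within `half_width`
    samples on either side; the window is truncated at the song's edges.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A bumpy but rising loudness curve (illustrative values in dB).
curve = [-30.0, -26.0, -28.0, -22.0, -24.0, -18.0, -20.0]
print(windowed_average(curve, 1))
```

A wider window smooths away more of the bumps, which makes the overall upward or downward trend easier to see.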

data, Music, playlists, The Echo Nest

The Sinister Index

Posted by Paul in data, fun, Music, research, The Echo Nest, web services on July 8, 2009

Like many, I like to eat Cheetos when I’m relaxing and browsing the web, especially when I’m looking for new music. The problem is that Cheetos leave a nasty residue on the fingers which gets transferred to the keyboard rather quickly. To avoid this problem I like to use my keyboard one-handed (I know what you are thinking, but really, it’s the Cheetos). Which is why one of my favorite bands is Weezer: I can type ‘weezer’ with my left hand, leaving my right hand free for Cheetos and my keyboard clean. Still, I was in the mood for music by some other bands, so I thought it would be interesting to find all of the bands that can be typed using just my left hand. I wrote a Python script, ran it on a list of about 800 thousand artist names and came up with a rather large list of sinister band names. Here are the longest:

der weg des wassers
everette red bear
sweet ever after
state far better
cassettezzzzzzzz
barbara decesare
westgate street
streetbeat crew
street bastards
reve de cabaret
rebecca everett
cabezas de cera
barbara taggart
warsaw was raw

Restricting the search to just the more popular artists, I find this list of popular sinister artists:

wet wet wet
savatage
bee gees
seabear
garbage
cascada
carcass
caesars
weezer
feeder
vader
texas
sweet
stars
seeed
eve 6
dredg
creed
vast
sade
free
bebe
abba
xtc
war
rza
eve
era
d12
atb
afx
abc
311
112
bt

To be evenhanded, I offer this list of dexterous artists:

phillip moll
phillip hill
yumiko ohno
yoon il-loh
uh uh loony
polmo polpo
pinko pinko
opi yum yum
oli oli oli
oh no oh my
nylon union
nylon pylon
monki monki
homo homini
yuko kouno
yuho yokoi

And a list of popular dexterous artists:

yoko ono
moloko
pulp
pink
mylo
mono
koop
mum
iio
him
l7

There seem to be many more sinister artists than dexterous artists. I suspect that this is because many artists now recognize the Cheetos issue and are selecting sinister names. Since identifying sinister artists is becoming such a big issue in music search, we will likely be offering a sinister index as part of The Echo Nest web services. The sinister index is a number between zero and one that indicates how easy it is to type the artist name with your left hand. Weezer has a sinister index of 1 while Yoko Ono has a sinister index of zero. Look for it soon.
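The original Python script isn't shown, so here is a minimal sketch of how a left-hand test and a sinister index might work. It assumes a standard QWERTY layout with touch-typing hand assignments, counts the digits 1 through 6 as left-hand keys (consistent with ‘eve 6’ appearing on the sinister list), ignores spaces, and defines the index as the fraction of keystrokes that fall on the left hand. The key set, that definition and the name `sinister_index` are all assumptions.

```python
# Keys typed by the left hand on a QWERTY keyboard (touch-typing assumption).
LEFT_HAND_KEYS = set("qwertasdfgzxcvb123456")


def sinister_index(name):
    """Fraction of a name's keystrokes that fall on the left hand.

    Spaces are ignored (the thumb can hit them); any character that is not a
    left-hand key, including accented or non-Latin letters, counts against
    the name. 1.0 means fully sinister, 0.0 means no left-hand keys at all.
    """
    keys = [ch for ch in name.lower() if not ch.isspace()]
    if not keys:
        return 0.0
    return sum(ch in LEFT_HAND_KEYS for ch in keys) / len(keys)


artists = ["weezer", "yoko ono", "wet wet wet", "abba", "pulp", "eve 6"]
sinister = sorted((a for a in artists if sinister_index(a) == 1.0),
                  key=len, reverse=True)
print(sinister)  # ['wet wet wet', 'weezer', 'eve 6', 'abba']
```

Under this definition a score of 0.0 identifies the dexterous names, though it would also catch names written entirely in a non-Latin script, so a real filter would probably need to treat those separately.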

cheetos, Music, sinister index, The Echo Nest

The Coolness Index

Posted by Paul in data, fun, Music, The Echo Nest on July 1, 2009

Some artists just are not cool: your mom likes ABBA, so there’s no way you are going to listen to them, even if you think Mamma Mia is rather catchy. Likewise, you may think High School Musical’s ‘Bop to the Top’ is mucho gusto, but you don’t want anyone to know it. Coolness is hard to quantify, ephemeral and transient (and of course, very subjective). Some artists, like Miles Davis and the Velvet Underground, will always be totally cool, while some fade in and out of coolness (Elvis, Stevie Wonder, Neil Diamond, Sting), and for some artists it is hard to tell if they were ever cool (Miley Cyrus, Creed, and Nickelback come to mind).

Imagine if there was an objective measure for coolness – a number that could be attached to each artist that indicated how ‘cool’ the artist was. We’d be able to do all sorts of interesting things with such a ‘coolness index’. We could make a ‘music makeover’ playlist that would take you from Miley to Miles in 12 songs (consider it a 12-step taste recovery program) or we could create a music rehab playlist that takes you from Amy Winehouse to Kate Nash. But of course, the concept of cool is too hard to nail down. Is Johnny Cash cool? Michael Jackson? Prince? Context, demographics, locale all play a role.

It may be too hard to tell whether an artist is cool, but we have all sorts of ways to tell that an artist is definitely not cool. For instance, if lots of listeners really don’t want people to know that they are listening to a particular artist, then that artist is probably not too cool. Luckily, there’s an interesting source for just this kind of data. Recently, the researchers at Last.fm published a list of the ‘most unwanted scrobbles’: the tracks that were most frequently deleted by the Last.fm community from their scrobbles in the last month. These are the tracks that Last.fm listeners didn’t want people to know they listened to. Here’s the first page of the most unwanted scrobbles:

Kudos to Last.fm for publishing this data. It’s a great source for the uncool. Collecting all the artists from these pages, we can build a list of artists that have frequently had their scrobbles deleted:

Lady GaGa
Britney Spears
Katy Perry
Rihanna
Paramore
Coldplay
Taylor Swift
Beyoncé
Avril Lavigne
Marc Seales, composer. New Stories. Ernie Watts, saxophone
Alexander Rybak
Black Eyed Peas
Kings of Leon
Muse
My Chemical Romance
Linkin Park
Korn
Miley Cyrus
Jason Mraz
Metro Station
Leona Lewis
Green Day
Evanescence
Amy Winehouse
Oasis
Nelly Furtado

This list rings true as a set of ‘uncool’ artists, with the exception of Marc Seales, who happens to have a piece of music called ‘Highway Blues’ in the ‘Sample Music’ folder on most Windows XP computers and is likely frequently scrobbled because of this. Ideally this list should be normalized for popularity: naturally, artists that have more listeners will be scrobbled more and consequently be deleted more too. But there’s not enough data in this list to normalize properly, so we’ll make do with an unnormalized list. I find it interesting how many female acts are on the list. Is it not cool to listen to female artists?

Another approach to finding the uncool is to look for artists that have been tagged as ‘guilty pleasure’ on sites like Last.fm. By applying the ‘guilty pleasure’ tag, people are identifying artists that they are embarrassed to be listening to. Here’s a list of the top 100 popular artists that have been frequently tagged with ‘guilty pleasure’; for this list I’m normalizing the data so popularity doesn’t factor into the list order:

Katy Perry
Ashlee Simpson
Spice Girls
Lindsay Lohan
Mandy Moore
Jessica Simpson
Backstreet Boys
Hilary Duff
Metro Station
Britney Spears
Justin Timberlake
Taylor Swift
Rihanna
The Pussycat Dolls
Kelly Clarkson
Christina Aguilera
Fall Out Boy
Take That
Avril Lavigne
Ricky Martin
Girls Aloud
Fergie
Neil Diamond
McFly
Robyn
The Veronicas
Ace of Base
ABBA
Céline Dion
Chris Brown
All Time Low
Kanye West
Gwen Stefani
Good Charlotte
P!nk
Usher
blink-182
R. Kelly
Nelly Furtado
The Get Up Kids
Madonna
Timbaland
Beyonce
New Found Glory
Natasha Bedingfield
Akon
Jem
Ciara
Robbie Williams
Paramore
The Wallflowers
Michelle Branch
Taking Back Sunday
Creed
Savage Garden
The All-American Rejects
Simple Plan
Shania Twain
Sugababes
Tegan and Sara
Everclear
Sugarcult
The Starting Line
Brand New
Destiny’s Child
Cyndi Lauper
Mariah Carey
Westlife
Maroon 5
Melanie C
Jennifer Lopez
Michael Jackson
Kelis
Tears for Fears
Alkaline Trio
Dashboard Confessional
Vanessa Carlton
Lily Allen
Bowling for Soup
Jet
50 Cent
Trivium
Cher
Eve 6
Sean Paul
Kylie Minogue
Howie Day
Sophie Ellis-Bextor
My Chemical Romance
Third Eye Blind
Saves the Day
Bryan Adams
Blondie
Boston
John Mellencamp
Simply Red
Whitney Houston
The Corrs
The Calling
Motion City Soundtrack
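The post doesn't say exactly how the ‘guilty pleasure’ counts were normalized for popularity. One simple possibility, assumed here, is to divide each artist's tag count by its listener count, so the ranking reflects how often listeners apply the tag rather than how many listeners an artist has. The artist names and counts below are made up purely to show the effect, and both function names are hypothetical.

```python
def normalized_tag_rate(tag_counts, listener_counts):
    """Tag count per listener for every artist with a known, positive listener count."""
    return {
        artist: tags / listener_counts[artist]
        for artist, tags in tag_counts.items()
        if listener_counts.get(artist, 0) > 0
    }


def ranked(scores):
    """Artist names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: artist_a is tagged more often, but has far more listeners.
guilty_tags = {"artist_a": 900, "artist_b": 300}
listeners = {"artist_a": 3_000_000, "artist_b": 200_000}

print(ranked(guilty_tags))                                  # ['artist_a', 'artist_b']
print(ranked(normalized_tag_rate(guilty_tags, listeners)))  # ['artist_b', 'artist_a']
```

The same per-listener division is one way the ‘most unwanted scrobbles’ list could be normalized, given listener counts for each artist.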

There’s overlap between the two lists: Avril, Britney, Katy, Nelly, Taylor, Rihanna, along with the Disney crowd. Again, there seems to be an anti-female coolness bias on the list. It is hard to be cool and female.

The ‘most unwanted scrobbles’ and ‘guilty pleasure’ approaches to the coolness index only get us so far. They can help us identify music that people are embarrassed to admit that they enjoy, but they only give us one end of the coolness spectrum: we can find what is not cool, but we can’t find out what is cool. In effect, we have an ‘Uncoolness Index’. Still, knowing which artists are uncool can be helpful for all sorts of things. If we are building a playlist for a party, we can turn on the uncool filter to make sure that Ricky Martin or Robbie Williams won’t sneak into the mix. Likewise, if we are building a recommender, we can use the Uncoolness Index to decide how cool the user is and recommend music that’s slightly less uncool than what they are used to listening to.

Next steps are to figure out how to learn not just what is uncool, but also what is cool, so we can build the true ‘coolness index’ and be able to tell how cool any artist is. I think that is going to be a harder problem, but I have some ideas …

coolness, coolness index, last.fm, Music, The Echo Nest

The Passion Index

Posted by Paul in data, fun, Music, recommendation, research, The Echo Nest on June 18, 2009

One of the ways that Music 2.0 has changed how we think about music is that there is so much interesting data available about how people are listening to music. Sites like Last.fm automatically track all sorts of interesting data that just was not available before. Forty years ago, a music label like Capitol would know how many copies of Abbey Road sold in the U.S., but the label wouldn’t know how many times people actually listened to the album. Today, however, our iPods and desktop music players keep careful track of how many times we play each song, album and artist, giving us a whole new way to look at artist popularity. It’s not just sales figures anymore; it’s how often people are actually listening to an artist. If you go to Last.fm you can see that The Beatles have over 1.75 million listeners and 168 million plays. This makes it easy to see how popular the Beatles are compared to another band (The Monkees, for instance, have 2.5M plays and 285K listeners).

With all of this new data available, there are some new ways we can look at artists. Instead of just looking at artists in terms of popularity and sales rank, I think it is interesting to see which artists generate the most passionate listeners: artists that dominate the playlists of their fans. I think this ‘passion index’ may be an interesting metric to help people explore and discover music. Artists that attract passionate fans may be longer-lived and worth a listener’s investment in time and money.

How can we calculate a passion index? There are probably a number of indicators: the number of edits to the band’s Wikipedia page, the average distance a fan travels to attend a show by the artist, the number of fan sites for an artist. All of these may be a bit difficult to collect, especially for a large set of artists. One simple passion metric is just the average number of artist plays per listener. Presumably, if an artist’s listeners are playing the artist’s songs more than average, they are more passionate about the artist. One thing that I like about this approach to the passion index is that it is extremely easy to calculate: just divide the total artist plays by the total number of artist listeners and you have the passion index. Yes, there are many confounding factors (for instance, artists with longer songs are penalized), but I still think it is a pretty good measure.
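The calculation is simple enough to write in a few lines. This is a minimal Python sketch, and `passion_index` is a hypothetical name. The charts below appear to truncate rather than round (The Beatles’ 167,765,187 plays over 1,748,159 listeners is about 95.97 but is listed as 95), so the sketch uses integer division.

```python
def passion_index(total_plays, total_listeners):
    """Average plays per listener, truncated to a whole number."""
    if total_listeners <= 0:
        raise ValueError("total_listeners must be positive")
    return total_plays // total_listeners


# Play and listener counts from the Last.fm charts in this post.
print(passion_index(167_765_187, 1_748_159))  # The Beatles -> 95
print(passion_index(60_978_053, 527_653))     # In Flames -> 115
```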

I calculated the passion index for a large collection of artists. I started with about a million artists (it is really nice to have all this data at the Echo Nest) and filtered these down to the 50K most popular artists. I plotted the number of artist plays vs. the number of artist listeners for each of the 50K artists. The plot shows that most artists fall into the central band (normal passion), but some (the green points) are high-passion artists and some (the blue points) are low-passion artists.

For the 50K artists, the average number of plays per artist per listener is just 11 (with a standard deviation of about 11.5). Considering that there are a substantial number of artists in my iTunes collection that I’ve played only once, this seems pretty reasonable.

So who are the artists with the highest passion index? Here are the top ten:

Passion	Listeners	Plays	Artist
332	4065	1352719	上海アリス幻樂団
292	10374	3032373	Belo
245	3147	773959	Petos
241	2829	683191	Reilukerho
208	4887	1020538	Sound Horizon
190	24422	4652968	동방신기
185	9133	1691866	岡崎律子
175	9171	1611106	Kollegah
173	17279	3004410	Super Junior
170	62592	10662940	Böhse Onkelz

I didn’t recognize any of these artists (and I’m not even sure if 上海アリス幻樂団 is really an artist; according to the Japanese Wikipedia it is ‘a fan club in Japan to produce a music game coterie’, whatever that means). Belo is a Brazilian pop artist that does indeed seem to have some rather passionate fans.

It is not surprising that it is hard for popular artists to rank at the very top of the passion index. Popular artists are exposed to many, many listeners, which can easily reduce the passion index. Here are the top passion-ranked artists drawn from the 1000 most popular artists:

Passion	Listeners	Plays	Artist
115	527653	60978053	In Flames
95	1748159	167765187	The Beatles
79	2140659	170106143	Radiohead
78	282308	22071498	Die Ärzte
75	269052	20293399	Mindless Self Indulgence
75	691100	52217023	Nightwish
74	332658	24645786	Porcupine Tree
74	1056834	79135038	Nine Inch Nails
72	384574	27901385	Opeth
70	601587	42563097	Rise Against
69	357317	24911669	Sonata Arctica
69	1364096	95399150	Metallica
66	460518	30625121	Children of Bodom
66	619396	41440369	Paramore
65	504464	33271871	Dream Theater
65	1391809	90888046	Pink Floyd
64	540184	34635084	Brand New
62	862468	54094977	Iron Maiden
62	1681914	105935202	Muse
61	381942	23478290	Beirut

I find it interesting to see all of the heavy metal bands in the top 20. Metal fans are indeed true fans.

Going to the other end of the passion scale, here are the popular artists that have the least passionate fans:

Passion	Listeners	Plays	Artist
6	270692	1767977	Julie London
6	284087	1964292	Smoke City
6	294100	1784358	Dinah Washington
6	295200	1799303	The Bangles
6	295990	1832771	Donna Summer
6	306018	1905285	Bonnie Tyler
6	307407	2123599	Buffalo Springfield
6	311543	2085085	Franz Schubert
6	312078	1909769	The Hollies
6	313732	2190008	Tom Jones
6	325454	2025366	Eric Prydz
6	331837	2259892	Sarah Vaughan
6	332072	2016898	Soft Cell
6	407622	2622570	Steppenwolf
5	275770	1605268	Diana Ross
5	281037	1615125	Isaac Hayes
5	282095	1685959	The Isley Brothers
5	283467	1666824	Survivor
5	311867	1694947	Peggy Lee
5	333437	1925611	Wham!
5	388183	2244878	Kool & The Gang

I guess people are not too passionate about Soft Cell.

Here’s a passion chart for the 100 most popular artists. Even the artists at the bottom of this chart are above the average passion index of 11.

Passion	Listeners	Plays	Artist
95	1748159	167765187	The Beatles
79	2140659	170106143	Radiohead
74	1056834	79135038	Nine Inch Nails
69	1364096	95399150	Metallica
65	1391809	90888046	Pink Floyd
62	1681914	105935202	Muse
61	1397442	85685015	System of a Down
61	1403951	86849524	Linkin Park
60	1346298	81762621	Death Cab for Cutie
57	1060269	61127025	Fall Out Boy
56	1155877	65324424	Arctic Monkeys
55	1897332	104932225	Red Hot Chili Peppers
54	950416	52019102	My Chemical Romance
50	1131952	56622835	blink-182
49	2313815	115653456	Coldplay
48	964970	47102550	Sigur Rós
48	1108397	53260614	Modest Mouse
48	1350931	65865988	Placebo
47	1129004	53771343	Jack Johnson
44	1297020	57111763	Led Zeppelin
43	1011131	43930085	Kings of Leon
42	947904	39970477	Marilyn Manson
42	1065375	45459226	Britney Spears
42	1246213	52656343	Incubus
42	1256717	53610102	Bob Dylan
41	1527721	62654675	Green Day
41	1881718	78473290	The Killers
40	1023666	41288978	Queens of the Stone Age
40	1057539	42472755	Kanye West
40	1108044	44845176	Interpol
40	1247838	49914554	Depeche Mode
40	1318140	53594021	Bloc Party
39	1266502	49492511	The White Stripes
38	1048025	40174997	Evanescence
38	1091324	42195854	Pearl Jam
38	1734180	67541885	Nirvana
37	978342	36561552	The Kooks
37	1097968	41046538	The Shins
37	1114190	42051787	The Offspring
37	1379096	51313607	The Cure
37	1566660	58923515	Foo Fighters
36	1326946	48738588	The Smashing Pumpkins
35	1091278	39194471	Björk
35	1271334	45619688	The Strokes
34	955876	33376744	Jimmy Eat World
34	1251461	42949597	Daft Punk
33	989230	33257150	Pixies
33	1012060	34225186	Eminem
33	1051836	35529878	Avril Lavigne
33	1110087	36785736	Johnny Cash
33	1121138	37645208	AC/DC
33	1161536	38615571	Air
32	961327	31286528	The Prodigy
32	1038491	33270172	Amy Winehouse
32	1410438	45614720	David Bowie
32	1641475	52612972	Oasis
32	1693023	54971351	U2
31	1258854	39598249	Madonna
31	1622198	51669720	Queen
30	1032223	31750683	Portishead
30	1178755	35600916	Rage Against the Machine
30	1249417	38284572	The Doors
30	1393406	42717325	Beck
29	1030982	30044419	Yeah Yeah Yeahs
29	1187160	34712193	Massive Attack
29	1348662	39131095	Weezer
29	1361510	39753640	Snow Patrol
28	985715	28485679	The Postal Service
28	1045205	30105531	The Clash
28	1305984	37807059	Guns N’ Roses
28	1532003	43998517	Franz Ferdinand
27	1000950	27262441	Nickelback
27	1395278	37856776	Gorillaz
26	1503035	40161219	The Rolling Stones
25	1345571	33741254	R.E.M.
24	1311410	32588864	Moby
23	973319	22962953	Audioslave
23	976745	22557111	3 Doors Down
23	1123549	26696878	Keane
22	998933	21995497	Justin Timberlake
22	1025990	23145062	Rihanna
22	1109529	24687603	Maroon 5
22	1120968	24796436	Jimi Hendrix
22	1160410	26641513	[unknown]
21	1151225	25081110	The Who
20	1057288	22084785	The Chemical Brothers
20	1105159	22925198	Kaiser Chiefs
20	1117306	22390847	Nelly Furtado
20	1201937	25019675	Aerosmith
20	1253613	25582503	Blur
19	968885	19219364	Simon & Garfunkel
19	974687	18528890	Christina Aguilera
19	1025305	20157209	The Cranberries
19	1144816	22252304	Michael Jackson
16	996649	16234996	Black Eyed Peas
16	1019886	16618386	Eric Clapton
15	980141	15317182	The Police
15	981451	15289554	Dido
14	973520	13781896	Elton John
13	949742	12624027	The Verve

I think it would be really interesting to incorporate the passion index into a recommender, so instead of just recommending artists that are similar to artists that a listener already likes, filter the similar artists with a passion filter and offer up the artists that listeners are most passionate about. I think these recommendations would be more valuable to the listener.
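A minimal sketch of such a passion filter, assuming the similar-artist list comes with each artist's total plays and listeners. The function names and the `min_passion` threshold are hypothetical. The four candidates are an illustrative neighbor list, but their counts are the Last.fm figures from the top-100 chart above.

```python
def passion_index(plays, listeners):
    """Average plays per listener, truncated to a whole number."""
    return plays // listeners


def passion_filter(candidates, min_passion=0):
    """Re-rank (artist, plays, listeners) tuples by passion index.

    Artists below min_passion are dropped; the rest are returned from most to
    least passionate.
    """
    scored = [(passion_index(plays, listeners), artist)
              for artist, plays, listeners in candidates if listeners > 0]
    kept = [pair for pair in scored if pair[0] >= min_passion]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [artist for _, artist in kept]


similar = [
    ("Coldplay", 115_653_456, 2_313_815),
    ("Keane", 26_696_878, 1_123_549),
    ("Radiohead", 170_106_143, 2_140_659),
    ("Muse", 105_935_202, 1_681_914),
]
print(passion_filter(similar))                  # ['Radiohead', 'Muse', 'Coldplay', 'Keane']
print(passion_filter(similar, min_passion=40))  # ['Radiohead', 'Muse', 'Coldplay']
```

In a real recommender the passion score would more likely be blended with the similarity score than used as a hard cutoff, so that very similar but less passionately followed artists are not dropped entirely.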

Music, passion, recommendation, The Echo Nest

Building a music map

Posted by Paul in data, fun, java, Music, research, The Echo Nest, visualization, web services on May 31, 2009

I like maps, especially maps that show music spaces; in fact, I like them so much I have one framed, hanging in my kitchen. I’d like to create a map for all of music. Like any good map, this map should work at multiple levels: it should help you understand the global structure of the music space while allowing you to dive in and see fine detailed structure as well. Just as Google Maps can show you that Africa is south of Europe and moments later that Stark St. intersects with Reservoir St. in Nashua, NH, a good music map should be able to show you at a glance how Jazz, Blues and Rock relate to each other, while moments later letting you find an unknown 80s hair metal band that sounds similar to Bon Jovi.

My goal is to build a map of the artist space, one that allows you to explore the music space at a global level to understand how different music styles relate, but that also allows you to zoom in and explore the finer structure of the artist space.

I’m going to base the music map on the artist similarity data collected from the Echo Nest artist similarity web service. This web service returns the 15 most similar artists for any artist. Using this web service I collected the artist similarity info for about 70K artists, along with each artist’s familiarity and hotness.

Some Explorations
It would be silly to start trying to visualize 70K artists right away; the 250K artist-to-artist links would overwhelm just about any graph layout algorithm. The graph would look like this. So I started small, with just the near neighbors of The Beatles (Beatles-K1). For my first experiment, I graphed the nearest neighbors to The Beatles. This plot shows how the 15 near neighbors of the Beatles all connect to each other.

In the graph, artist names are sized in proportion to the familiarity of the artist. The Beatles are bigger than The Rutles because they are more familiar. I think the graph is pretty interesting, showing how all of the similar artists of the Beatles relate to each other; however, the graph is also really scary because it shows 64 interconnections for these 16 artists. This graph is just showing the outgoing links for the Beatles. If we include the incoming links to the Beatles (the artist similarity function is asymmetric, so outgoing similarities and incoming similarities are not the same), it becomes a real mess:

If you extend this graph one more level to include the friends of the friends of The Beatles (Beatles-K2), the graph becomes unusable. Here’s a detail; click to see the whole mess. It is only 116 artists with 665 edges, but already you can see that it is not going to be usable.
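Extracting a neighborhood like Beatles-K1 or Beatles-K2 amounts to a breadth-first search over the similarity links. Here is a minimal sketch, assuming the similarity data is a dict mapping each artist to the list of artists it points to (outgoing links only, since similarity is asymmetric). The tiny graph and the function names are hypothetical.

```python
from collections import deque


def neighborhood(similar, seed, k):
    """All artists within k outgoing similarity hops of seed (seed included)."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        artist = queue.popleft()
        if depth[artist] == k:
            continue  # do not expand past the requested radius
        for other in similar.get(artist, []):
            if other not in depth:
                depth[other] = depth[artist] + 1
                queue.append(other)
    return set(depth)


def induced_edges(similar, artists):
    """Every outgoing similarity link whose two endpoints are both in artists."""
    return {(a, b) for a in artists for b in similar.get(a, []) if b in artists}


toy = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["E"], "E": []}
k1 = neighborhood(toy, "A", 1)
print(sorted(k1), len(induced_edges(toy, k1)))  # ['A', 'B', 'C'] 4
```

With up to 15 links per artist, the number of induced edges grows quickly with k, which is why the K2 graph is already so dense.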

Eliminating the edges

Clearly the approach of drawing all of the artist connections is not going to scale beyond a few dozen artists. One approach is to just throw away all of the edges. Instead of showing a graph representation, use an embedding algorithm like MDS or t-SNE to position the artists in the space. These algorithms lay out items by minimizing a cost over the whole layout (stress for MDS, a divergence between neighbor distributions for t-SNE). It’s as if all of the similar items are connected by invisible springs which push all of the artists into positions that minimize the overall tension in the springs. The result should show that similar artists are near each other and dissimilar artists are far away. Here’s a detail of an example plot for the friends of the friends of the Beatles. (Click on it to see the full plot.)

I find this type of visualization quite unsatisfying. Without any edges in the graph, I find it hard to see any structure, and I think this graph would be hard to use for exploration. (It is fun, though, to see the clustering of bands like The Animals, The Turtles, The Byrds, The Kinks and The Monkees.)

Drawing some of the edges

We can’t draw all of the edges, because the graph gets too dense, but if we don’t draw any edges, the map loses too much structure to be useful for exploration. So let's see if we can draw only some of the edges; this should bring back some of the structure without overwhelming us with connections. The tricky question is “Which edges should I draw?” The obvious choice is to attach each artist to the artist it is most similar to. When I apply this to the Beatles-K2 neighborhood, we get something like this:

This clearly helps quite a bit. We no longer have a bowl of spaghetti, yet we can still see some structure. We can even see some clusters that make sense (Led Zeppelin is clustered with Jimi Hendrix and the Rolling Stones, while Air Supply is closer to the Bee Gees). But there are some problems with this graph. First, it is not connected: there are 14 separate clusters, ranging in size from 1 to 57 artists. This disconnection is not really acceptable. Second, there are a number of non-intuitive flows from familiar to less familiar artists. It just seems wrong that bands like the Moody Blues, Supertramp and ELO are connected to the rest of the music world via Electric Light Orchestra II (shudder).
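Attaching each artist to its single most similar neighbor, and then counting how many separate clusters result, might look like the following sketch. The similarity lists are made up, and union-find is simply a standard way to count connected components (our choice for illustration, not necessarily what the author used).

```python
from typing import Dict, Iterable, List, Tuple

# Made-up similarity lists: artist -> [(neighbor, similarity), ...].
SIMILAR: Dict[str, List[Tuple[str, float]]] = {
    "a": [("b", 0.9), ("c", 0.5)],
    "b": [("a", 0.9)],
    "c": [("d", 0.8), ("a", 0.4)],
    "d": [("c", 0.8)],
}


def nearest_neighbor_edges(similar: Dict[str, List[Tuple[str, float]]]) -> List[Tuple[str, str]]:
    """Attach each artist to its single most similar neighbor."""
    edges = []
    for artist, neighbors in similar.items():
        if neighbors:
            best, _ = max(neighbors, key=lambda pair: pair[1])
            edges.append((artist, best))
    return edges


def count_components(nodes: Iterable[str], edges: List[Tuple[str, str]]) -> int:
    """Number of connected clusters, treating edges as undirected (union-find)."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(n) for n in parent})


edges = nearest_neighbor_edges(SIMILAR)
print(edges)                                    # [('a', 'b'), ('b', 'a'), ('c', 'd'), ('d', 'c')]
print(count_components(SIMILAR.keys(), edges))  # 2
```

Even though "c" lists "a" as a neighbor, that link is never the strongest one, so the two pairs end up as disjoint clusters: the same effect that produces the 14 clusters described above.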

To deal with the ELO II problem I tried a different strategy. Instead of attaching an artist to its most similar artist, I attach it to the most similar artist whose familiarity is greater than, but as close as possible to, its own. This should prevent us from attaching the Moody Blues to the world via ELO II, since ELO II is much less familiar than the Moody Blues. Here’s the plot:

Now we are getting somewhere. I like this graph quite a bit. It has a nice left-to-right flow from popular to less popular, we are not overwhelmed with edges, and ELO II is in its proper subservient place. The one problem with the graph is that it is still disjoint: we have 5 clusters of artists. There’s no way to get to ABBA from The Beatles, even though we know that ABBA is a near neighbor of The Beatles. This is a direct consequence of how we chose the edges: since we are using only some of the edges in the graph, there’s a chance that some subgraphs will be cut off from the rest. When I look at a larger neighborhood (Beatles-K3), the graph becomes even more fragmented, with a hundred separate clusters. We want a graph that is not disjoint at all, so we need a new way to select edges.
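The familiarity-constrained rule can be read more than one way. This sketch takes one reading: among an artist's near neighbors that are more familiar than it, pick the one with the smallest familiarity gap, breaking ties by higher similarity; the most familiar artists get no parent and become the roots of the left-to-right flow. The names and numbers are made up to mimic the ELO II situation.

```python
from typing import Dict, List, Optional, Tuple


def attach_upward(artist: str, neighbors: List[Tuple[str, float]],
                  familiarity: Dict[str, float]) -> Optional[str]:
    """Parent = more familiar neighbor with the smallest familiarity gap."""
    own = familiarity[artist]
    candidates = [(name, sim) for name, sim in neighbors if familiarity[name] > own]
    if not candidates:
        return None  # nothing more familiar nearby: this artist is a root
    # Smallest familiarity gap wins; higher similarity breaks ties.
    name, _ = min(candidates, key=lambda c: (familiarity[c[0]] - own, -c[1]))
    return name


# Made-up values: "offshoot" is very similar to "classic" but far less familiar.
FAMILIARITY = {"hit": 0.9, "classic": 0.8, "offshoot": 0.3}
NEIGHBORS = {
    "hit": [("classic", 0.7)],
    "classic": [("offshoot", 0.95), ("hit", 0.6)],
    "offshoot": [("classic", 0.95)],
}
parents = {a: attach_upward(a, NEIGHBORS[a], FAMILIARITY) for a in NEIGHBORS}
print(parents)  # {'hit': None, 'classic': 'hit', 'offshoot': 'classic'}
```

Because every edge points from a less familiar artist to a more familiar one, the result is a forest, which is exactly why the graph can still come apart into several clusters.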

Minimum Spanning Tree
One way to make sure that the entire graph is connected is to generate the minimum spanning tree of the graph. A spanning tree connects every node using the fewest possible edges (n - 1 edges for n nodes), and the minimum spanning tree is the spanning tree whose edges have the smallest total weight. If we start with a connected graph, the minimum spanning tree is guaranteed to be connected as well. This will eliminate our disjoint clusters. For this next graph, I built the minimum spanning tree of the Beatles-K2 graph.
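Kruskal's algorithm is one standard way to compute a minimum spanning tree: sort the edges from lightest to heaviest and keep each edge that joins two components that are not yet connected. This is a minimal sketch, assuming the (asymmetric) similarities have already been turned into undirected edges weighted by 1 - similarity, so that similar pairs are cheap; the post does not say which MST algorithm or weighting Jung was given, and the values are made up.

```python
from typing import Iterable, List, Tuple


def minimum_spanning_forest(nodes: Iterable[str],
                            weighted_edges: List[Tuple[float, str, str]]) -> List[Tuple[str, str, float]]:
    """Kruskal's algorithm; returns (u, v, weight) for each kept edge."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(weighted_edges):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Made-up undirected similarities; weight = 1 - similarity.
SIMS = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7), ("c", "d", 0.6)]
nodes = {"a", "b", "c", "d"}
tree = minimum_spanning_forest(nodes, [(1 - s, u, v) for u, v, s in SIMS])
print([(u, v) for u, v, _ in tree])  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

The "a"-"c" edge is dropped because "a" and "c" are already connected through "b"; four artists end up joined by exactly three edges.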

As predicted, we no longer have separate clusters within the graph; we can find a path between any two artists. This is a big win: we should be able to scale this approach up to a much larger number of artists without ever having to worry about disjoint clusters. The whole world of music is connected in a single graph. However, there’s something a bit unsatisfying about this graph. The Beatles are connected to only two other artists: John Lennon & The Plastic Ono Band and The Swinging Blue Jeans. I’ve never heard of The Swinging Blue Jeans. I’m sure they sound a lot like The Beatles, but I’m also sure that most Beatles fans would not tie the two bands together so closely. Our graph topology needs to be sensitive to this.

One approach is to weight the edges of the graph differently. Instead of weighting them by similarity, we can weight them by the difference in familiarity between the two artists. The Beatles and the Rolling Stones have nearly identical familiarities, so the weight of the edge between them would be close to zero, while The Beatles and The Swinging Blue Jeans have very different familiarities, so the weight of the edge between them would be very high. Since the minimum spanning tree minimizes the total weight of its edges, it prefers low-weight edges over high-weight ones. The result is that we still end up with a single graph, with none of the disjoint clusters, but artists are connected to other artists of similar familiarity whenever possible. Let’s try it out:

Now we see that popular bands are more likely to be connected to other popular bands, and The Beatles are no longer directly connected to The Swinging Blue Jeans. I’m pretty happy with this method of building the graph: we are not overwhelmed by edges, we don’t get a whole forest of disjoint clusters, and the connections between artists make sense.
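The effect of re-weighting can be reproduced on a toy graph. In this sketch the candidate edges still come from the similarity graph; only their weights change, from 1 - similarity to the absolute familiarity difference. The three artists and all numbers are made up: "top" and "peer" are equally popular, while "obscure" is very similar to "top" but much less familiar.

```python
from typing import Dict, Iterable, List, Tuple


def minimum_spanning_forest(nodes: Iterable[str],
                            weighted_edges: List[Tuple[float, str, str]]) -> List[Tuple[str, str]]:
    """Kruskal's algorithm; returns the (u, v) pairs of the kept edges."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


FAMILIARITY: Dict[str, float] = {"top": 0.95, "peer": 0.93, "obscure": 0.20}
SIMS = [("top", "obscure", 0.95), ("top", "peer", 0.6), ("peer", "obscure", 0.5)]

by_similarity = minimum_spanning_forest(
    FAMILIARITY, [(1 - s, u, v) for u, v, s in SIMS])
by_familiarity = minimum_spanning_forest(
    FAMILIARITY, [(abs(FAMILIARITY[u] - FAMILIARITY[v]), u, v) for u, v, _ in SIMS])
print(by_similarity)   # [('top', 'obscure'), ('top', 'peer')]
print(by_familiarity)  # [('top', 'peer'), ('peer', 'obscure')]
```

Both trees connect all three artists, but only the familiarity-weighted one keeps "top" from hanging directly off the obscure act.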

Of course we can build the graph by starting from different artists. This gives us a deep view into that particular type of music. For instance, here’s a graph that starts from Miles Davis:

Here’s a near neighbor graph starting from Metallica:

And here’s one showing the near neighbors to Johann Sebastian Bach:

This graphing technique works pretty well, so let's try a larger set of artists. Here I’m plotting the 2,000 most popular artists. Unlike the Beatles neighborhood, this set of artists is not guaranteed to be connected, so we may get some disjoint clusters in the plot. That is expected and reasonable. The image of the resulting plot is rather large (about 16MB), so here’s a small detail; click on the image to see the whole thing. I’ve also created a PDF version, which may be easier to browse.
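When the artist set is the top N by popularity rather than a neighborhood grown from one seed, Kruskal's algorithm simply returns a spanning forest, and the number of separate clusters equals the number of artists minus the number of forest edges. A minimal sketch, assuming familiarity stands in for popularity and edges are weighted by familiarity difference; the data is made up so that dropping one low-familiarity "bridge" artist splits the set in two.

```python
from typing import Dict, Iterable, List, Tuple


def count_trees(nodes: Iterable[str], weighted_edges: List[Tuple[float, str, str]]) -> int:
    """Run Kruskal's algorithm and return how many trees the spanning forest has."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = 0
    for _, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            kept += 1
    return len(parent) - kept  # n nodes joined by e forest edges form n - e trees


FAMILIARITY: Dict[str, float] = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "bridge": 0.1}
LINKS = [("a", "b"), ("c", "d"), ("b", "bridge"), ("bridge", "c")]


def top_n_tree_count(n: int) -> int:
    """Keep the n most familiar artists and the links among them, then count trees."""
    top = set(sorted(FAMILIARITY, key=FAMILIARITY.get, reverse=True)[:n])
    edges = [(abs(FAMILIARITY[u] - FAMILIARITY[v]), u, v)
             for u, v in LINKS if u in top and v in top]
    return count_trees(top, edges)


print(top_n_tree_count(5))  # 1
print(top_n_tree_count(4))  # 2
```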

I’m pretty pleased with how these graphs have turned out. We’ve taken a very complex space and created a visualization that shows some of its higher-level structure (the jazz artists are far away from the thrash artists) as well as some of the finer details (the female bubblegum pop artists are all near each other). The technique should scale up to even larger sets of artists; memory and compute time, not graph complexity, become the limiting factors. Still, the graphs aren’t perfect: seemingly inconsequential artists sometimes appear as gateways into whole subgenres. A bit more work is needed to figure out a better ordering for the nodes in the graph.

Some things I’d like to try, when I have a bit of spare time:

Create graphs with 20K artists (needs lots of memory and CPU)
Try using the top terms or tags of nearby artists to label clusters of artists, so we can find the Baroque composers or the hair metal bands
Color the nodes in a meaningful way
Create dynamic versions of the graph to use them for music exploration. For instance, when you click on an artist you should be able to hear the artist and read what people are saying about them.
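The cluster-labeling idea in the list above is still a to-do in the post, but a first pass could be as simple as counting tags. This is a hypothetical sketch: it assumes each artist comes with a list of tags (the tags here are made up), counts each tag at most once per artist, and uses the most common tags as the cluster's label.

```python
from collections import Counter
from typing import Dict, Iterable, List


def label_cluster(cluster: Iterable[str], tags: Dict[str, List[str]], top_n: int = 2) -> List[str]:
    """Return the `top_n` tags shared by the most artists in the cluster."""
    counts: Counter = Counter()
    for artist in cluster:
        counts.update(set(tags.get(artist, [])))  # one vote per artist per tag
    return [tag for tag, _ in counts.most_common(top_n)]


# Made-up tags for three artists in one cluster.
TAGS = {
    "x": ["baroque", "organ"],
    "y": ["baroque", "harpsichord"],
    "z": ["baroque", "organ", "choral"],
}
print(label_cluster(["x", "y", "z"], TAGS))  # ['baroque', 'organ']
```

A real version would probably down-weight tags that are common across all clusters (such as "rock"), so the label says what is distinctive about a cluster rather than what is merely popular.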

To create these graphs I used some pretty nifty tools:

The Echo Nest developer web services – I used these to get the artist similarity, familiarity and hotness data. The artist similarity data that you get from the Echo Nest is really nice. Since it doesn’t rely directly on collaborative filtering approaches it avoids the problems I’ve seen with data from other sources of artist similarity. In particular, the Echo Nest similarity data is not plagued by hubs (for some music services, a popular band like Coldplay may have hundreds or thousands of near neighbors due to a popularity bias inherent in CF style recommendation). Note that I work at the Echo Nest. But don’t be fooled into thinking I like the Echo Nest artist similarity data because I work there. It really is the other way around. I decided to go and work at the Echo Nest because I like their data so much.
Graphviz – a tool for rendering graphs
Jung – a Java library for manipulating graphs
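Handing a graph to Graphviz means writing it in the DOT language. The post doesn't show how the graphs were exported; this is a minimal sketch that writes an undirected graph with label font sizes growing with familiarity, echoing how artist names are sized in the plots. The names, the linear size scale and the assumption that familiarity lies in [0, 1] are all illustrative choices.

```python
from typing import Dict, List, Tuple


def _quote(name: str) -> str:
    """Quote an artist name as a DOT string ID, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: List[Tuple[str, str]], familiarity: Dict[str, float],
           min_size: float = 8.0, max_size: float = 40.0) -> str:
    """Render an undirected artist graph as Graphviz DOT text.

    Font size grows linearly with familiarity (assumed to lie in [0, 1]).
    """
    lines = ["graph artists {", "  node [shape=plaintext];"]
    for name in sorted(familiarity):
        size = min_size + (max_size - min_size) * familiarity[name]
        lines.append(f"  {_quote(name)} [fontsize={size:.1f}];")
    for u, v in edges:
        lines.append(f"  {_quote(u)} -- {_quote(v)};")
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([("Band A", "Band B")], {"Band A": 1.0, "Band B": 0.5})
print(dot)
```

Saving the output as `artists.dot` and running `dot -Tpng artists.dot -o artists.png` renders it; for thousands of artists, a force-directed engine such as `sfdp` is usually the better fit.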

If you have any ideas about graphing artists, or if you’d like to see the neighborhood of a particular artist, please let me know.



Music Machinery