Last week I compared the playlisting capabilities of iTunes Genius, Google’s new Instant Mix and The Echo Nest’s Playlist API. I found that Google’s Instant Mix playlists were filled with many WTF selections (Coldplay on a Miles Davis playlist) and that iTunes Genius had problems generating playlists for any track by the Beatles. I rechecked some of the playlists today to see how they were doing. It looks like both services have received an upgrade since my last post. Here’s the new Google Instant Mix playlist based on a Miles Davis seed song:
All the big WTFs from last week’s test are gone – yay Google for fixing this so quickly. The only problem I see is the doubled ‘Old Folks’ song, but that’s not a WTF. However, I can’t give Google Instant Mix a clean slate yet. Google had a chance to study my particular collection (they asked, and I gave them my permission to do so), so I am sure that they paid particular attention to the big WTFs from last week. I’ll need to test again with a new collection and different seeds to see if their upgrade is a general one. Still, for the limited seeds that I tried, the WTFs seem to be gone.
Similarly, iTunes seems to have had an upgrade. Last week, it couldn’t make any playlist from a Beatles song, but this week it can. Here’s a playlist created with iTunes Genius with Polythene Pam as a seed:
Genius creates a serviceable playlist, with no WTFs from a Beatles seed, so like Google, Apple was able to clear up the WTFs that I noted in last week’s post. No clean slate for Apple though … I have seen some comments suggesting that Genius has problems generating playlists for new tracks. More investigation is needed to understand whether this is really a problem.
Given the traffic that last week’s post received, it is not surprising that these companies noticed the problems and dug in and fixed the problems quickly. I like to think that my post made playlisting just a little bit better for a few million people.
This week, Google launched the beta of its music locker service where you can upload all your music to the cloud and listen to it from anywhere. According to Techcrunch, Google’s Paul Joyce revealed that the Music Beta killer feature is ‘Instant Mix,’ Google’s version of Genius playlists, where you can select a song that you like and the music manager will create a playlist based on songs that sound similar. I wondered how good this ‘killer feature’ of Music Beta really was and so I decided to try to evaluate how well Instant Mix works to create playlists.
Google’s Instant Mix, like many playlisting engines, creates a playlist of songs given a seed song: it tries to find songs that go well with the seed. Unfortunately, there’s no solid objective measure for evaluating playlists – no algorithm we can use to say whether one playlist is better than another. A good playlist derived from a single seed will certainly have songs that sound similar to the seed, but there are many other aspects as well: the mix of the familiar and the new, surprise, emotional arc, song order, song transitions, and so on. If you are interested in the perils of playlist evaluation, check out the talk Dr. Ben Fields and I gave at ISMIR 2010: Finding a path through the jukebox – The Playlist tutorial. (Warning: it is a 300-slide deck.) Adding to the difficulty of evaluating Instant Mix is that it generates playlists from within an individual’s music collection, so the universe of music it can draw from is much smaller than that of a general playlisting engine such as Pandora. A playlist may appear to be poor because it is filled with songs that are poor matches for the seed, when in fact those songs may be the best matches within the individual’s collection.
Evaluating playlists is hard. However, there is something that we can do that is fairly easy to give us an idea of how well a playlisting engine works compared to others. I call it the WTF test. It is really quite simple. You generate a playlist, and just count the number of head-scratchers in the list. If you look at a song in a playlist and say to yourself ‘How the heck did this song get in this playlist’ you bump the counter for the playlist. The higher the WTF count the worse the playlist. As a first order quality metric, I really like the WTF Test. It is easy to apply, and focuses on a critical aspect of playlist quality. If a playlist is filled with jarring transitions, leaving the listener with iPod whiplash as they are jerked through songs of vastly different styles, it is a bad playlist.
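The WTF test boils down to a trivial scoring function. Here is a toy sketch of it in Python – the judgment of what counts as a WTF is, of course, human, so the `is_wtf` function below is just a stand-in:

```python
def wtf_score(playlist, is_wtf):
    """Count the head-scratchers in a playlist.

    playlist: list of (artist, title) tuples
    is_wtf:   a human judgment, modeled as a function that returns
              True when a song doesn't belong next to the seed
    """
    return sum(1 for song in playlist if is_wtf(song))


# Toy example: on a Miles Davis playlist, flag anything that isn't jazz.
playlist = [("John Coltrane", "Giant Steps"),
            ("Coldplay", "Clocks"),
            ("Bill Evans", "Waltz for Debby")]
jazz_artists = {"John Coltrane", "Bill Evans"}
score = wtf_score(playlist, lambda song: song[0] not in jazz_artists)
print(score)  # 1 - Coldplay on a Miles Davis playlist
```

The higher the score, the worse the playlist; a perfect playlist scores 0.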
For this evaluation, I took my personal collection of music (about 7,800 tracks) and enrolled it into 3 systems: Google Music, iTunes and The Echo Nest. I then created a set of playlists using each system and counted the WTFs for each playlist. I picked seed songs based on my music taste (it is my collection of music so it seemed like a natural place to start).
I compared three systems: iTunes Genius, Google Instant Mix, and The Echo Nest playlisting API. All of them are black-box algorithms, but we do know a little bit about them:
- iTunes Genius – this system seems to be a collaborative filtering algorithm driven by purchase data acquired via the iTunes music store. It may use play, skip and rating data to steer the playlisting engine. More details about the system can be found in: Smarter than Genius? Human Evaluation of Music Recommender Systems. This is a one-button system – there are no user-accessible controls that affect the playlisting algorithm.
- Google Instant Mix – there is no data published on how this system works. It appears to be a hybrid system that uses collaborative filtering data along with acoustic similarity data. Since Google Music does give attribution to Gracenote, there is a possibility that some of Gracenote’s data is used in generating playlists. This is a one-button system. There are no user-accessible controls that affect the playlisting algorithm.
- The Echo Nest playlist engine – this is a hybrid system that uses cultural, collaborative filtering and acoustic data to build the playlist. The cultural data is gleaned from a deep crawl of the web. The playlisting engine takes into account artist popularity, familiarity, cultural similarity, and acoustic similarity along with a number of other attributes. There are a number of controls that can be set to shape the playlists: variety, adventurousness, style, mood, energy. For this evaluation, the playlist engine was configured to create playlists with relatively low variety with songs by mostly mainstream artists. The configuration of the engine was not changed once the test was started.
For this evaluation I’ve used my personal iTunes music collection of about 7,800 songs. I think it is a fairly typical music collection. It has music of a wide variety of styles. It contains music of my taste (70s progrock and other dad-core, indie and numetal), music from my kids (radio pop, musicals), some indie, jazz, and a whole bunch of Canadian music from my friend Steve. There’s also a bunch of podcasts as well. It has the usual set of metadata screwups that you see in real-life collections (3 different spellings of Björk for example). I’ve placed a listing of all the music in the collection at Paul’s Music Collection if you are interested in all of the details.
Although I’ve tried my best to be objective, I clearly have a vested interest in the outcome of this evaluation. I work for a company that has its own playlisting technology. I have friends that work for Google. I like Apple products. So feel free to be skeptical about my results. I will try to do a few things to make it clear that I did not fudge things. I’ll show screenshots of results from the 3 playlisting sources, as opposed to just listing songs. (I’m too lazy to try to fake screenshots). I’ll also give the API commands I used for the Echo Nest playlists so you can generate those results yourself. Still, I won’t blame the skeptics. I encourage anyone to try a similar A/B/C evaluation on their own collection so we can compare results.
For each trial, I picked a seed song, generated a 25 song playlist using each system, and counted the WTFs in each list. I show the results as screenshots from each system and I mark each WTF that I see with a red dot.
Trial #1 – Miles Davis – Kind of Blue
I don’t have a whole lot of Jazz in my collection, so I thought this would be a good test to see if a playlister could find the Jazz amidst all the other stuff.
First up is iTunes Genius
This looks like an excellent mix – all jazz artists. The most WTF-ish results are the Blood, Sweat & Tears tracks, which are jazz-rock fusion, and the Norah Jones tracks, which are more coffee house, but none of these rises above the WTF level. Well done iTunes! WTF score: 0
Next up is The Echo Nest.
As with iTunes, the Echo Nest playlist has no WTFs – all hardcore jazz. I’d be pretty happy with this playlist, especially considering the limited amount of jazz in my collection. I think this playlist may even be a bit better than the iTunes one, since it leans more hardcore. If you are listening to Miles Davis, Norah Jones may not be for you. Well done Echo Nest. WTF score: 0
If you want to generate a similar playlist via our API, use this command:
http://developer.echonest.com/api/v4/playlist/static?api_key=3YDUQHGT9ZVUBFBR0&format=json&limit=true&song_id=SOAQMYC12A8C13A0A8&type=song-radio&bucket=id%3ACAQHGXM12FDF53542C&variety=.12&artist_min_hotttnesss=.4
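The same command can be built from Python with just the standard library. This is a minimal sketch – the song and catalog IDs below come straight from the command above and are specific to my collection, so substitute your own:

```python
import urllib.parse

# Parameters taken from the playlist/static command above.
params = {
    "api_key": "3YDUQHGT9ZVUBFBR0",
    "format": "json",
    "limit": "true",                    # only return songs in the catalog
    "song_id": "SOAQMYC12A8C13A0A8",    # the Miles Davis seed song
    "type": "song-radio",
    "bucket": "id:CAQHGXM12FDF53542C",  # my personal catalog
    "variety": ".12",                   # low variety
    "artist_min_hotttnesss": ".4",      # mostly mainstream artists
}
url = ("http://developer.echonest.com/api/v4/playlist/static?"
       + urllib.parse.urlencode(params))
print(url)

# Fetching this URL (e.g. with urllib.request.urlopen) returns JSON with
# the playlist under response -> songs; each entry carries the song title
# and artist name.
```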
Next up is Google:
I’ve marked the playlist with red dots on the songs that I consider to be WTF songs. There are 18(!) songs on this 25 song playlist that are not justifiable. There’s electronica, rock, folk, Victorian era brass band and Coldplay. Yes, that’s right, there’s Coldplay on a Miles Davis playlist. WTF score: 18
After Trial 1 Scores are: iTunes: 0 WTFs, The Echo Nest 0 WTFs, Google Music: 18 WTFs
Trial #2 – Lady Gaga – Bad Romance
First up is iTunes:
Next up: The Echo Nest
Next up, Google Instant Mix
Google’s Instant Mix for Lady Gaga’s Bad Romance seems filled with non sequiturs. Tracks by Dave Brubeck (cool jazz) and Maynard Ferguson (big band jazz) are mixed in with tracks by Ice Cube and They Might Be Giants. The most appropriate track in the playlist is a 20-year-old track by Madonna. I think I was pretty lenient in counting WTFs on this one; even then, it scores pretty poorly. WTF Score: 13
After Trial 2 Scores are: iTunes: 2 WTFs, The Echo Nest 0 WTFs, Google Music: 31 WTFs
Trial #3 – The Nice – Rondo
First up: iTunes:
Next up is The Echo Nest:
Next up is Google Instant Mix:
I would not like to listen to this playlist. It has a number of songs that are just too far out. ABBA and Simon & Garfunkel are WTF enough, but this playlist takes WTF three steps further. First offense: including a song with the same title more than once. This playlist has two versions of ‘Side A-Popcorn’. That’s a no-no in playlisting (except for cover playlists). Next offense is the song ‘I Think I Love You’ by the Partridge Family. This track was not in my collection; it was one of the free tracks that Google gave me when I signed up. 70s bubblegum pop doesn’t belong on this list. However, as bad as the Partridge Family song is, it is not the worst track on the playlist. That award goes to ‘FM 2.0: The Future of Internet Radio’. Yep, Instant Mix decided that we should conclude a prog rock playlist with an hour-long panel about the future of online music. That’s a big WTF. I can’t imagine what algorithm would have led to that choice. Google really deserves extra WTF points for these gaffes, but I’ll be kind. WTF Score: 11
After Trial 3 Scores are: iTunes: 2 WTFs, The Echo Nest 0 WTFs, Google Music: 42 WTFs
Trial #4 – Kraftwerk – Autobahn
I don’t have too much electronica, but I like to listen to it, especially when I’m working. Let’s try a playlist based on the group that started it all.
First up, iTunes.
iTunes nails it here. Not a bad track. Perfect playlist for programming. Again, well done iTunes. WTF Score: 0
Next up, The Echo Nest
Another solid playlist with no WTFs. It is a bit more vocal-heavy than the iTunes playlist, and I think I prefer the iTunes version a bit more because of that. Still, nothing to complain about here. WTF Score: 0
Next Up Google
After listening to this playlist, I am starting to wonder if Google is just messing with us. They could do much better than this by simply selecting songs at random within a top-level genre. This playlist has only 6 songs that can be considered OK; the rest are totally WTF. WTF Score: 18
After Trial 4 Scores are: iTunes: 2 WTFs, The Echo Nest 0 WTFs, Google Music: 60 WTFs
Trial #5 The Beatles – Polythene Pam
For the last trial I chose the song Polythene Pam by The Beatles. It is at the core of the amazing medley on side two of Abbey Road. The zenith of the Beatles’ music is (IMHO) the opening chords to this song. Let’s see how everyone does:
First up: iTunes
iTunes gets a bit WTF here. They can’t offer any recommendations based upon this song. This is totally puzzling to me since The Beatles have been available in the iTunes store for quite a while now. I tried to generate playlists seeded with many different Beatles songs and was not able to generate one playlist. Totally WTF. I think that not being able to generate a playlist for any Beatles song as seed should be worth at least 10 WTF points. WTF Score: 10
Next Up: The Echo Nest
No worries with The Echo Nest playlist. Probably not the most creative playlist, but quite serviceable. WTF Score: 0
Next up Google
Instant Mix scores better on this playlist than it has on the other four. That’s not because I think they did a better job on this playlist, it is just that since the Beatles cover such a wide range of music styles, it is not hard to make a justification for just about any song. Still, I do like the variety in this playlist. There are just two WTFs on this playlist. WTF Score: 2.
After Trial 5 Scores are: iTunes: 12 WTFs, The Echo Nest 0 WTFs, Google Music: 62 WTFs
(lower scores are better)
I learned quite a bit during this evaluation. First of all, Apple Genius is actually quite good. The last time I took a close look at iTunes Genius was 3 years ago. It was generating pretty poor recommendations. Today, however, Genius is generating reliable recommendations for just about any track I could throw at it, with the notable exception of Beatles tracks.
I was also quite pleased to see how well The Echo Nest playlister performed. Our playlist engine is designed to work with extremely large collections (10 million tracks) or with personal-sized collections. It has lots of options to allow you to control all sorts of aspects of the playlisting. I was glad to see that even when operating in the very constrained situation of a single seed song with no user feedback, it performed well. I am certainly not an unbiased observer, so I hope that anyone who cares enough about this stuff will try to create their own playlists with The Echo Nest API and make their own judgments. The API docs are here: The Echo Nest Playlist API.
However, the biggest surprise of all in this evaluation is how poorly Google’s Instant Mix performed. Nearly half of all songs in Instant Mix playlists were head scratchers – songs that just didn’t belong in the playlist. These playlists were not usable. It is a bit of a puzzle as to why the playlists are so bad considering all of the smart people at Google. Google does say that this release is a Beta, so we can give them a little leeway here. And I certainly wouldn’t count Google out here. They are data kings, and once the data starts rolling from millions of users, you can bet that their playlists will improve over time, just like Apple’s did. Still, when Paul Joyce said that the Music Beta killer feature is ‘Instant Mix’, I wonder if perhaps what he meant to say was “the feature that kills Google Music is ‘Instant Mix’.”
Here at The Echo Nest we’ve just added a new feature to our APIs called Personal Catalogs. This feature lets you make all of the Echo Nest features work in your own world of music. With Personal Catalogs (PCs) you can define application- or user-specific catalogs (in terms of artists or songs) and then use these catalogs to drive the behavior of other Echo Nest APIs. PCs open the door to all sorts of custom apps built on the Echo Nest platform. Here are some examples:
Create better genius-style playlists – With PCs I can create a catalog that contains all of the songs in my iTunes collection. I can then use this catalog with the Echo Nest Playlist API to generate interesting playlists based upon my own personal collection. I can create a playlist of my favorite, most danceable songs for a party, or I can create a playlist of slow, low energy, jazz songs for late night reading music.
Create hyper-targeted recommendations - With PCs I can make a catalog of artists and then use the artist/similar APIs to generate recommendations within this catalog. For instance, I could create an artist catalog of all the bands that are playing this weekend in Boston and then create Music Hack Day recommender that tells each visitor to Boston what bands they should see in Boston based upon their musical tastes.
Get info on lots of stuff – people often ask questions about their whole music collection. Like, ‘what are all the songs that I have that are at 113 BPM?‘, or ‘what are the softest songs?’ Previously, to answer these sorts of questions, you’d have to query our APIs one song at a time – a rather tedious and potentially lengthy operation (if you had, say, 10K tracks). With PCs, you can make a single catalog for all of your tracks and then make bulk queries against this catalog. Once you’ve created the catalog, it is very quick to read back all the tempos in your collection.
Represent your music taste – since a Personal Catalog can contain info such as playcounts, skips, and ratings for all of the artists and songs in your collection, it can serve as an excellent proxy for your music taste. Current and soon-to-be-released APIs will use personal catalogs as a representation of your taste to give you personalized results: playlisting, artist similarity and music recommendations, all personalized based on your listening history.
These examples just scratch the surface. We hope to see lots of novel applications of Personal Catalogs. Check out the APIs, and start writing some code.
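As a sketch of the bulk-query example above, here is roughly what building a catalog and reading back tempos might look like using the v4 catalog endpoints. This is an illustration, not a drop-in script: the API key and catalog ID are placeholders, and the exact request and response field names should be checked against the API docs:

```python
import json
import urllib.parse

API_KEY = "YOUR_API_KEY"          # placeholder - use your own key
CATALOG_ID = "CAXXXXXXXXXXXXXXXX"  # placeholder - returned by catalog/create

# Step 1: describe the songs you want in the catalog. catalog/update
# takes a JSON list of actions like this (one entry per track).
actions = [
    {"action": "update",
     "item": {"item_id": "track-1",
              "song_name": "Autobahn",
              "artist_name": "Kraftwerk"}},
]
update_body = urllib.parse.urlencode({
    "api_key": API_KEY,
    "id": CATALOG_ID,
    "data_type": "json",
    "data": json.dumps(actions),
})

# Step 2: once the catalog has resolved, a single catalog/read call with
# the audio_summary bucket returns tempo (and more) for many tracks at
# once, instead of one song/profile request per track.
read_url = ("http://developer.echonest.com/api/v4/catalog/read?"
            + urllib.parse.urlencode({"api_key": API_KEY,
                                      "id": CATALOG_ID,
                                      "bucket": "audio_summary",
                                      "results": "1000"}))
print(read_url)

# Answering "what songs are at 113 BPM?" is then a local filter over the
# parsed response items, e.g.:
#   [i for i in items if round(i["audio_summary"]["tempo"]) == 113]
```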
Yesterday, Steve Jobs reminded us that it was less than 10 years ago when Apple announced the first iPod, which could put a thousand songs in your pocket. With the emergence of cloud-based music services like Spotify and Rhapsody, we can now have a virtually endless supply of music in our pocket. The ’bottomless iPod’ will have as big an effect on how we listen to music as the original iPod had back in 2001. But with millions of songs to choose from, we will need help finding music that we want to hear. Shuffle play won’t work when we have a million songs to choose from. We will need new tools that help us manage our listening experience. I’m convinced that one of these tools will be intelligent automatic playlisting.
This weekend at the Music Hack Day London, The Echo Nest is releasing the first version of our new Playlisting API. The Playlisting API lets developers construct playlists based on a flexible set of artist/song selection and sorting rules. The Echo Nest has deep data about millions of artists and songs. We know how popular Lady Gaga is, we know the tempo of every one of her songs, we know other artists that sound similar to her, we know where she’s from, we know what words people use to describe her music (‘dance pop’, ‘club’, ‘party music’, ‘female’, ‘diva’ ). With the Playlisting API we can use this data to select music and arrange it in all sorts of flexible ways – from very simple Pandora radio style playlists of similar sounding songs to elaborate playlists drawing on a wide range of parameters. Here are some examples of the types of playlists you can construct with the API:
- Similar artist radio – generate a playlist of songs by similar artists
- Jogging playlist – generate a playlist of 80s power pop with a tempo between 120 and 130 BPM, but never ever play Bon Jovi
- London Music Hack Day Playlist – generate a playlist of electronic and techno music by unknown artists near London, order the tracks by tempo from slow to fast
- Tomorrow’s top 40 – play the hottest songs by pop artists with low familiarity that are starting to get hottt
- Heavy Metal Radio – A DMCA-Compliant radio stream of nothing but heavy metal
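As a concrete illustration, the jogging playlist above might map onto playlist/static parameters roughly like this. The parameter names follow the v4 conventions, but treat this as a sketch: the API key is a placeholder, the style vocabulary is worth checking against the docs, and the “never play Bon Jovi” rule is shown here as a simple client-side filter (on a stand-in response, since I can’t bake a live API call into a blog post):

```python
import urllib.parse

# The jogging-playlist rules expressed as playlist/static parameters.
params = {
    "api_key": "YOUR_API_KEY",       # placeholder
    "type": "artist-description",    # select by descriptive terms, no seed song
    "style": "power pop",            # the 80s power pop constraint
    "min_tempo": "120",              # jogging cadence window, in BPM
    "max_tempo": "130",
    "results": "25",
}
url = ("http://developer.echonest.com/api/v4/playlist/static?"
       + urllib.parse.urlencode(params))

# "Never ever play Bon Jovi" as a post-filter over the returned songs.
# (A stand-in response is used here in place of a live call.)
songs = [{"artist_name": "Cheap Trick", "title": "Surrender"},
         {"artist_name": "Bon Jovi", "title": "Runaway"}]
playlist = [s for s in songs if s["artist_name"] != "Bon Jovi"]
print([s["title"] for s in playlist])  # ['Surrender']
```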
We also provide a dynamic playlisting API that allows for the creation of playlists that adapt based upon the skipping and rating behavior of the listener.
I’m about to jump on a plane for Music Hack Day London, where we will be demonstrating this new API and some cool apps that have already been built upon it. I’m hoping to see a few apps emerge from this Music Hack Day that use the new API. More info about the API and how you can use it to do all sorts of fun things will be forthcoming. For the motivated: dive into the APIs right now.
Ben Fields and I have just put the finishing touches on our playlisting tutorial for ISMIR. Everything you could want to know about playlists. As one of the founders of a well known music intelligence company once said: Take the fun out of music and read Paul’s slides …
Don’t count the pre-fab smart playlists that come with iTunes (like 90′s music, Recently Added, My Top Rated, etc.). Once you’ve counted up your playlists, take the poll:
I was playing with the engine this weekend, writing some rules to make novelty playlists to test the limits of the engine. I started with rules typical for a similar-artist playlist: 15 songs long, filled with songs by artists similar to a seed artist (in this case Weezer), the first and last song must be by the seed artist, and no two consecutive songs can be by the same artist. Simple enough, but then I added two more rules to turn this into a novelty playlist that would be very hard for a human to make. See if you can guess what the two rules are. I think one of the rules is pretty obvious, but the second is a bit more subtle. Post your guesses in the comments.
0 Tripping Down the Freeway - Weezer
1 Yer All I've Got Ttonight - The Smashing Pumpkins
2 The Most Beautiful Things - Jimmy Eat World
3 Someday You Will Be Loved - Death Cab For Cutie
4 Don't Make Me Prove It - Veruca Salt
5 The Sacred And Profane - Smashing Pumpkins, The
6 Everything Is Alright - Motion City Soundtrack
7 The Ego's Last Stand - The Flaming Lips
8 Don't Believe A Word - Third Eye Blind
9 Don's Gone Columbia - Teenage Fanclub
10 Alone + Easy Target - Foo Fighters
11 The Houses Of Roofs - Biffy Clyro
12 Santa Has a Mullet - Nerf Herder
13 Turtleneck Coverup - Ozma
14 Perfect Situation - Weezer
Here’s another playlist – with a different set of two novelty rules, with a seed artist of Led Zeppelin. Again, if you can guess the rules, post a comment.
0 El Niño - Jethro Tull
1 Cheater - Uriah Heep
2 Hot Dog - Led Zeppelin
3 One Thing - Lynyrd Skynyrd
4 Nightmare - Black Sabbath
5 Ezy Ryder - The Jimi Hendrix Experience
6 Soulshine - Govt Mule
7 The Gypsy - Deep Purple
8 I'll Wait - Van Halen
9 Slow Down - Ozzy Osbourne
10 Civil War - Guns N' Roses
11 One Rainy Wish - Jimi Hendrix
12 Overture (Live) - Grand Funk Railroad
13 Larger Than Life - Gov'T Mule
People expect human DJs to make better playlists:
The survey asks people to try to identify the origin of a playlist (human expert, algorithm or random) and also rate each playlist. We can look at the ratings people give to playlists based on what they think the playlist origin is to get an idea of people’s attitudes toward human vs. algorithm creation.
Predicted Origin   Rating
----------------   ------
Human expert       3.4
Algorithm          2.7
Random             2.1
We see that people expect humans to create better playlists than algorithms, and algorithms to create better playlists than a random number generator. Not a surprising result.
Human DJs don’t necessarily make better playlists:
Now let’s look at how people rated playlists based on the actual origin of the playlists:
Actual Origin   Rating
-------------   ------
Human expert    2.5
Algorithm       2.7
Random          2.6
These results are rather surprising. Algorithmic playlists are rated highest, while human-expert-created playlists are rated lowest, even lower than those created by the random number generator. There are lots of caveats here: I haven’t done any significance tests yet to see if the differences really matter, the survey size is still rather small, and the survey doesn’t present real-world playlist listening conditions. Nevertheless, the results are intriguing.
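When I do get around to significance testing, something as simple as a permutation test on the difference of mean ratings would do the job. Here is a sketch; the ratings below are made up for illustration, not the actual survey data:

```python
import random

def perm_test(a, b, n=10000, seed=42):
    """Two-sided permutation test on the difference of mean ratings.

    Returns the fraction of random relabelings whose mean difference is
    at least as large as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n):
        rng.shuffle(pooled)
        x, y = pooled[:len(a)], pooled[len(a):]
        if abs(sum(x) / len(x) - sum(y) / len(y)) >= observed:
            hits += 1
    return hits / n

# Made-up 1-5 ratings (NOT the survey data), just to show the mechanics.
human = [2, 3, 2, 3, 2, 3, 2, 2]
algo  = [3, 3, 2, 3, 3, 2, 3, 3]
print(perm_test(human, algo))
```

A small p-value would suggest the human-vs-algorithm gap is real rather than noise; with a survey this small, I’d expect it not to be.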
I’d like to collect more survey data to flesh out these results. So if you haven’t already, please take the survey:
The tradition of the old-style Radio DJ continues on Internet Radio sites like Radio Paradise. RP founder/DJ Bill Goldsmith says of Radio Paradise: “Our specialty is taking a diverse assortment of songs and making them flow together in a way that makes sense harmonically, rhythmically, and lyrically — an art that, to us, is the very essence of radio.” Anyone who has listened to Radio Paradise will come to appreciate the immense value that a professionally curated playlist brings to the listening experience.
I wish I could put Bill Goldsmith in my iPod and have him craft personalized playlists for me – playlists that make sense harmonically, rhythmically and lyrically, customized to my music taste, mood and context. That, of course, will never happen. Instead I’m going to rely on computer algorithms to generate my playlists. But how good are computer-generated playlists? Can a computer really generate playlists as good as Bill Goldsmith, with his decades of knowledge about good music and his understanding of how to fit songs together?
To help answer this question, I’ve created a Playlist Survey – that will collect information about the quality of playlists generated by a human expert, a computer algorithm and a random number generator. The survey presents a set of playlists and the subject rates each playlist in terms of its quality and also tries to guess whether the playlist was created by a human expert, a computer algorithm or was generated at random.
Bill Goldsmith and Radio Paradise have graciously contributed 18 months of historical playlist data from Radio Paradise to serve as the expert playlist data. That’s nearly 50,000 playlists and a quarter million song plays spread over nearly 7,000 different tracks.
The Playlist Survey also serves as a Radio DJ Turing test. Can a computer algorithm (or a random number generator, for that matter) create playlists that people will think were created by a living, breathing music expert? What will it mean, for instance, if we learn that people really can’t tell the difference between expert playlists and shuffle play?
Ben Fields and I will present the results of this Playlist Survey when we give Finding a path through the Jukebox – The Playlist Tutorial – at ISMIR 2010 in Utrecht in August. I’ll also follow up with detailed posts about the results here in this blog after the conference. I invite all of my readers to spend 10 to 15 minutes to take The Playlist Survey. Your efforts will help researchers better understand what makes a good playlist.