Posts Tagged familiarity
TL;DR: I built a game called Name Dropper that tests your knowledge of music artists.
One bit of data that we provide via our web APIs is Artist Familiarity. This is a number between 0 and 1 that indicates how likely it is that someone has heard of that artist. There’s no absolute right answer, of course – who can really tell if Lady Gaga is better known than Barbra Streisand, or whether Elvis is better known than Madonna? But we can certainly say that The Beatles are better known, in general, than Justin Bieber.
To make sure our familiarity scores are good, we have a Q/A process where a person knowledgeable in music ranks our familiarity score by scanning through a list of artists ordered in descending familiarity until they start finding artists that they don’t recognize. The further they get into the list, the better the list is. We can use this scoring technique to rank multiple different familiarity algorithms quickly and accurately.
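As a rough illustration of that scoring idea, here is a minimal sketch in Python. It assumes the tester stops at the very first artist they don’t recognize, which is a simplification of the process described above. The function name `recognition_depth`, the two example rankings and the tester’s `known` set are all made up for the example; they are not our actual QA tooling or data.

```python
from typing import Iterable, Set


def recognition_depth(ranked_artists: Iterable[str], known: Set[str]) -> int:
    """Count how many artists the tester recognizes before the first miss.

    ranked_artists is assumed to be ordered by descending familiarity score.
    Assumption: the scan ends at the first unrecognized name.
    """
    depth = 0
    for artist in ranked_artists:
        if artist not in known:
            break
        depth += 1
    return depth


# Hypothetical tester and two hypothetical familiarity orderings.
known = {"The Beatles", "Madonna", "Lady Gaga", "Justin Bieber"}
ranking_a = ["The Beatles", "Madonna", "Lady Gaga", "Mystery Fluid", "Justin Bieber"]
ranking_b = ["The Beatles", "Mystery Fluid", "Madonna", "Lady Gaga", "Justin Bieber"]

depth_a = recognition_depth(ranking_a, known)
depth_b = recognition_depth(ranking_b, known)
better = "A" if depth_a > depth_b else "B"
```

In this toy case ranking A lets the tester get further before the first unknown name, so it would be scored as the better ordering.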
One thing I noticed is that this technique not only tells us how good our familiarity score is, it also gives a good indication of how well the tester knows music. The further a tester gets into a list before they can’t recognize artists, the more they tend to know about music. This insight led me to create a new game: The Name Dropper.
The Name Dropper is a simple game. You are presented with a list of a dozen artist names. One name is a fake, the rest are real.
If you find the fake, you go on to the next round, but if you get fooled, the game is over. At first, it is pretty easy to spot the fakes, but each round gets a little harder, and sooner or later you’ll reach the point where you are not sure, and you’ll have to guess. I think a person’s score is fairly representative of how broad their knowledge of music artists is.
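The round structure can be sketched in a few lines of Python. This is only a guess at one way to do it, not the game’s actual code: it assumes that rounds get harder by drawing the real names from further down a familiarity-ranked list, and the names `build_round`, `play`, `window` and the placeholder artist list are all hypothetical. The fake name is simply passed in, since how fakes are generated is not covered here.

```python
import random
from typing import List, Sequence, Tuple


def build_round(ranked_artists: Sequence[str], fake_name: str, round_number: int,
                rng: random.Random, list_size: int = 12,
                window: int = 50) -> Tuple[List[str], int]:
    """Return (names, fake_index): list_size - 1 real artists plus one fake.

    Assumption: ranked_artists is sorted by descending familiarity, so a
    later round samples from a deeper (less familiar) slice of the list.
    """
    last_start = max(0, len(ranked_artists) - window)
    start = min(round_number * window, last_start)
    pool = list(ranked_artists[start:start + window])
    names = rng.sample(pool, list_size - 1) + [fake_name]
    rng.shuffle(names)
    return names, names.index(fake_name)


def play(rounds: Sequence[Tuple[List[str], int]], guesses: Sequence[int]) -> int:
    """Score is the number of fakes found before the first wrong guess."""
    score = 0
    for (_, fake_index), guess in zip(rounds, guesses):
        if guess != fake_index:
            break  # fooled: game over
        score += 1
    return score


# Placeholder data: 200 stand-in artist names in descending familiarity.
rng = random.Random(0)
ranked = [f"Artist {i}" for i in range(200)]
rounds = [build_round(ranked, "Hypothetical Fake Band", n, rng) for n in range(3)]
```

A player who spots the fake in the first two rounds and misses the third would end with a score of 2.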
The biggest technical challenge in building the application was coming up with a credible fake artist name generator. I could have used Brian’s list of fake names – but it was more fun trying to build one myself. I think it works pretty well. I really can’t share how it works, since that could give folks a hint as to what a fake name might look like and skew scores (I’m sure it helps boost my own scores by a few points). The really nifty thing about this game is that it is a game-with-a-purpose. With this game I can collect all sorts of data about artist familiarity and use the data to help improve our algorithms.
So go ahead, give the Name Dropper a try and see if you can push me out of the top spot on the leaderboard:
At the Echo Nest we have lots of data about millions of artists. It can be interesting to see what kind of patterns can be extracted from this data. Tim G suggested an experiment where we see if we can find artists that are on the verge of breaking out by looking at some of this data. I tried a simple experiment to see what we could find. I started with two pieces of data for each artist.
- Familiarity – this corresponds to how well known an artist is. You can look at familiarity as the likelihood that any person selected at random will have heard of the artist. The Beatles have a familiarity close to 1, while a band like ‘Hot Rod Shopping Cart’ has a familiarity close to zero.
- Hotttnesss – this corresponds to how much buzz the artist is getting right now. This is derived from many sources, including mentions on the web, mentions in music blogs, music reviews, play counts, etc.
I collected these two pieces of data for 130K+ artists and plotted them. The following plot shows the results. The x-axis is familiarity and the y-axis is hotttnesss. Clearly there’s a correlation between hotttnesss and familiarity. Familiar artists tend to be hotter than non-familiar artists. At the top right are the Billboard chart toppers like Kanye West and Taylor Swift, while at the bottom left are artists that you’ve probably never heard of, like Mystery Fluid.
We can use this plot to find the up-and-coming artists as well as the popular artists that are cooling off. Outliers to the left and above the main diagonal are the rising stars (their hotttnesss exceeds their familiarity). Here we see artists like Willie the Kid, Ben*Jammin and ラディカルズ (a.k.a. Rock the Queen). Artists below the diagonal, by contrast, are well known but no longer hot. Here we see artists like Simon & Garfunkel, Jimmy Page and Ziggy Stardust.
Note that this is not a perfect science. For instance, it is not clear how to rate the familiarity of artist collaborations – you may know James Brown and you may know Luciano Pavarotti, but you may not be familiar with the Brown/Pavarotti collaboration. What should the familiarity of this collaboration be? The average of the two artists, or should it be related to how well known the collaboration itself is? Hotttnesss can also be tricky with extremely unfamiliar artists. If a Hot Rod Shopping Cart track gets 100 plays, it could substantially increase the band’s hotttnesss (‘Hey! We are twice as popular as we were yesterday!’).
Despite these types of confounding factors, the familiarity / hotttnesss model still seems to be a good way to start exploring for new, potentially unsigned acts that are on the verge of breaking out. To select the artists, I did the simplest thing that could possibly work: I created a ‘break-out’ score, which is simply the ratio of hotttnesss to familiarity. Artists that have a high hotttnesss compared to their familiarity are getting a lot of web buzz but are still relatively unknown. I calculated this break-out score for all artists and used it to select the top 1000 artists with break-out potential, as well as the bottom 1000 artists (the fade-aways). Here’s a plot showing the two categories:
Here are 10 artists with high break-out scores that might be worth checking out:
- Ben*Jammin – German pop – 249 Last.fm listeners – an awesome YouTube video (really, you have to watch it)
- Lord Vampyr’s Shadowsreign – 32 Last.fm listeners – I’m not sure whether they are being serious or not in this video.
- Waking Vision Trio – 429 Last.fm listeners – YouTube
- The Bart Crow Band – alt-country – 3K Last.fm listeners – YouTube
- Urine Festival – 500 Last.fm listeners – really, not for the faint of heart – YouTube
- Fictivision vs Phynn – 250 Last.fm listeners – trance – YouTube
- korablove – 1,500 Last.fm listeners – minimal, deep house – YouTube
- Deelstylistic – 1,800 Last.fm listeners – r&b – YouTube
- Luke Doucet and the White Falcon – 900 Last.fm listeners – YouTube
- i-sHiNe – 1,700 Last.fm listeners – YouTube
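For anyone who wants to try the same selection on their own data, the break-out score used to pick these artists can be sketched in Python like this. The ratio of hotttnesss to familiarity is the score described in this post; everything else is an assumption made for the example: the function names, the `min_familiarity` cutoff (added here only to avoid dividing by zero, and to sidestep the extremely-unfamiliar-artist problem mentioned earlier), and the artist names and values, which are invented placeholders rather than real Echo Nest data.

```python
from typing import Dict, List, Tuple


def break_out_score(hotttnesss: float, familiarity: float) -> float:
    """Ratio of hotttnesss to familiarity; high means buzz outruns fame."""
    return hotttnesss / familiarity


def rank_by_break_out(artists: Dict[str, Tuple[float, float]], n: int,
                      min_familiarity: float = 1e-6
                      ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top n break-outs, bottom n fade-aways), best score first.

    artists maps name -> (familiarity, hotttnesss).
    Assumption: artists below min_familiarity are skipped entirely.
    """
    scored = [(name, break_out_score(hot, fam))
              for name, (fam, hot) in artists.items()
              if fam >= min_familiarity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n], scored[-n:]


# Invented placeholder values: name -> (familiarity, hotttnesss).
artists = {
    "Rising Act": (0.2, 0.5),
    "Chart Topper": (0.9, 0.9),
    "Faded Star": (0.8, 0.2),
    "Unknown Zero": (0.0, 0.1),  # skipped: familiarity below the cutoff
}
break_outs, fade_aways = rank_by_break_out(artists, n=1)
```

With these toy numbers, "Rising Act" comes out as the break-out candidate and "Faded Star" as the fade-away. On the real data set, n would be 1000.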