Hottt or Nottt?

At the Echo Nest we have lots of data about millions of artists.  It can be interesting to see what kind of patterns can be extracted from this data.  Tim G suggested an experiment where we see if we can find artists that are on the verge of breaking out by looking at some of this data.   I tried a simple experiment to see what we could find.   I started with two pieces of data for each artist.

  1. Familiarity – this corresponds to how well known an artist is.  You can look at familiarity as the likelihood that any person selected at random will have heard of the artist.  The Beatles have a familiarity close to 1, while a band like ‘Hot Rod Shopping Cart’ has a familiarity close to zero.
  2. Hotttnesss – this corresponds to how much buzz the artist is getting right now. This is derived from many sources, including mentions on the web, mentions in music blogs, music reviews, play counts, etc.

I collected these 2 pieces of data for 130K+ artists and plotted them.  The following plot shows the results.  The x-axis is familiarity and the y-axis is hotttnesss.  Clearly there’s a correlation between hotttnesss and familiarity: familiar artists tend to be hotter than non-familiar artists.  At the top right are the Billboard chart toppers like Kanye West and Taylor Swift, while at the bottom left are artists that you’ve probably never heard of, like Mystery Fluid.

We can use this plot to find the up-and-coming artists as well as the popular artists that are cooling off.  Outliers to the left of and above the main diagonal are the rising stars (their hotttnesss exceeds their familiarity).  Here we see artists like Willie the Kid, Ben*Jammin and ラディカルズ (a.k.a. Rock the Queen).  Artists below the diagonal, by contrast, are well known but no longer hot.  Here we see artists like Simon & Garfunkel, Jimmy Page and Ziggy Stardust.

Note that this is not a perfect science.  For instance, it is not clear how to rate the familiarity of artist collaborations.  You may know James Brown and you may know Luciano Pavarotti, but you may not be familiar with the Brown/Pavarotti collaboration.  What should the familiarity of this collaboration be?  The average of the two artists, or should it be related to how well known the collaboration itself is?  Hotttnesss can also be tricky with extremely unfamiliar artists.  If a Hot Rod Shopping Cart track gets 100 plays, it could substantially increase the band’s hotttnesss (‘Hey! We are twice as popular as we were yesterday!’).

Despite these types of confounding factors, the familiarity / hotttnesss model still seems to be a good way to start exploring for new, potentially unsigned acts that are on the verge of breaking out.  To select the artists, I did the simplest thing that could possibly work: I created a ‘break-out’ score, which is simply the ratio of hotttnesss to familiarity.  Artists that have a high hotttnesss compared to their familiarity are getting a lot of web buzz but are still relatively unknown.  I calculated this break-out score for all artists and used it to select the top 1000 artists with break-out potential, as well as the bottom 1000 artists (the fade-aways).  Here’s a plot showing the two categories:
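The break-out score described above is simple enough to sketch in a few lines of Python.  This is only an illustration of the ratio idea, not the code used for the plots: the function names, the small floor on familiarity (to avoid dividing by zero) and the sample artists and values are all assumptions made up for the example.

```python
def break_out_score(hotttnesss, familiarity, min_familiarity=1e-6):
    """Ratio of hotttnesss to familiarity.

    The min_familiarity floor is an assumption added for this sketch so
    that an artist with zero familiarity does not cause a division by zero.
    """
    return hotttnesss / max(familiarity, min_familiarity)


def select_extremes(artists, n):
    """Return (top_n, bottom_n) artist names ranked by break-out score.

    artists is a list of (name, familiarity, hotttnesss) tuples.
    top_n holds the highest scores first (break-out candidates);
    bottom_n holds the lowest scores first (fade-aways).
    """
    ranked = sorted(
        artists,
        key=lambda a: break_out_score(a[2], a[1]),
        reverse=True,
    )
    n = max(0, min(n, len(ranked)))
    top = [name for name, _, _ in ranked[:n]]
    bottom = [name for name, _, _ in ranked[len(ranked) - n:][::-1]]
    return top, bottom


if __name__ == "__main__":
    # Invented sample data: (name, familiarity, hotttnesss).
    sample = [
        ("Artist A", 0.90, 0.90),  # well known and hot
        ("Artist B", 0.25, 0.75),  # little known, lots of buzz
        ("Artist C", 0.75, 0.25),  # well known, cooling off
        ("Artist D", 0.50, 0.50),  # on the diagonal
    ]
    top, bottom = select_extremes(sample, 1)
    print("break-out:", top)
    print("fade-away:", bottom)
```

Because the score is a ratio, very unfamiliar artists can get very large scores from a small amount of buzz, which is the same effect as the Hot Rod Shopping Cart example above; a minimum familiarity cutoff would be one way to damp that.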

Here are 10 artists with high break-out scores that might be worth checking out:


  1. #1 by Thierry BM on December 9, 2009 - 1:29 pm

    Awesome! Is there any way to track that over time, e.g. see how a well-known and maybe fading artist has moved from one side of the plane to the other over 10 years? Then you could measure how much an artist has reinvented themselves by how many times they have crossed the diagonal… maybe…

  2. #2 by Rob on December 9, 2009 - 7:16 pm

    Perhaps the two sides of this graph should form the input into outlierFM’s upcoming 24/7 radio…interested in getting me a data stream?

  3. #3 by Dean on December 10, 2009 - 4:41 am

    hey good post! I’m just curious about how the familiarity is determined for each artist?

  4. #4 by brian on December 10, 2009 - 3:41 pm

    what paul doesn’t say because he thinks most people already know is that both familiarity and hotttnesss are available for free in the Echo Nest API available on http://developer.echonest.com/ once you register for a key.

  5. #5 by Dean on December 11, 2009 - 4:09 am

    Thanks brian. I know that information is available in the API. What I wonder is how it is calculated, because the API seems to expose no details about it. The way it is measured may reveal the correlation with hotttnesss. Or did I miss something in the Echo Nest docs?

    • #6 by Paul on December 11, 2009 - 9:12 am

      Dean: We don’t publish the exact details of how we determine familiarity or hotttnesss – we are constantly refining our algorithms and sources – but some major components of familiarity are overall playcounts (from numerous music sites), sales rank, chart appearances and appearances on the web.

  6. #7 by Dean on December 13, 2009 - 12:12 am

    Paul, I appreciate the information. So familiarity seems like a kind of long-term hotttnesss.