Posts Tagged charts
Every week, thousands of artists release albums on Spotify. Sifting through all this new music to find good stuff to listen to can be hard. Luckily, there are lots of tools, from New Music Tuesday playlists to the Spotify Viral 50, to help us find the needles in the proverbial haystack of new music. However, most of these tools tend to surface new music by artists that have been around for a while. For instance, the top artist on Spotify Viral 50 as I write this is Jeremih, who has been on the charts for five years. The top of New Music Tuesday right now is Mumford & Sons, who’ve been recording for at least eight years.
I’m interested in finding music by the freshest artists – artists that are at the very beginning of their recording careers. To that end, I’ve built a new chart called ‘The Fresh 40’ that shows the top albums by the freshest artists. To build The Fresh 40 I scour all of the albums that have been released in the last two weeks on Spotify (on average that’s about 30 thousand albums), and find the albums that are the very first album release for their artist. I then rank each album by a weighted combination of the number of followers the artist has on Spotify and the popularity of the artist and album (which is related to Spotify track plays). The result is a chart of the top 40 most popular fresh artists.
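The ranking step can be sketched roughly like this. The weights, the log scaling of followers, the 0-100 popularity scale and all of the field names are assumptions made for illustration; the only thing stated above is that the score is a weighted combination of followers and popularity.

```python
import math
from dataclasses import dataclass


@dataclass
class Album:
    title: str
    artist: str
    artist_followers: int    # follower count for the artist
    artist_popularity: int   # assumed to be on a 0-100 scale
    album_popularity: int    # assumed to be on a 0-100 scale
    is_first_release: bool   # True if this is the artist's very first album


def fresh_score(album, w_followers=0.4, w_artist=0.3, w_album=0.3):
    """Weighted combination of followers and popularity (weights are made up)."""
    # Log-scale followers so one huge follower count doesn't swamp the score.
    # Roughly a million followers maps to 100, and the value is capped there.
    follower_part = min(100.0, 100.0 * math.log10(1 + album.artist_followers) / 6.0)
    return (w_followers * follower_part
            + w_artist * album.artist_popularity
            + w_album * album.album_popularity)


def fresh_chart(albums, size=40):
    """Keep only debut albums and return the top `size` of them by score."""
    debuts = [a for a in albums if a.is_first_release]
    return sorted(debuts, key=fresh_score, reverse=True)[:size]
```

Filtering to debut albums before scoring is what keeps established artists off the chart no matter how popular they are.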
The Fresh 40 updates every day and shows all the salient info including the rank, yesterday’s rank, the overall score, artist followers, artist popularity, album popularity and the number of days that the album has been on the chart. Since an album can only be on the chart for 15 days, there’s quite a bit of change from day to day.
If you are interested in finding music by the very newest artists on Spotify, you might be interested in The Fresh 40. Give the chart a look.
There’s a strong connection between music and memory. Whenever I hear the song Lovin You by Minnie Riperton, I’m instantly transported back to 1975 when I spent the summer apprenticed to Tom, my future brother-in-law, fixing electronic organs. I was 15, Tom was 22 and super cool. He had a business (New Hampshire Organ Service) and he had a van with an 8-track player and an FM radio (a rarity in 1975). As we drove between repairs across rural New Hampshire we’d pass the time by listening to the radio. Now, when I hear those radio songs from 1975 it is like I’m sitting in that van again.
Music can be like a time machine, transporting us to different times in our lives. I was interested in exploring this a bit more. Inspired by @realtimewwii, which gives a day-by-day account of World War II, I created a set of dynamically updating Spotify playlists that follow the charts week by week.
For example, there’s the 50 Years Ago in Music playlist that contains the top 100 or so songs that were on the chart 50 years ago. As I write this on April 12, 2015, this playlist is showing the top songs for the week of April 12, 1965.
The music on this playlist sends me back to when I was 5 years old listening to music on our AM radio in the kitchen in the morning while eating breakfast.
If you follow this playlist you’ll be able to re-create what it was like to listen to music 50 years ago. If the mid-sixties doesn’t speak to you musically, there are some other playlists that you can try.
There’s 40 Years Ago in Music that brings me back to 1975 on the road with Tom.
There’s 30 Years Ago in Music, which is currently playing music from the mid-80s like Madonna and Phil Collins.
There’s 20 Years Ago in Music currently playing music from the mid-90s:
10 Years Ago in Music plays the music that was on the radio when Spotify was just a gleam in Daniel’s eye.
5 Years Ago in Music – the playlist of @echonest in its heyday.
In yesterday’s post about the Hot Songs of Summer 2013, I noted that some songs were attracting a very passionate fan base. In particular, the song Miss Movin’ On by Fifth Harmony was an extreme outlier, attracting more than twice the number of plays per listener than any other song.
Based on this data I suggested that Fifth Harmony was going places – such high passion among their listeners was surely indicative of future success. But now I am not so sure. Shortly after I made that post I learned that our crack data team here at The Echo Nest were already on to some Fifth Harmony shenanigans. Yes, Fifth Harmony is getting lots of plays, but many of these plays are due to an orchestrated campaign. Fifth Harmony fans are encouraged to go to music streaming sites such as Spotify and Rdio and stream Miss Movin’ On (aka MMO) 24/7. Here are some examples:
There are a number of Twitter accounts that are prompting such MMO plays. The campaign seems to be working. 5H is moving up in the charts. Just take a look at the top songs on Rdio this week, where Miss Movin’ On is number two on the list:
But what effect is this campaign really having on Fifth Harmony? Perhaps Fifth Harmony’s position on the charts is a natural outcome of their appeal, and is not a result of a small number of fans that stream MMO 24/7 with their computers and iPhones on mute. Can we see the effect that The Harmonizers are having? And if so, how substantial is this effect? The answer lies in the data, so that’s where we will go.
Can we see the effect of the Harmonizers?
The first thing to do is to take a look at the listener play data for MMO and compare it to other songs to see if there are any tell-tale signs of a shilling campaign. To do this, I selected 9 other songs with a similar number of fans that appeal to a similar demographic as MMO. For each of these songs I ordered the listeners in descending play order (i.e. the first listener is the listener that has played the song the most) and plotted the number of plays per listener for the 10 songs.
As you can see, 9 out of 10 songs follow a similar pattern. The top listeners of a song have around a thousand plays. As we get deeper into the listener ranks, the number of plays per listener drops off at a very predictable rate. The one exception is Fifth Harmony’s Miss Movin’ On. The effect of the Harmonizers is clearly seen. The top plays are skewed to greatly inflate the total number of plays by two full orders of magnitude. We can also see that the number of listeners that are significantly skewing the data is relatively small. Beyond the top 200 most active listeners (less than 0.5% of the Fifth Harmony listeners in the sample), the listening pattern for MMO falls in line with the rest of the songs. It is pretty clear that the Harmonizers are really having an effect on the number of plays. It is also clear that we can automate the detection of such shilling by looking for such non-standard listening patterns.
Update – a reader has asked that I include One Direction’s Best Song Ever on the plot. You can find it here.
How big of an impact do the Harmonizers have on the overall play count?
The Harmonizers are having a huge impact. 80% of all track plays of Miss Movin’ On are concentrated into just the top 1% of listeners. Compare that to the other 9 tracks in our sample:
Percentage of listeners that account for 80% of all plays
|Song|Listeners (%)|
|---|---|
|Fifth Harmony – Miss Movin’ On|1.0|
|Lorde – Royals|14.0|
|Karmin – Acapella|16.0|
|Anna Kendrick – Cups|17.0|
|Taylor Swift – 22|14.0|
|Icona Pop – I Love It|15.0|
|Birdy – Skinny Love|25.0|
|Lana Del Rey – Summertime Sadness|15.0|
|Christina Perri – A Thousand Years|21.0|
|Krewella – Alive|17.0|
A plot of this data makes the difference quite clear:
I estimate that at least 75% of all plays of Miss Movin’ On are overplays that are a direct result of the Harmonizer campaign.
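The "percentage of listeners that account for 80% of all plays" figure is easy to compute from a list of per-listener play counts. Here is a minimal sketch; the function names and the 5% flagging threshold are my own assumptions for illustration, not the exact code or cutoff behind the numbers above.

```python
def listener_share_for_plays(play_counts, play_fraction=0.8):
    """Percentage of listeners needed to account for `play_fraction` of all plays."""
    counts = sorted(play_counts, reverse=True)   # heaviest listeners first
    total = sum(counts)
    if total == 0:
        return 0.0
    target = play_fraction * total
    running = 0
    for n, plays in enumerate(counts, start=1):
        running += plays
        if running >= target:
            return 100.0 * n / len(counts)
    return 100.0


def looks_shilled(play_counts, threshold_pct=5.0):
    """Flag a song whose plays are concentrated in very few listeners.

    The 5% threshold is a hypothetical cutoff: the ordinary songs in the
    table sit between 14% and 25%, while Miss Movin' On sits at 1%.
    """
    return listener_share_for_plays(play_counts) < threshold_pct
```

A song where one listener out of a hundred accounts for nearly all of the plays comes out at 1.0 and gets flagged, while a song with evenly spread plays does not.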
What effect does the Fifth Harmony campaign have on chart position?
It is pretty easy to back out the overplays by finding another song that has a similarly-shaped plays vs. listener rank curve once we get past the first 1% of listeners (the ones that are overplaying the track). For instance, Karmin’s Acapella has a similar mid-tail and long-tail listener curve and has a similar audience size, making it a good proxy. Its Summer Time rank was 378. Based on this proxy, MMO’s real rank should be dropped from 45 to around 375. This means that a few hundred committed fans were able to move a song up more than 300 positions on the chart.
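One rough way to picture the proxy idea in code: keep the song's own plays for everyone past the top 1% of listeners, and borrow the top 1% from the proxy song. This is only a sketch of the idea under my own assumptions (the two songs have similar audience sizes, and a plain swap of the head is good enough); the rank estimate above came from comparing the curves, not from this exact calculation.

```python
def split_head_tail(play_counts, head_fraction=0.01):
    """Split per-listener play counts into the heaviest listeners and the rest."""
    counts = sorted(play_counts, reverse=True)
    head_size = max(1, round(len(counts) * head_fraction))
    return counts[:head_size], counts[head_size:]


def adjusted_total_plays(song_counts, proxy_counts, head_fraction=0.01):
    """Estimate total plays with the overplaying head replaced by the proxy's head."""
    _, song_tail = split_head_tail(song_counts, head_fraction)
    proxy_head, _ = split_head_tail(proxy_counts, head_fraction)
    return sum(song_tail) + sum(proxy_head)
```

The adjusted total can then be dropped back into the chart in place of the raw play count to see where the song would land without the overplays.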
The bottom line here is that an organized campaign for very little cost has harnessed the most passionate fans to substantially bolster the apparent popularity of an artist, making the artist appear to be about 4 times more popular than it really is.
What does this all mean for music services?
Whenever there’s a high-stakes metric like chart position, some people will try to find a way to game the system to get their stuff to the top of the chart. Twenty years ago, the only way to game the charts was either to spend lots of money buying copies of your record to boost the sales figures, or to bribe radio DJs to play your songs to boost radio airplay. With today’s music subscription services, there’s a much easier way to game the system. Fans and shills simply need to play a song on autorepeat across a few hundred accounts to boost the chart position of a song. Fifth Harmony proves that if you have a small but committed fan base, you can radically boost your chart position for very little cost.
Obviously, a music service doesn’t like this. First, the music service has to pay for all those streams, even if no one is actually listening to them. Second, when a song gets to the top of a chart through shilling and promotion campaigns, it reduces the listening enjoyment for those who use the charts to find music. Instead of finding a new song that got to the top of the chart based solely (or at least mostly) on merit, they are listening to a song that is a product of a promotion machine. Finally, music services that rely on user play data to generate music recommendations via collaborative filtering have a significant problem trying to make sure that fake plays don’t improperly influence their recommendations.
So what can be done to limit the damage to music services? As we’ve seen, it is pretty easy to detect when a song is being overplayed via a campaign and these overplays can be removed. Perhaps even simpler though is to rely on metrics that are less easily gamed – such as the number of fans a song has instead of the total number of plays. For a music subscription service that has a credit card number associated with each user account, the number of fans a song has is a much harder metric to hack.
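To make the difference between the two metrics concrete, here is a small sketch that builds a chart both ways from the same play log. The log format, a list of (user, song) pairs with one entry per play, is a hypothetical one chosen for the example.

```python
from collections import Counter


def chart_by_plays(plays):
    """Rank songs by total number of plays (easy to inflate with autorepeat)."""
    return [song for song, _ in Counter(song for _, song in plays).most_common()]


def chart_by_listeners(plays):
    """Rank songs by number of distinct listeners (each user counts once per song)."""
    unique_pairs = set(plays)
    return [song for song, _ in Counter(song for _, song in unique_pairs).most_common()]
```

A single account looping one song hundreds of times moves that song up the first chart but adds exactly one to its count on the second.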
What does this say about Fifth Harmony fans?
I am always happy when I see people getting excited about music. The Fifth Harmony fans are really excited about Miss Movin’ On, the tour and the upcoming album. It’s great that the fans are so invested in the music that they want to help the band be successful. That’s what being a fan is all about. But I hope they’ll avoid trying to take their band to the top by a shortcut. As they say, it’s a long way to the top if you want to rock n’ roll. Let Fifth Harmony earn their position at the top of charts, don’t give them a free ride.
And finally, a special message to music labels or promoters: If you are trying to game the music charts by enlisting hundreds of pre-teens and teens to continuously stream your one song: screw you.
Update – I’ve received **lots** of feedback from Harmonizers – thanks. A common theme among this feedback is that the fan activities and organization really are a grassroots movement, and there really is no input from the labels. Many took umbrage at my suspicions that the label was pulling the strings. I remain suspicious, but less so than before. My parting ‘screw you’ comment was in no way directed at the 5H fans; it was reserved for the mythical music label marketeer who I imagined was pulling the strings. I’m hoping to dig in a bit deeper to understand the machinery behind the 5H fan movement. Expect a follow-up article soon.
I’ve been reading all my books lately using Kindle for iPhone. It is a great way to read – and having a library of books in my pocket at all times means I’m never without a book. One feature of the Kindle software is called Whispersync. It keeps track of where you are in a book so that if you switch devices (from an iPhone to a Kindle or an iPad or desktop), you can pick up exactly where you left off. Kindle also stores any bookmarks, notes, highlights, or similar markings you make in the cloud so they can be shared across devices. Whispersync is a useful feature for readers, but it is also a goldmine of data for Amazon. With Whispersync data from millions of Kindle readers, Amazon can learn not just what we are reading but how we are reading. In brick-and-mortar bookstore days, the only thing a bookseller, author or publisher could really know about a book was how many copies it sold. But now, with Whispersync, Amazon can learn all sorts of things about how we are reading. With the insights that they gain from this data, they will, no doubt, find better ways to help people find the books they like to read.
I hope Amazon aggregates their Whispersync data and gives us some Last.fm-style charts about how people are reading. Some charts I’d like to see:
- Most Abandoned – the books and/or authors that are most frequently left unfinished. What book is the most abandoned book of all time? (My money is on ‘A Brief History of Time’) A related metric – for any particular book where is it most frequently abandoned? (I’ve heard of dozens of people who never got past ‘The Council of Elrond’ chapter in LOTR).
- Pageturner – the top books ordered by average number of words read per reading session. Does the average Harry Potter fan read more of the book in one sitting than the average Twilight fan?
- Burning the midnight oil – books that keep people up late at night.
- Read Speed – which books/authors/genres have the lowest word-per-minute average reading rate? Do readers of Glenn Beck read faster or slower than readers of Jon Stewart?
- Most Re-read – which books are read over and over again? A related metric – which are the most re-read passages? Is it when Frodo claims the ring, or when Bella almost gets hit by a car?
- Mystery cheats – which books have their last chapter read before other chapters.
- Valuable reference – which books are not read in order, but are visited very frequently? (I’ve not read my Python in a Nutshell book from cover to cover, but I visit it almost every day).
- Biggest Slogs – the books that take the longest to read.
- Back to the start – Books that are most frequently re-read immediately after they are finished.
- Page shufflers – books that most often send their readers to the glossary, dictionary, map or the elaborate family tree. (xkcd offers some insights)
- Trophy Books – books that are most frequently purchased, but never actually read.
- Dishonest rater – books that are most frequently rated highly by readers who never actually finished reading the book
- Most efficient language – the average time to read books by language. Do native Italians read ‘Il nome della rosa’ faster than native English speakers can read ‘The Name of the Rose’?
- Most attempts – which books are restarted most frequently? (It took me 4 attempts to get through Cryptonomicon, but when I did I really enjoyed it).
- A turn for the worse – which books are most frequently abandoned in the last third of the book? These are the books that go bad.
- Never at night – books that are read less in the dark than others.
- Entertainment value – the books with the lowest overall cost per hour of reading (including all re-reads)
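As an illustration of how one of these charts could be computed, here is a sketch of the Most Abandoned chart. Everything about the data is hypothetical: I’m assuming a list of (reader, book, furthest fraction read) records, treating anything under 90% as unfinished, and ignoring the question of whether the reader is still working on the book. I have no idea what Amazon actually stores.

```python
from collections import defaultdict


def most_abandoned(progress, finished_at=0.9, min_readers=2):
    """Return (book, abandonment rate) pairs, most abandoned first.

    progress: iterable of (reader_id, book, furthest_fraction_read) records.
    Books with fewer than `min_readers` readers are left off the chart.
    """
    readers = defaultdict(int)
    abandoned = defaultdict(int)
    for _reader, book, fraction in progress:
        readers[book] += 1
        if fraction < finished_at:
            abandoned[book] += 1
    rates = {book: abandoned[book] / n
             for book, n in readers.items() if n >= min_readers}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

Most of the other charts in the list are variations on the same pattern: group the reading records by book, compute one number per book, and sort.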
Whispersync is to books as the audioscrobbler is to music. It is an implicit way to track what you are really paying attention to. The data from Whispersync will give us new insights into how people really read books. A chart that shows that the most abandoned author is James Patterson may steer readers away from Patterson and toward books by better authors. I’d rather not turn to the New York Times Best Seller list to decide what to read. I want to see the Amazon Most Frequently Finished book list instead.
Worth checking out: Normalisr
Last week, on the Hype Machine blog, Anthony indicated his increasing frustration with how easily charts could be manipulated – Anthony wanted a better way, one that was transparent, and gave more influence to the influential. Anthony’s solution was to create a Twitter chart that is based on the twittering activity of Hype Machine songs. In this new chart, Twitterers with more followers have more influence than those with few.
A number of commenters on Anthony’s blog pointed out how it would be easy for a single very popular twitter user to influence the charts. And that is exactly what Erick Schonfeld of TechCrunch did. Erick used the power of TechCrunch for evil.
With one tweet from the TechCrunch Twitter account (with its nearly 1 million-person reach) he was able to put Rick Astley’s Never Gonna Give You Up at the top of the Hype Machine Twitter chart. Erick writes “The Hype Machine’s formula is flawed. No single person should be able to affect the rankings so easily”.
It’s arguable whether or not this is a dishonest manipulation of the charts. TechCrunch really does have a reach of 1 million people – and so by tweeting Rick Astley they are potentially exposing up to a million people to this song. However, in reality, people don’t read TechCrunch for music recommendations – TechCrunch is just not a music tastemaker (sorry Erick). A tweet by TechCrunch should count much less than a tweet by indie music guide Pitchfork.
Update – Note that the spammers are now starting to recognize the twitterverse as a place that they can target. If you have $27 you can get the twittertrafficmachine to get you 20K followers in a month:
Anthony should adjust how he scores a tweet to not only include the reach of the tweet but to also include the music reputation of the source. It is not as easy to determine the music reputation as the number of followers for a source, but it is much more important. Some indicators that a tweet has real influence are whether people actually click on the link and listen to the song and whether the poster actually listens to music, especially new music, before it gets popular.
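A scoring function along these lines might look like the sketch below. To be clear, this is my own illustration and not the Hype Machine formula: the log scaling of followers, the 0-to-1 music reputation value and the click-through rate are all assumed inputs that someone would have to measure or estimate.

```python
import math
from collections import defaultdict


def tweet_score(followers, music_reputation, click_rate):
    """Score one tweet by reach, tempered by how much the source matters for music.

    followers:        follower count of the account that tweeted
    music_reputation: assumed 0.0-1.0 estimate of the source's music credibility
    click_rate:       assumed fraction of followers who click through and listen
    """
    reach = math.log10(1 + followers)   # damp the effect of very large accounts
    return reach * music_reputation * click_rate


def twitter_chart(tweets):
    """Rank songs by the summed score of the tweets that mention them.

    tweets: iterable of (song, followers, music_reputation, click_rate).
    """
    totals = defaultdict(float)
    for song, followers, reputation, click_rate in tweets:
        totals[song] += tweet_score(followers, reputation, click_rate)
    return sorted(totals, key=totals.get, reverse=True)
```

With a formula shaped like this, a single tweet from a huge account with little music reputation and few click-throughs no longer outranks a tweet from a smaller account that people actually follow for music.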
I suspect Anthony will be tweaking his scoring algorithms soon to make the charts better reflect what real music listeners are listening to, not just what popular people are listening to.
Update: Anthony has responded in the comments.
Billboard, the venerable maintainer of the Billboard Hot 100 and a bevy of other music charts, is now making this data available via an API. The API “puts the entire rich history of the Billboard charts at your fingertips to sample and mix into your web pages and applications.” The API is in public beta – but already it is supplying some really good information.
The first service that they’ve rolled out is the ‘Chart’ service, which lets you search and retrieve Billboard chart information.
For example, to find all appearances of The Beatles on any of the Billboard charts during the first week of June in 1964, you could make the call:
<?xml version='1.0' encoding='UTF-8'?>
<searchResults firstPosition='1' totalReturned='6' totalRecords='6'>
  <chartItem id='8807769' rank='2' exrank='0'>
    <chart id='3070264'>
      <name>The Billboard Hot 100</name>
      <issueDate>1964-06-06</issueDate>
      <specId>379</specId>
      <specType>Singles</specType>
    </chart>
    <artist>The Beatles</artist>
    <writer />
    <song>Love Me Do</song>
    <producer />
    <catalogNo>9008</catalogNo>
    <promotion />
    <distribution>Tollie</distribution>
    <peak>1</peak>
    <weeksOn>14</weeksOn>
  </chartItem>
  <chartItem id='8715479' rank='4' exrank='0'>
    <chart id='3068613'>
      <name>The Billboard 200</name>
      <issueDate>1964-06-06</issueDate>
      <specId>305</specId>
      <specType>Albums</specType>
    </chart>
    <artist>The Beatles</artist>
    <writer />
    <song>The Beatles' Second Album</song>
    <producer />
    <catalogNo>2080</catalogNo>
    <promotion />
    <distribution>Capitol</distribution>
    <peak>1</peak>
    <weeksOn>55</weeksOn>
  </chartItem>
  <chartItem id='8715481' rank='6' exrank='0'>
    <chart id='3068613'>
      <name>The Billboard 200</name>
      <issueDate>1964-06-06</issueDate>
      <specId>305</specId>
      <specType>Albums</specType>
    </chart>
    <artist>The Beatles</artist>
    <writer />
    <song>Meet The Beatles!</song>
    <producer />
    <catalogNo>2047</catalogNo>
    <promotion />
    <distribution>Capitol</distribution>
    <peak>1</peak>
    <weeksOn>71</weeksOn>
  </chartItem>
  <chartItem id='8807803' rank='36' exrank='0'>
    <chart id='3070264'>
      <name>The Billboard Hot 100</name>
      <issueDate>1964-06-06</issueDate>
      <specId>379</specId>
      <specType>Singles</specType>
    </chart>
    <artist>The Beatles</artist>
    <writer />
    <song>Do You Want To Know A Secret</song>
    <producer />
    <catalogNo>587</catalogNo>
    <promotion />
    <distribution>Vee-Jay</distribution>
    <peak>2</peak>
    <weeksOn>11</weeksOn>
  </chartItem>
  <chartItem id='8715486' rank='11' exrank='0'>
    <chart id='3068613'>
      <name>The Billboard 200</name>
      <issueDate>1964-06-06</issueDate>
      <specId>305</specId>
      <specType>Albums</specType>
    </chart>
    <artist>The Beatles</artist>
    <writer />
    <song>Introducing...The Beatles</song>
    <producer />
    <catalogNo>1062</catalogNo>
    <promotion />
    <distribution>Vee-Jay</distribution>
    <peak>2</peak>
    <weeksOn>49</weeksOn>
  </chartItem>
  <chartItem id='8807777' rank='10' exrank='0'>
    <chart id='3070264'>
      <name>The Billboard Hot 100</name>
      <issueDate>1964-06-06</issueDate>
      <specId>379</specId>
      <specType>Singles</specType>
    </chart>
    <artist>The Beatles</artist>
    <writer />
    <song>P.S. I Love You</song>
    <producer />
    <catalogNo>9008</catalogNo>
    <promotion />
    <distribution>Tollie</distribution>
    <peak>10</peak>
    <weeksOn>8</weeksOn>
  </chartItem>
</searchResults>
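A response like this is straightforward to pick apart with Python’s standard library. Here is a minimal sketch that parses the XML into a list of dictionaries sorted by rank. The sample is trimmed to two of the entries shown above, and the dictionary keys are names I chose for the example; only the element and attribute names come from the response itself.

```python
import xml.etree.ElementTree as ET

# Two entries trimmed from the response above (XML declaration omitted).
SAMPLE = """<searchResults>
  <chartItem id='8715481' rank='6' exrank='0'>
    <chart id='3068613'>
      <name>The Billboard 200</name>
      <issueDate>1964-06-06</issueDate>
    </chart>
    <artist>The Beatles</artist>
    <song>Meet The Beatles!</song>
    <peak>1</peak>
    <weeksOn>71</weeksOn>
  </chartItem>
  <chartItem id='8807769' rank='2' exrank='0'>
    <chart id='3070264'>
      <name>The Billboard Hot 100</name>
      <issueDate>1964-06-06</issueDate>
    </chart>
    <artist>The Beatles</artist>
    <song>Love Me Do</song>
    <peak>1</peak>
    <weeksOn>14</weeksOn>
  </chartItem>
</searchResults>"""


def parse_chart_items(xml_text):
    """Turn a searchResults document into a list of dicts, best rank first."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("chartItem"):
        items.append({
            "rank": int(item.get("rank")),
            "chart": item.findtext("chart/name"),
            "issue_date": item.findtext("chart/issueDate"),
            "artist": item.findtext("artist"),
            "title": item.findtext("song"),
            "peak": int(item.findtext("peak")),
            "weeks_on": int(item.findtext("weeksOn")),
        })
    return sorted(items, key=lambda entry: entry["rank"])


for entry in parse_chart_items(SAMPLE):
    print(entry["rank"], entry["chart"], "-", entry["title"])
```

Note that the `song` element holds the album title for entries on the album chart, so I’ve stored it under the more neutral key `title`.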
You can restrict searches to various charts (Hot Country, Pop 100, Top Latin, etc.), and you can search by artist and/or song name over a range of dates. (Unfortunately, but not too surprisingly, the data for the current month is not available in the searches.)
The terms of service seem pretty reasonable: you are allowed to make 1,500 API calls per day at up to 2 queries per second. Commercial use seems to be allowed (but I’m not a lawyer, so you should check for yourself). However, according to the terms, you are not allowed to store any of the Billboard data. The services are well documented and support JSON as well as XML output, and query times are fast.
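If you are writing a client, it is worth staying inside those limits on your own side. Here is a small sketch of a call budget that enforces 1,500 calls per day and 2 calls per second. The class and its names are hypothetical, and treating a "day" as a rolling 24-hour window from first use is my assumption; the terms may count days differently.

```python
import time


class CallBudget:
    """Client-side guard for a daily quota and a per-second rate."""

    def __init__(self, per_day=1500, per_second=2,
                 clock=time.monotonic, sleep=time.sleep):
        self.per_day = per_day
        self.min_interval = 1.0 / per_second   # minimum gap between calls
        self.clock = clock
        self.sleep = sleep
        self.day_start = clock()
        self.calls_today = 0
        self.last_call = None

    def acquire(self):
        """Block until a call is allowed; raise if today's quota is spent."""
        now = self.clock()
        if now - self.day_start >= 86400:      # start a new 24-hour window
            self.day_start = now
            self.calls_today = 0
        if self.calls_today >= self.per_day:
            raise RuntimeError("daily API quota used up")
        if self.last_call is not None:
            wait = self.min_interval - (now - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        self.calls_today += 1
```

Call `acquire()` right before each request. The clock and sleep functions are passed in so that the class can be exercised in tests without actually waiting.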
I can think of all sorts of uses for this data – helping create playlists for the 25-year high school reunion, tracking artist popularity over time, answering bar room music questions like “What was the highest charting instrumental-only single?” or “Did Ringo ever have a hit?”. It is perfect data for the Music Alchemists that are trying to build an automatic hit predictor.
The Billboard chart API is an excellent addition to the world of music web services. It goes straight into my Top Ten Music APIs chart – with a bullet.