Archive for category fun

Austin climbing the Most Musical City chart with a bullet!

Posted by Paul in data, fun on May 24, 2012

I’ve received quite a bit of feedback on my recent Most Musical City post, especially from Austin residents who didn’t like Austin’s 14th-place ranking. This reddit/austin comment thread was rather brutal, and the Austinist article “Wait, What?! Austin Not Ranked In Top 10 Musical Cities List” even closed with this appeal: “any data analysts out there up for the challenge to get Austin closer to the top?”

Well, John Rees, the Director of Community & Economic Development at the Capital Area Council of Governments in Austin, is just the data analyst the Austinist was looking for. He re-ran the analysis, but instead of using city populations, he calculated the rankings based on metropolitan statistical areas. In the May issue of the Data Points Newsletter, John reports on this analysis:

When data from The Echo Nest is adjusted to include metropolitan statistical area population data, the rankings of America’s most musical places changes significantly. Topping the list is Nashville, San Francisco and Los Angeles (which includes Beverly Hills). The Austin region jumps ten places from the original list to become America’s forth [sic] most musical region.

John goes on to point out some of the non-quantifiable aspects of the Austin music scene, such as the diversity of its music and the presence of events like SXSW and Austin City Limits. He makes a strong argument that Austin is one of the country’s premier music destinations. Even the reaction of Austin’s residents to my post says a lot about Austin as a music city: people from Austin really care about music and don’t take it kindly when they are not at the top of the most musical city list. So congrats to Austin, not just for moving up the chart but also for demonstrating that it is the city most passionate about music.

austin, cities, Music

What is the most musical city in the United States?

Posted by Paul in code, data, fun, The Echo Nest on May 20, 2012

There are many cities in the United States that are known for their music. Cities like Nashville, Detroit, Seattle and New Orleans have played a major part in the musical history and development of this country. But what is the most musical city? Which city has spawned the most musical artists? To answer this question, I used the soon-to-be-released artist location data from The Echo Nest artist API. I gathered up the top 50,000 or so U.S. artists, found their city of origin and tallied the number of artists per city. From this tally I calculated the number of artists per 1,000 inhabitants in each city. The more artists per 1,000 inhabitants, the more musical the city.
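As a rough illustration of the tally-and-normalize step just described (not the actual script behind this post), here is a Python sketch. It assumes each artist’s city of origin is already a string such as “Austin, TX” and that a separate dict maps those same strings to populations; both input formats are hypothetical. The 5,000-inhabitant cutoff mirrors the one used for the full list of cities.

```python
from collections import Counter


def most_musical(artist_cities, populations, min_population=5000, top=25):
    """Rank cities by artists per 1,000 inhabitants.

    artist_cities: one city-of-origin string per artist (hypothetical input format).
    populations: dict mapping the same city strings to population counts.
    """
    tally = Counter(artist_cities)  # number of artists per city
    rows = []
    for city, artists in tally.items():
        population = populations.get(city)
        if population is None or population <= min_population:
            continue  # skip cities with unknown or small populations
        per_thousand = 1000.0 * artists / population
        rows.append((per_thousand, artists, population, city))
    rows.sort(reverse=True)  # most musical first
    return rows[:top]
```

With the Beverly Hills row from the table below (111 artists, 35,355 inhabitants), the score works out to 111 / 35.355 ≈ 3.14 artists per 1,000 inhabitants.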

Using the artists per 1k inhabitants, we can easily find the top 25 most musical cities in the United States:

#	Artists per 1,000 inhabitants	Artists	Population	City
1	3.14	111	35355	Beverly Hills, CA
2	2.26	1651	732072	San Francisco, CA
3	1.68	894	530852	Nashville, TN
4	1.64	936	571281	Boston, MA
5	1.54	651	422908	Atlanta, GA
6	1.53	53	34703	Charlottesville, VA
7	1.48	817	552433	Washington, DC
8	1.39	513	367773	Minneapolis, MN
9	1.37	740	540513	Portland, OR
10	1.32	51	38601	Burlington, VT
11	1.24	4789	3877129	Los Angeles, CA
12	1.22	15	12314	Muscle Shoals, AL
13	1.20	683	569369	Seattle, WA
14	1.11	755	678368	Austin, TX
15	1.05	75	71253	Bloomington, IN
16	1.05	50	47529	Chapel Hill, NC
17	1.05	47	44916	Olympia, WA
18	1.00	13	12945	Princeton, NJ
19	0.95	182	190886	Richmond, VA
20	0.94	11	11678	Hendersonville, NC
21	0.87	12	13769	Malibu, CA
22	0.87	88	100975	Denton, TX
23	0.86	179	207970	Orlando, FL
24	0.86	86	100158	Berkeley, CA
25	0.85	114	133874	Orange, CA

I find the results to be pretty interesting. Beverly Hills, the tiny city at the heart of the entertainment world, is #1. San Francisco is the most musical of all large cities, followed by Nashville. Among the most musical of small cities is Muscle Shoals, AL, which, according to Wikipedia, is famous for its contributions to American popular music. Less musical than expected are New Orleans (rank 36), NYC (rank 37) and Detroit (rank 52).

Among the least musical cities in the U.S. is my hometown (Manchester, NH), with only one top-50,000 U.S.-based artist for its 100K inhabitants. The least musical large city in the U.S. is Kansas City, KS, with only 7 top-50k artists for its nearly half million inhabitants. Luckily, Kansas City residents can drive a few miles to Kansas City, Missouri (with its 194 musicians for its 442k inhabitants) when they get tired of their own seven artists.

You can see the full list of cities with population greater than 5,000 ordered by their musicality here: The Most Musical Cities in the United States. I’d love to do this for all the cities in the world, but I can’t find a good source of city population data for world cities. If you know of one, let me know.

I’m rather excited about this upcoming release of artist location data in our API. It will open the doors for a whole bunch of interesting applications, such as road-trip playlisters that play music by artists local to the city you are near, contextual playlisters that favor artists from your hometown, or music exploration apps that let you explore music from a particular region of the world. I can’t wait to see what people build with this data. Stay tuned; I’ll post when the API is released.

Map of Music Styles

Posted by Paul in code, data, events, fun, tags, The Echo Nest, visualization on April 22, 2012

I spent this weekend at Rethink Music Hackers’ Weekend building a music hack called Map of Music Styles (aka MOMS). This hack presents a visualization of over 1000 music styles. You can pan and zoom through the music space just like you can with Google maps. When you see an interesting style of music you can click on it to hear some samples of music of that style.

It is fun to explore all the different neighborhoods of music styles. Here’s the Asian corner:

Here’s the Hip-Hop neighborhood:

And a mega-cluster of ambient/chill-out music:

To build the app, I collected the top 2,000 or so terms via The Echo Nest API. For each term I calculated the most similar terms based upon artist overlap (for instance, the terms ‘metal’ and ‘heavy metal’ are often applied to the same artists and so can be considered similar, whereas ‘metal’ and ‘new age’ are rarely applied to the same artist and are, therefore, not similar). To lay out the graph I used Gephi (it’s like Photoshop for graphs) and exported the graph to SVG. After that it was just a bit of JavaScript, HTML, and CSS to create the web page that lets you pan and zoom. When you click on a term, I fetch audio that matches the style via the Echo Nest and 7Digital APIs.
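The post doesn’t say exactly how artist overlap was turned into a similarity score. One common choice, used here purely as an assumption, is Jaccard similarity: the number of artists tagged with both terms divided by the number tagged with either. This Python sketch assumes a hypothetical dict mapping each term to the set of artists it has been applied to.

```python
def jaccard(a, b):
    """Share of artists tagged with both terms among artists tagged with either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(term_artists, term, k=5):
    """Return up to k other terms ranked by artist overlap with `term`."""
    target = term_artists[term]
    scored = [
        (jaccard(target, artists), other)
        for other, artists in term_artists.items()
        if other != term
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first, ties by name
    return [other for score, other in scored[:k] if score > 0]  # drop zero-overlap terms
```

The top few neighbors of each term could then be written out as an edge list for a layout tool such as Gephi.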

There are a few non-styles that snuck through – the occasional band name, or mood, but they don’t hurt anything so I let them hang out with the real styles. The app works best in Chrome. There’s a bug in the Firefox version that I need to work out.

Give it a try and let me know how you like it: Map of Music Styles

echonest, gephi, rethinkmusic

Is music getting more profane?

Posted by Paul in data, fun, Music on April 1, 2012

This post has profanity in it. If you don’t like profanity, skip this post and instead just look at this picture of a cat. Otherwise, scroll on down to read about the rise and fall of profanity in music.

Now, on to the profanity …

It seems that every year the amount of profanity in music has increased. Today it seems that every other pop song drops the f-bomb, from P!nk’s ‘Fucking Perfect’ to Cee Lo’s ‘Fuck You’. I wondered if this apparent trend was real, so I took a look at when certain obscene words started to show up in song titles to see if there were any obvious trends. Here’s the data:

The word ‘fuck’ doesn’t appear in a song title until 1977, when the band ‘The Way’ released ‘Fucking Police’. This monumental song in music history seems to be lost to the Internet age; the only evidence that it ever existed is this MusicBrainz entry. The second song with ‘fuck’ in the title, ‘To Fuck The Boss’ by Blowfly, appeared in 1978. This sophomore effort is preserved on YouTube:

[youtube http://www.youtube.com/watch?v=J3wGresI0S4]

The peak in usage of the word ‘fuck’ in song titles occurs in 2006, with 650 songs. Since then, usage has dropped off substantially; 2011 saw about the same ‘fuck’ frequency as 1999.

Usage of the word ‘shit’ has a similar profile:

The first usage of the word ‘shit’ in a song title was in 1966 in the song ‘I feel like homemade shit’ by The Fugs, which appeared on The Fugs’ first album (originally titled The Village Fugs Sing Ballads of Contemporary Protest, Point of Views, and General Dissatisfaction). Again, the peak year of use is 2006, with 322 ‘shit’ songs that year.

Looking at these graphs, one would get the impression that use of profanity has grown substantially since the 70s and reached its peak a few years ago. However, there’s more to the data than that. Let’s look at a similar plot for a non-profane word:

This plot shows a very similar usage profile for the word ‘cat’, with substantial growth in use from the 70s until 2006 when it starts to taper off. (Yes, ‘cat’ was found in many songs before 1976, but I am not showing those in the plot). Why do ‘fuck’ and ‘cat’ have such similar profiles? It is not because their usage frequency has increased, it is because the total number of songs released has been increasing year-over-year until 2006, after which the number of new releases per year has been dropping off. We see more ‘fuck’s and ‘cat’s in 2006 because there were more songs released in 2006 than any other year. For a more accurate view we need to look at the relative usage changes. This plot shows the usage of the word ‘fuck’ relative to the usage of other words in song titles. Even when we look at the use of the word ‘fuck’ relative to other words there is a clear increasing trend.
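To make the normalization concrete, here is a Python sketch that divides each year’s matches by the number of titles released that year, so a growing catalog no longer inflates the count. It assumes a hypothetical dict mapping each year to a list of song titles, and it measures the share of titles containing the word, which is one reasonable reading of “relative usage” rather than the exact measure behind the plot. The `prefix` flag is there because a whole-word match for ‘fuck’ would miss titles like ‘Fucking Police’, while prefix matching for ‘cat’ would wrongly count ‘Catch’.

```python
import re


def relative_usage(titles_by_year, word, prefix=False):
    """Fraction of each year's song titles that contain `word` (case-insensitive).

    With prefix=True the word may be followed by more letters ('fuck' matches 'Fucking');
    otherwise it must be a whole word ('cat' does not match 'Catch').
    """
    pattern = re.compile(
        r"(?<!\w)" + re.escape(word) + ("" if prefix else r"(?!\w)"), re.IGNORECASE
    )
    usage = {}
    for year, titles in sorted(titles_by_year.items()):
        if titles:  # skip years with no releases to avoid dividing by zero
            hits = sum(1 for title in titles if pattern.search(title))
            usage[year] = hits / len(titles)
    return usage
```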

Is music getting more profane? The answer is yes. The data show that the likelihood of a song with the word ‘fuck’ in the title has more than doubled since the 80s. And it doesn’t look like this trend has reached its peak yet. I think we shall continue to see a rise in use of language that gets a rise out of moms like Tipper Gore.

Waltzify – turn any 4/4 song into a waltz with Echo Nest remix

Posted by Paul in code, fun, The Echo Nest on March 23, 2012

Tristan Jehan, one of the founders here at the Echo Nest, has created a Python script that will take a 4/4 song and turn it into a waltz. The script uses Echo Nest remix, a Python library that lets you algorithmically manipulate music. Here’s an example of the output of the script when applied to the song ‘Fame’:

Turning a 4/4 song into a 3/4 song while still keeping the song musical is no easy feat. But Tristan’s algorithm does a pretty good job. Here’s what he does:

Start with a 4/4 measure
Cut the 4/4 measure into 2 bars with 2 beats in each bar
Stretch the first beat of each bar by 100%
Adjust the tempo to a typical waltz tempo
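Here is a minimal Python sketch of the four steps above that only plans beat durations. It does not touch audio, and it is not Tristan’s script, which does the actual cutting and time-stretching with Echo Nest remix. The input is assumed to be the four beat durations (in seconds) of one 4/4 measure, and the target waltz tempo is a parameter you choose.

```python
def waltz_plan(measure_beats):
    """Plan two 3/4 bars from the four beat durations (seconds) of one 4/4 measure.

    Returns [[(beat_index, new_duration), ...], ...], one inner list per bar.
    """
    if len(measure_beats) != 4:
        raise ValueError("expected the four beats of a 4/4 measure")
    bars = []
    for first in (0, 2):  # cut the measure into two bars of two beats each
        bars.append([
            (first, measure_beats[first] * 2.0),    # stretch the first beat by 100%
            (first + 1, measure_beats[first + 1]),  # keep the second beat as-is
        ])
    return bars


def adjust_tempo(bars, current_bpm, target_bpm):
    """Rescale every duration so one beat lasts 60 / target_bpm seconds."""
    factor = current_bpm / target_bpm
    return [[(index, duration * factor) for index, duration in bar] for bar in bars]
```

For a 120 BPM track with 0.5-second beats, each new bar is 1.0 + 0.5 seconds, i.e. three of the original beats, which is what turns it into a 3/4 bar before the tempo is adjusted.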

Here’s a graphic that shows the progression:

Here are some more examples:

Tristan has made the waltzifier code available on GitHub. If you want to make your own waltzes, get yourself an Echo Nest API key, grab Echo Nest remix, and start enjoying the power of 3.

Boil the Frog – the unreleased Spotify Version

Posted by Paul in code, data, events, fun, The Echo Nest on February 26, 2012

Update – You are probably looking for this web-based version of Boil The Frog and the blog post about it.

The rest of this article is about the unreleased Spotify Version of Boil the Frog.

I’m at Music Apps Hack Weekend doing my favorite thing: hacking on music. I’ve just finished my hack called Boil the Frog. Boil the Frog is a Spotify App that will create playlists that gradually take you from one music style to another. It is like the proverbial story of the frog in the pot of water. If you heat the water gradually, the frog won’t notice and will happily sit in the pot until it becomes frog stew. With Boil the Frog you can do the same thing musically. Create a playlist that gradually takes your pre-teen from Miley Cyrus to Miles Davis, or, perhaps more perversely, the Kenny G fan to Cannibal Corpse.

To build the app I built an artist similarity graph of 100,000 of the most popular artists. I use The Echo Nest artist similarity to connect each artist to its four nearest neighbors. To find the path between any two artists I use a bidirectional Dijkstra shortest path algorithm. Most paths can be computed in less than 100ms.
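The post doesn’t include the path-finding code, so here is a minimal Python sketch of bidirectional Dijkstra. It assumes the artist graph is a dict mapping each artist to a dict of `{neighbor: edge_weight}` with non-negative weights; how weights were derived from Echo Nest similarity isn’t stated (something like 1 minus similarity would be one hypothetical choice). One Dijkstra search runs forward from the start artist and one backward from the end artist, and the loop stops once the two frontier minima together can no longer beat the best meeting point found so far.

```python
import heapq


def bidirectional_dijkstra(graph, source, target):
    """Shortest path in a graph of {node: {neighbor: weight}} with non-negative weights.

    Returns (distance, path), or None when target is unreachable.
    """
    if source == target:
        return 0.0, [source]

    # Reverse adjacency so the backward search follows edges toward the target.
    reverse = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            reverse.setdefault(v, {})[u] = w

    adj = (graph, reverse)
    dist = ({source: 0.0}, {target: 0.0})
    prev = ({source: None}, {target: None})
    settled = (set(), set())
    heaps = ([(0.0, source)], [(0.0, target)])
    best, meet = float("inf"), None

    while heaps[0] and heaps[1]:
        # No unexplored path can be shorter than the two frontier minima combined.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in settled[side]:
            continue  # stale heap entry
        settled[side].add(u)
        for v, w in adj[side].get(u, {}).items():
            if d + w < dist[side].get(v, float("inf")):
                dist[side][v] = d + w
                prev[side][v] = u
                heapq.heappush(heaps[side], (d + w, v))
            # v has been reached from both ends: candidate meeting point
            if v in dist[1 - side]:
                total = dist[side][v] + dist[1 - side][v]
                if total < best:
                    best, meet = total, v

    if meet is None:
        return None

    # Walk back from the meeting point to the source, then on to the target.
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = prev[0][node]
    path.reverse()
    node = prev[1][meet]
    while node is not None:
        path.append(node)
        node = prev[1][node]
    return best, path
```

Because the four-nearest-neighbor relation isn’t symmetric (A can be among B’s four closest artists without the reverse being true), the backward search walks a reversed copy of the graph rather than the original edges.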

The Spotify Apps API is the perfect hacking platform. You can build a Spotify app that has full access to the vast Spotify music catalog and artwork, along with access to the listener’s catalog. Since Spotify Apps run in an embedded browser, all of your web app programming skills apply. You can use jQuery, make calls to JSON APIs, use HTML5 canvas. It is all there. Spotify has done a really good job putting together this platform. The only downside is that, unlike the web, it is hard to actually release Spotify apps, but the Spotify team is working to make this easier. I’d love to release Boil the Frog because it is really fun to make playlists that bring you from one music style to another. It is interesting to see what musical neighborhoods you wander through on your way. For instance, I made a Kenny G to Cannibal Corpse playlist. To get there, the playlist brought me from easy listening, to movie soundtracks and then through video game soundtracks to get to the heavy metal world. Cool stuff. If you want to see a playlist between two artists, let me know in the comments and I’ll create and share the playlist with you.

I made a video of Boil the Frog in action. Check it out:

[youtube http://youtu.be/Nj6JAxm9aPE]

Update: I’ve just pushed the client code out to github: https://github.com/plamere/boilthefrog

echonest, hacking, mahw, spotify

Paul vs. Billboard

Posted by Paul in code, data, fun, The Echo Nest on February 12, 2012

Another weekend, another Music Hack Day. This weekend I’m at Tokbox headquarters in San Francisco at the 3rd annual Music Hack Day San Francisco, where 200 music hackers are building the future of music.

For my hack, I thought I would try to predict who would win the Grammy Awards (the annual music awards presented by The Recording Academy), which are being held this evening. To do this, I used the Echo Nest APIs to gather lots of news and blog posts for each nominated artist. I then peered into the articles looking for mentions of the Grammy-nominated items. I tallied up the mentions and combined this with the overall artist hotttnesss to give me a ranked order of each nominated item, which I could then use to create my prediction.
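The post doesn’t give the exact formula for combining mentions with hotttnesss, so the weighting in this Python sketch is purely hypothetical: mention counts are scaled to [0, 1] within a category and blended with each artist’s hotttnesss (an Echo Nest score between 0 and 1). The input dicts of articles and hotttnesss values are assumed to have been fetched beforehand, and `weight` is an illustrative knob, not a parameter from the hack.

```python
import re


def count_mentions(articles, item):
    """Case-insensitive count of whole-phrase mentions of a nominated item."""
    pattern = re.compile(r"(?<!\w)" + re.escape(item) + r"(?!\w)", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in articles)


def rank_nominees(nominees, articles_by_artist, hotttnesss, weight=0.5):
    """Rank (artist, item) nominees in one category; returns [(score, artist, item), ...].

    weight is hypothetical: 1.0 uses only mentions, 0.0 only hotttnesss.
    """
    counts = {
        (artist, item): count_mentions(articles_by_artist.get(artist, []), item)
        for artist, item in nominees
    }
    top_count = max(counts.values()) or 1  # scale mention counts to [0, 1]
    ranked = []
    for artist, item in nominees:
        mention_score = counts[(artist, item)] / top_count
        score = weight * mention_score + (1 - weight) * hotttnesss.get(artist, 0.0)
        ranked.append((score, artist, item))
    ranked.sort(reverse=True)  # predicted winner first
    return ranked
```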

Since Billboard has also made some Grammy predictions, I thought it’d be interesting to do a post-facto comparison on how well each of us predicts the winners – thus the hack title ‘Paul vs. Billboard’.

The hack is online here: Paul vs. Billboard

Be sure to check out all of the other music hacks being created this weekend:

List of Music Hackday San Francisco 2012 hacks

billboard, grammys, musichackday

Artists that called it a day in 2011

Posted by Paul in code, data, fun, The Echo Nest on December 18, 2011

It is that time of year when music critics make their year-in-review lists: best albums, worst albums, best new artists and so on. To help critics with their year-end review, I’ve put together a list of the top artists that stopped performing in 2011 – due to retirement, breaking up or due to death.

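I made this list by calling the Echo Nest artist search method, limiting the results to artists with an ending year of 2011. Here’s the salient bit of Python:

    from pyechonest import artist  # pyechonest reads the API key from ECHO_NEST_API_KEY
    # end year after 2010 and before 2012, i.e. artists that stopped performing in 2011
    results = artist.search(artist_end_year_after=2010, artist_end_year_before=2012,
                            buckets=['urls', 'years_active'], sort='hotttnesss-desc')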

You can see the list of the 3,300 or so artists that stopped performing in 2011 here: Artists that called it a day in 2011. Thanks to Matt Santiago, master of data quality at The Echo Nest, for coming up with the idea for the list.

In the same vein, I created a list of the top 100 artists (based upon Echo Nest hotttnesss) that became active or released their first recording in 2011.

Check this list out at: Top New Artists for 2011

2011, artist search, music critics, year in review, years active

Search for music by drawing a picture of it

Posted by Paul in code, data, fun, Music, The Echo Nest on September 25, 2011

I’ve spent the weekend hacking on a project at Music Hack Day Montreal. For my hack I created an application with the catchy title “Search for music by drawing a picture of it”. The hack lets you draw the loudness profile for a song, and the app will search through the Million Song Dataset to find the closest match. You can then listen to the song in Spotify (if the song is in the Spotify collection).

Coding a project in 24 hours is all about compromise. I had some ideas that I wanted to explore to make the matching better (dynamic time warping) and the lookup faster (LSH). But since I actually wanted to finish my hack, I’ve saved those improvements for another day. The simple matching approach (Euclidean distance between normalized vectors) works surprisingly well. The linear search through a million loudness vectors takes about 20 seconds, which is too long for a web app, but a little Ajax makes it palatable.
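Here is a minimal Python sketch of that simple matching approach. It assumes “normalized” means resampling every loudness curve (the drawing and each track) to the same number of points and min-max scaling it to [0, 1], so only the shape matters; the post doesn’t specify the normalization, and the `catalog` dict of per-track loudness lists is hypothetical. The scan is linear, as in the hack, so every track is compared once per query.

```python
import math


def resample(values, n=64):
    """Linearly interpolate a loudness profile onto n evenly spaced points (n >= 2)."""
    if len(values) == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def normalize(values):
    """Min-max scale to [0, 1] so absolute loudness levels don't matter."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def closest_matches(drawing, catalog, k=5, n=64):
    """Linear scan: rank catalog tracks by Euclidean distance to the drawing."""
    query = normalize(resample(drawing, n))
    scored = []
    for track_id, loudness in catalog.items():
        candidate = normalize(resample(loudness, n))
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, candidate)))
        scored.append((d, track_id))
    scored.sort()  # smallest distance first
    return scored[:k]
```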

The hack day has been great fun, kudos to the Montreal team for putting it all together.

montreal, msd, musichackday

Looking for the Slow Build

Posted by Paul in code, data, fun, Music on September 18, 2011

This is the second in a series of posts exploring the Million Song Dataset.

Every few months you’ll see a query like this on Reddit – someone is looking for songs that slowly build in intensity. It’s an interesting music query since it is primarily focused on what the music sounds like. Since we’ve analyzed the audio of millions and millions of tracks here at The Echo Nest, we should be able to automate this type of query. One would expect that Slow Build songs will have a steady increase in volume over the course of a song, so let’s look at the loudness data for a few Slow Build songs to confirm this intuition. First, here’s the canonical slow builder, Stairway to Heaven:

Loudness plot of Stairway to Heaven

The green line is the raw loudness data; the blue line is a smoothed version of the data. Clearly we see a rise in the volume over the course of the song. Let’s look at another classic Slow Build – In the Hall of the Mountain King – and again our intuition is confirmed:

Looking at a non-Slow Build song like Katy Perry’s California Gurls we see that the loudness curve is quite flat by comparison:

Loudness Plot for California Gurls by Katy Perry

Of course there are other aspects beyond loudness that a musician may use to build a song to a climax – tempo, timbre and harmony are all useful, but to keep things simple I’m going to focus only on loudness.

Looking at these plots, it is easy to see which songs have a Slow Build. To algorithmically identify songs that have a Slow Build, we can use a technique similar to the one I described in The Stairway Detector. It is a simple algorithm that compares the average loudness of the first half of the song to the average loudness of the second half. Songs with the biggest increase in average loudness rank the highest. For example, take a look at a loudness plot for Stairway to Heaven. You can see a distinct rise in loudness from the first half to the second half of the song (the horizontal dashed lines show the average loudness for each half). Calculating the ramp factor, we see that Stairway to Heaven scores 11.36, meaning that its average loudness increases by 11.36 decibels between the first and the second half of the song.
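The ramp factor can be sketched in a few lines of Python. This version assumes evenly spaced loudness samples (in dB) with their timestamps, so a plain mean per half suffices; the MapReduce mapper later in the post instead weights each segment by its duration and splits the segment that straddles the midpoint.

```python
def ramp_factor(times, loudness, duration):
    """Mean loudness (dB) of the second half of a song minus that of the first half.

    times/loudness: evenly spaced samples (assumed); duration: song length in seconds.
    """
    half = duration / 2.0
    first = [db for t, db in zip(times, loudness) if t < half]
    second = [db for t, db in zip(times, loudness) if t >= half]
    if not first or not second:
        raise ValueError("need samples in both halves of the song")
    return sum(second) / len(second) - sum(first) / len(first)
```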

This algorithm has some flaws – for instance it will give very high scores to ‘hidden track’ songs. Artists will sometimes ‘hide’ a track at the end of a CD by padding the beginning of the track with a few minutes of silence. For example, this track by ‘Fudge Tunnel’ has about five minutes of silence before the band comes in.

Clearly this song isn’t a Slow Build; our simple algorithm has been fooled. To fix this, we need to introduce a measure of how straight the ramp is. One way to measure the straightness of a line is to calculate the Pearson correlation of the loudness data as a function of time. XY data whose correlation approaches one (or negative one) lies close to a straight line. This nifty Wikipedia visualization shows the correlation for various datasets:

We can combine the correlation with our ramp factors to generate an overall score that takes into account the ramp of the song as well as the straightness of the ramp. The overall score serves as our Slow Build detector. Songs with a high score are Slow Build songs. I suspect that there are better algorithms for this, so if you are a math-oriented reader who is cringing at my naivete, please set me and my algorithm straight.
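Here is a Python sketch of the straightness check and one possible way to combine it with the ramp factor. The Pearson correlation is computed directly from its definition (`scipy.stats.pearsonr`, which the mapper below uses, returns the same coefficient plus a p-value). Multiplying the ramp by the non-negative correlation is a hypothetical combination chosen for illustration; the post doesn’t state the formula it used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series has no defined correlation; treat it as "not a ramp"
    return sxy / math.sqrt(sxx * syy)


def slow_build_score(ramp, correlation):
    """Hypothetical combination: keep the ramp only to the extent the rise is straight."""
    return ramp * max(correlation, 0.0)
```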

Armed with our Slow Build Detector, I built a little web app that lets you explore for Slow Build songs. The app – Looking For The Slow Build – looks like this:

The application lets you type in the name of your favorite song; it plots the loudness over the course of the song and calculates the overall Slow Build score along with the ramp and correlation. If you find a song with an exceptionally high Slow Build score, it will be added to the gallery. I challenge you to get at least one song in the gallery.

You may find that some songs that you think should get a high Slow Build score don’t score as high as you would expect. For instance, take the song Hoppipolla by Sigur Ros. It seems to have a good build, but it scores low:

Loudness plot for Hoppipolla by Sigur Ros

It has an early build, but after a minute it has reached its zenith. The ending mirrors the beginning, with a minute of fade. This explains the low score.

Another song that builds but has a low score is Weezer’s The Angel and the One.

This song has a 4-minute power-ballad build, but it fails to qualify as a Slow Build because the last 2 minutes of the song are nearly silent.

Finding Slow Build songs in the Million Song Dataset

Now that we have an algorithm that finds Slow Build songs, let’s apply it to the Million Song Dataset. I can create a simple MapReduce job in Python that goes through all of the million tracks and calculates the Slow Build score for each of them to help us find the songs with the biggest Slow Build. I’m using the same framework that I described in the post “How to Process a Million Songs in 20 minutes”. I use the S3-hosted version of the Million Song Dataset and process it via Amazon’s Elastic MapReduce using mrjob, a Python MapReduce library. Here’s the mapper that does almost all of the work; the full code is on GitHub in cramp.py:

    from mrjob.job import MRJob
    from scipy.stats import pearsonr

    import track  # track-loading helper from the MSD/mrjob framework described above

    YIELD_ALL = False  # True: emit every track instead of only the Slow Build candidates


    class MRSlowBuild(MRJob):  # illustrative class name; see cramp.py for the real job

        def mapper(self, _, line):
            """ The mapper loads a track and yields its ramp factor """
            t = track.load_track(line)
            if t and t['duration'] > 60 and len(t['segments']) > 20:
                segments = t['segments']
                half_track = t['duration'] / 2.0
                first_half = 0.0   # duration-weighted loudness sum over the first half
                second_half = 0.0  # duration-weighted loudness sum over the second half

                xdata = []
                ydata = []
                for seg in segments:
                    seg_start = seg['start']
                    seg_end = seg['start'] + seg['duration']

                    if seg_end <= half_track:
                        first_half += seg['loudness_max'] * seg['duration']
                    elif seg_start < half_track:
                        # this is the nasty segment that spans the song midpoint.
                        # apportion the loudness appropriately
                        first_half += seg['loudness_max'] * (half_track - seg_start)
                        second_half += seg['loudness_max'] * (seg_end - half_track)
                    else:
                        second_half += seg['loudness_max'] * seg['duration']

                    xdata.append(seg_start)
                    ydata.append(seg['loudness_max'])

                # pearsonr returns (r, p-value); only the coefficient is needed
                correlation = pearsonr(xdata, ydata)[0]
                # each half lasts half_track seconds, so these are average loudnesses (dB)
                ramp_factor = second_half / half_track - first_half / half_track
                if YIELD_ALL or (ramp_factor > 10 and correlation > .5):
                    yield (t['artist_name'], t['title'], t['track_id'], correlation), ramp_factor


    if __name__ == '__main__':
        MRSlowBuild.run()

This code takes less than a half hour to run on 50 small EC2 instances and finds a bucketload of Slow Build songs. I’ve created a page of plots of the top 500 or so Slow Build songs found by this job. There are all sorts of hidden gems in there. Go check it out:

Looking for the Slow Build in the Million Song Dataset

The page has 500 plots, all linked to Spotify, so you can listen to any song that strikes your fancy. Here are some of my favorite discoveries:

Respighi’s The Pines of the Appian Way

I remember playing this in the orchestra back in high school. It really is sublime. Click the plot to listen in Spotify.

Maria Friedman’s Play The Song Again

So very theatrical.

Mandy Patinkin’s Rock-A-Bye Your Baby With A Dixie Melody

Another song that seems to be right off of Broadway – it has an awesome slow build.

There are thousands and thousands of Slow Build songs. I’ve created a table with all the songs that have a score above 10 on the Slow Build scale (recall that Stairway to Heaven scores a 9, so this is a table of about 6K songs that are bigger Slow Build songs than Stairway).

Conclusion

This just about wraps up the most complex blog post I’ve ever made, with a web app, a MapReduce program, and a jillion behind-the-scenes scripts to make tables and nice-looking plots. But in the end, I found new music that I like, so it was worth it all. Here’s a summary of all the resources associated with this post:

The Million Song Dataset – deep data about a million songs
The Stairway Index – my first look at this stuff about 2 years ago
How to process a million songs in 20 minutes – a blog post about how to process the MSD with mrjob and Elastic MapReduce
Looking for the Slow Build – a simple web app that calculates the Slow Build score and loudness plot for just about any song
cramp.py – the MapReduce code for calculating Slow Build scores for the MSD
Looking for the Slow Build in the Million Song Dataset – 500 loudness plots of the top Slow Builders
Top Slow Build songs in the Million Song Dataset – the top 6K songs with a Slow Build score of 10 and above
A Spotify collaborative playlist with a bunch of Slow Build songs in it. Feel free to add more.

Thanks to Spotify for making it so easy to find music with their metadata API and to make links that play music. Apologies to all of the artists with accented characters in their names: I was encoding-challenged this weekend, so my UTF is all WTF.

echonest, MillionSong, msd, Slow Build

Music Machinery
