Finding artist names in text

Let’s say you have a block of text – perhaps a tweet or a web page from a music review site. If you want to find out whether the text mentions a particular artist such as Weezer, the task is pretty straightforward: just search through the text for the artist name and all of that artist’s variants and aliases.
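Here is a minimal Python sketch of that single-artist search. The function name `mentions_artist` and the alias list are my own illustrations, not part of any Echo Nest API; the only alias used is the GNR nickname for Guns N' Roses that comes up later in this post.

```python
import re

def mentions_artist(text, names):
    """Return True if any of the given names occurs in text as a whole word or phrase."""
    for name in names:
        # Lookarounds instead of \b so that names starting or ending with
        # punctuation still match, while "Weezers" does not match "Weezer".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False

# Illustrative alias list; spelling variants (for example a curly apostrophe)
# would each need their own entry.
gnr_names = ["Guns N' Roses", "GNR"]

print(mentions_artist("I just saw Weezer play live", ["Weezer"]))
print(mentions_artist("gnr is OK", gnr_names))
```

This works fine when you already know which artist you are looking for, because the list of names to try is short.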

What is harder is trying to figure out if any artists are mentioned in a block of text, and if so, which ones. Since there are millions of artists, each with their own set of aliases and variants, the simple search that we use to find ‘Weezer’ in a tweet doesn’t work so well. The fact that many artist names are also common words adds to the difficulty.
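To make the difficulty concrete, here is a naive sketch that looks up every run of words in the text against an alias index. This is not how The Echo Nest does it; the tiny catalog, the helper names (`tokens`, `build_index`, `find_artists`) and the greedy longest-match rule are all assumptions made for illustration. A real catalog would have millions of entries.

```python
import re
from collections import defaultdict

def tokens(s):
    """Lowercase the string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())

def build_index(catalog):
    """Map each normalized alias (a tuple of words) to the artists that use it."""
    index = defaultdict(set)
    for artist, aliases in catalog.items():
        for alias in aliases:
            index[tuple(tokens(alias))].add(artist)
    return index

def find_artists(text, index):
    """Greedy longest-match scan of the text against the alias index."""
    if not index:
        return []
    words = tokens(text)
    longest = max(len(key) for key in index)
    found = []
    i = 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            hit = index.get(tuple(words[i:i + n]))
            if hit:
                found.append(sorted(hit))
                i += n
                break
        else:
            i += 1  # no alias starts at this word
    return found

# Hypothetical toy catalog: artist -> known aliases.
catalog = {
    "Deerhoof": ["Deerhoof"],
    "Emerson, Lake & Palmer": ["Emerson, Lake & Palmer", "Emerson, Lake and Palmer"],
    "Coldplay": ["Coldplay"],
    "Guns N' Roses": ["Guns N' Roses", "GNR"],
    "GNR": ["GNR"],
    "OK Go": ["OK Go"],
}

text = "I like Deerhoof, and Emerson, Lake and Palmer. GNR is OK. Go try it yourself!"
print(find_artists(text, build_index(catalog)))
```

Because the scan throws away case and punctuation, it reports OK Go for the words "OK. Go", even though they straddle two sentences. That is exactly the kind of accidental match that makes the general problem hard.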

Luckily I work with a bunch of really smart folks at The Echo Nest who’ve already had to solve this problem in order to make The Echo Nest work. Over on the Echo Nest blog, there’s a nifty description of the problem of artist name identification and extraction and an announcement of the release of a new (and very much beta) API called artist/extract that will expose some of this functionality to application developers that use our APIs.

This morning I spent a few minutes and created a little web app that lets you play with the artist/extract API. Here’s a screenshot:

In this example I’ve typed in the text:

I like Deerhoof, and Emerson, Lake and Palmer. I don’t like Coldplay, or Justin Bieber. GNR is OK. Go try it yourself!

You can see that it found Deerhoof and Coldplay (easy enough), as well as a spelling variant of Emerson, Lake & Palmer. It also recognized GNR as two different artists – GNR (a Portuguese rock band) and Guns N’ Roses, for which GNR is a nickname. Also notice that it didn’t get confused by the ‘OK. Go’ that is embedded in the text. The extractor is not perfect – it tries hard to avoid confusing artists with regular English words (since just about every English word is a band name), so it relies on letter case and other hints to separate real artist mentions from accidental ones.
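I can't show the extractor's actual rules here, but the flavor of hints like letter case can be sketched with a simple filter. The function `plausible_mention` and its two rules (no sentence break inside the match, at least one capitalized word) are purely my own assumptions for illustration, not the artist/extract implementation.

```python
import re

# Sentence-ending punctuation followed by whitespace, e.g. the ". " in "OK. Go".
SENTENCE_BREAK = re.compile(r"[.!?]\s")

def plausible_mention(surface):
    """Decide whether a matched span looks like a deliberate artist mention.

    `surface` is the exact slice of the original text that matched an alias,
    with its original case and punctuation intact.
    """
    # Hint 1: a match that straddles two sentences is probably accidental.
    if SENTENCE_BREAK.search(surface):
        return False
    # Hint 2: require at least one word that starts with an uppercase letter.
    words = re.findall(r"[A-Za-z0-9]+", surface)
    return any(word[0].isupper() for word in words)

for span in ["Deerhoof", "Emerson, Lake and Palmer", "GNR", "OK. Go", "yes"]:
    print(span, plausible_mention(span))
```

Hints like these are a trade-off: this filter would also throw out a real mention typed in all lowercase, and it would wrongly reject any artist name that itself contains a period followed by a space.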

The artist extractor is very much a beta API, so it may be a bit unsteady on its feet and may not always work as you’d expect. Nevertheless, it is a nifty bit of music data infrastructure that will help us better understand who is talking about which artists.

Read the API docs for Artist/Extract – or try out the little web demo.

api, artists, echo nest, entity extraction

This entry was posted on June 16, 2011, 1:55 pm and is filed under Music.

#1 by Peter Watts on June 16, 2011 - 2:09 pm

If only this existed a year ago when I was building a service that scraped concert listings from official venue websites, to build an automatic gig guide! But well done, I’m keen to play with this.
#2 by Sleeper Industries on June 17, 2011 - 11:50 am

Very cool. Are you planning on expanding this to match other things, like album names?
#3 by brian on June 19, 2011 - 9:16 am

hey abe, yes, that is on the roadmap. Song names / song-artist pairs first probably.
#4 by Eugenio Tacchini on June 21, 2011 - 6:35 pm

Cool. I think that in the future we will slowly move toward distributed recommendation systems (collecting implicit and explicit feedback from different sources), and these kinds of tools, together with some pragmatic use of semantic web technologies, can really help.

Music Machinery