How do you spell ‘Britney Spears’?

I’ve been under the weather for the last couple of weeks, which has kept me from doing most things, including blogging. Luckily, I had a blog post sitting in my drafts folder, almost ready to go. I spent a bit of time today finishing it up, so here it is: a look at the fascinating world of spelling correction for artist names.

In today’s digital music world, you will often look for music by typing an artist name into the search box of your favorite music app. However, this becomes a problem if you don’t know how to spell the name of the artist you are looking for. It is probably not much of a problem if you are looking for U2, but it most definitely is if you are looking for Röyksopp, Jamiroquai or Britney Spears. To help solve this problem, we can try to identify common misspellings of artist names and use them to steer you to the artists you are looking for.

A spelling corrector in 21 lines of code
A good place to start is a post by Peter Norvig (Director of Research at Google) called ‘How to write a spelling corrector’, which presents a fully operational spelling corrector in 21 lines of Python. (It is a phenomenal bit of code, well worth the time spent studying it.) At the core of Peter’s algorithm is the concept of edit distance, a way to represent the similarity of two strings by counting the number of operations (inserts, deletes, replacements and transpositions) needed to transform one string into the other. Peter cites literature suggesting that 80 to 95% of spelling errors are within an edit distance of 1 (meaning that most misspellings are just one insert, delete, replacement or transposition away from the correct word). Not satisfied with that coverage, Peter’s algorithm considers all words within an edit distance of 2 as candidates for correction. On Peter’s small test set (he wrote his system on a plane, so he didn’t have lots of data nearby), the edit-distance-2 candidates covered 98.9% of his test cases.
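
To make the edit-distance idea concrete, here is a minimal Python sketch written in the spirit of Norvig’s corrector, not his exact code. `edits1` generates every string one insert, delete, replacement or adjacent transposition away; `edits2` applies it twice; `correct` returns the most frequent known word from the nearest non-empty candidate set. The function names are hypothetical, the word-frequency `Counter` stands in for the training corpus, and adding a space to the alphabet (so multi-word names can be edited) is an assumption of this sketch.

```python
import string
from collections import Counter

# Illustrative sketch; names are hypothetical, not from Norvig's post.
# Assumption: lowercase letters plus a space, so multi-word names can be edited.
ALPHABET = string.ascii_lowercase + " "


def edits1(word: str) -> set[str]:
    """All strings one insert, delete, replacement or adjacent transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str) -> set[str]:
    """All strings reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correct(word: str, counts: Counter) -> str:
    """Most frequent known word at the smallest edit distance (0, then 1, then 2)."""
    def known(words):
        return {w for w in words if w in counts}

    # The `or` chain short-circuits, so edits2 is only built when needed.
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: counts[w])


counts = Counter({"spears": 10, "beers": 3, "britney": 5})
print(correct("speers", counts))  # spears (one replacement away)
print(correct("brtny", counts))   # britney (two inserts away)
```

Because `edits2` is roughly the square of the size of `edits1`, searching any further than two edits quickly becomes expensive, which helps explain why the candidate search stops there.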

Spell checking Britney
A few years ago, the smart folks at Google posted a list of Britney Spears spelling corrections showing nearly 600 variants of Ms. Spears’ name, collected over three months of Google searches. Perusing the list, you’ll find all sorts of interesting variations such as ‘birtheny spears’, ‘brinsley spears’ and ‘britain spears’. I suspect that some of these queries (like ‘brandi spears’) may not actually be for the pop artist. One curiosity in the list is that although there are 600 variations on the spelling of ‘Britney’, there is exactly one way that ‘spears’ is spelled. There’s no ‘speers’ or ‘spheres’, or ‘britany’s beers’ on this list.

One thing I did notice about Google’s list of Britneys is that many variations seem to be further from the correct spelling than the edit distance of two at the core of Peter’s algorithm. This means that if you give these variants to Peter’s spelling corrector, it won’t find the proper spelling. Being an empiricist, I tried it and found that of the 593 variants of ‘Britney Spears’, 200 were not within an edit distance of two of the proper spelling and would not be correctable. This is not too surprising. Names are traditionally hard to spell, there are many alternative spellings of ‘Britney’ that are real names, and many people searching for an artist for the first time may have only heard the name pronounced and never seen it written.
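
A check like the one described above can be sketched as follows. It assumes the edit distance is the optimal string alignment variant (inserts, deletes, replacements and adjacent transpositions, the operations Norvig’s corrector uses) and splits a list of variants by whether they fall within distance 2 of the correct name. The helper names are hypothetical and the short variant list is illustrative only; without the full list of 593 it does not reproduce the 200-of-593 figure.

```python
# Illustrative sketch; helper names are hypothetical, not from the post.

def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: inserts, deletes, replacements and
    adjacent transpositions each cost 1 (a restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # replace (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[len(a)][len(b)]


def split_by_distance(variants, target, max_distance=2):
    """Partition variants into (correctable, not correctable) for a corrector
    that only searches up to max_distance edits from each query."""
    within, beyond = [], []
    for variant in variants:
        if osa_distance(variant, target) <= max_distance:
            within.append(variant)
        else:
            beyond.append(variant)
    return within, beyond


within, beyond = split_by_distance(
    ["britny spears", "brittney spears", "brandi spears"], "britney spears")
print(within)  # ['britny spears', 'brittney spears']
print(beyond)  # ['brandi spears']
```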

Making it better with an artist-oriented spell checker
A 33% miss rate for a popular artist’s name seems a bit high, so I thought I’d see if I could improve on it. I have one big advantage that Peter didn’t: I work for a music data company, so I can be pretty confident that all the search queries I see are related to music. Restricting the possible vocabulary to just artist names makes things a whole lot easier. The algorithm couldn’t be simpler. Collect the names of the 100K most popular artists. For each artist-name query, find the artist name with the smallest edit distance to the query and return that name as the best candidate match. Unlike Peter’s algorithm, which stops at an edit distance of 2, this finds the closest matching artist no matter how many edits away it is. When I run this against the 593 Britney Spears misspellings, I get only one mismatch: ‘brandi spears’ is closer to the artist ‘burning spear’ than to ‘Britney Spears’. Considering the naive implementation, the algorithm is fairly fast (40 ms per query on my 2.5-year-old laptop, in Python).
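
Here is a minimal sketch of the nearest-artist lookup. The post doesn’t say which edit-distance variant or normalization was used, so this assumes plain Levenshtein distance over lowercased names; the function names are hypothetical and the short artist list stands in for the real 100K-name vocabulary. Since the distance between two strings is at least the difference in their lengths, names whose length differs from the query by at least the best distance found so far are skipped without filling in the full table.

```python
# Illustrative sketch; names are hypothetical and the artist list is a stand-in.

def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance (inserts, deletes, replacements), two rows at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # replace or match
        previous = current
    return previous[-1]


def closest_artist(query: str, artists: list[str]):
    """Return (name, distance) of the artist nearest to the query.

    Names are compared lowercased; ties keep the earlier artist."""
    query = query.lower().strip()
    best_name, best_dist = None, float("inf")
    for name in artists:
        # Edit distance is at least the length difference, so this name cannot win.
        if abs(len(name) - len(query)) >= best_dist:
            continue
        dist = levenshtein(query, name.lower())
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


artists = ["burning spear", "britney spears", "justin bieber", "katy perry", "u2"]
print(closest_artist("britny spears", artists))  # ('britney spears', 1)
print(closest_artist("Jastin Beiber", artists))  # ('justin bieber', 3)
```

Because the distance variant and normalization are assumptions, borderline queries such as ‘brandi spears’ may not resolve the same way here as in the post. A linear scan like this is the naive approach; for much larger vocabularies, a metric index such as a BK-tree avoids comparing each query against every name.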

Looking at spelling variations
With this artist-oriented spelling checker in hand, I decided to take a look at some real artist queries to see what interesting things might be buried within. I gathered artist-name search queries from the Echo Nest API logs and looked for interesting patterns (since I’m doing this at home over the weekend, I only looked at the most recent logs, which consist of only about 2 million artist-name queries).
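
Before matching, it helps to normalize raw queries, since many of the variants in the lists that follow differ only in case, punctuation or underscores. This sketch assumes the logs have already been reduced to one query string per entry (the Echo Nest log format isn’t described here), and the function names are hypothetical. It folds accents (so ‘Röyksopp’ becomes ‘royksopp’), turns punctuation and underscores into spaces, and counts the normalized queries.

```python
import re
import unicodedata
from collections import Counter

# Illustrative sketch; names are hypothetical and the log format is assumed.


def normalize(query: str) -> str:
    """Lowercase, strip accents, and treat punctuation/underscores as spaces."""
    # NFKD splits 'ö' into 'o' plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", query)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Any run of non-alphanumeric characters (including '_') becomes one space.
    return re.sub(r"[\W_]+", " ", without_marks.lower()).strip()


def count_queries(raw_queries):
    """Count normalized, non-empty queries (one raw query string per log entry)."""
    counts = Counter()
    for raw in raw_queries:
        query = normalize(raw)
        if query:
            counts[query] += 1
    return counts


counts = count_queries(
    ["Katy Perry", "katy  perry", "katy_perry", "KatyPerry", "Röyksopp", "  "])
print(counts.most_common())
# [('katy perry', 3), ('katyperry', 1), ('royksopp', 1)]
```

Normalization alone doesn’t split run-together names like ‘katyperry’ or ‘justinbieber’; those still rely on the edit-distance matching step.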

Artists with most spelling variations
Not surprisingly, very popular artists are the most frequently misspelled. It seems that just about every permutation has been tried in an attempt to spell these artists’ names.

  • Michael Jackson – Variations: michael jackson, micheal jackson, michel jackson, mickael jackson, mickal jackson, michael jacson, mihceal jackson, mickeljackson, michel jakson, micheal jaskcon, michal jackson, michael jackson by pbtone, mical jachson, micahle jackson, machael jackson, muickael jackson, mikael jackson, miechle jackson, mickel jackson, mickeal jackson, michkeal jackson, michele jakson, micheal jaskson, micheal jasckson, micheal jakson, micheal jackston, micheal jackson just beat, micheal jackson, michal jakson, michaeljackson, michael joseph jackson, michael jayston, michael jakson, michael jackson mania!, michael jackson and friends, michael jackaon, micael jackson, machel jackson, jichael mackson
  • Justin Bieber – Variations: justin bieber, justin beiber, i just got bieber’ed by, justin biber, justin bieber baby, justin beber, justin bebbier, justin beaber, justien beiber, sjustin beiber, justinbieber, justin_bieber, justin. bieber, justin bierber, justin bieber<3 4 ever<3, justin bieber x mstrkrft, justin bieber x, justin bieber and selens gomaz, justin bieber and rascal flats, justin bibar, justin bever, justin beiber baby, justin beeber, justin bebber, justin bebar, justien berbier, justen bever, justebibar, jsustin bieber, jastin bieber, jastin beiber, jasten biber, jasten beber songs, gestin bieber, eiine mainie justin bieber, baby justin bieber
  • Red Hot Chili Peppers – Variations: red hot chilli peppers, the red hot chili peppers, red hot chilli pipers, red hot chilli pepers, red hot chili, red hot chilly peppers, red hot chili pepers, hot red chili pepers, red hot chilli peppears, redhotchillipeppers, redhotchilipeppers, redhotchilipepers, redhot chili peppers, redhot chili pepers, red not chili peppers, red hot chily papers, red hot chilli peppers greatest hits, red hot chilli pepper, red hot chilli peepers, red hot chilli pappers, red hot chili pepper, red hot chile peppers
  • Mumford and Sons – Variations: mumford and sons, mumford and sons cave, mumford and son, munford and sons, mummford and sons, mumford son, momford and sons, modfod and sons, munfordandsons, munford and son, mumfrund and sons, mumfors and sons, mumford sons, mumford ans sons, mumford and sonns, mumford and songs, mumford and sona, mumford and, mumford &sons, mumfird and sons, mumfadeleord and sons
  • Katy Perry – Even an artist with a seemingly very simple name like Katy Perry has numerous variations: katy perry, katie perry, kate perry, kathy perry, katy perry ft.kanye west, katty perry, katy perry i kissed a girl, peacock katy perry, katyperry, katey parey, kety perry, kety peliy, katy pwrry, katy perry-firework, katy perry x, katy perry, katy perris, katy parry, kati perry, kathy pery, katey perry, katey perey, katey peliy, kata perry, kaity perry
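
Ranking artists by spelling variety can be sketched as a simple grouping step. The input is hypothetical: pairs of (normalized query, corrected artist), such as a nearest-artist matcher might produce for each logged query. Each artist collects the distinct queries that aren’t an exact match for its name, and artists are sorted by how many such variants they have; the function name is also hypothetical.

```python
from collections import defaultdict

# Illustrative sketch; the function name and input format are hypothetical.


def variants_by_artist(corrections):
    """Group misspelled queries under the artist they were corrected to.

    corrections: iterable of (normalized_query, corrected_artist) pairs.
    Returns (artist, set_of_variants) pairs, most distinct variants first."""
    variants = defaultdict(set)
    for query, artist in corrections:
        if query != artist.lower():  # exact spellings are not variants
            variants[artist].add(query)
    return sorted(variants.items(), key=lambda item: len(item[1]), reverse=True)


pairs = [
    ("micheal jackson", "Michael Jackson"),
    ("michel jackson", "Michael Jackson"),
    ("michael jackson", "Michael Jackson"),
    ("micheal jackson", "Michael Jackson"),
    ("katie perry", "Katy Perry"),
    ("muse", "Muse"),
]
for artist, spellings in variants_by_artist(pairs):
    print(artist, len(spellings), sorted(spellings))
```

Counting distinct variants rather than total misspelled queries keeps one very common typo from dominating the ranking.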

Some other frequently misspelled artists:

  • Britney Spears
  • Linkin Park
  • Arctic Monkeys
  • Katy Perry
  • Guns N’ Roses
  • Nicki Minaj
Which artists are the easiest to spell?
Using the same techniques, we can look through our search logs and find the popular artists with the fewest misspelled queries. These are the easiest artists to spell. They include:
  • Muse
  • Weezer
  • U2
  • Oasis
  • Moby
  • Flyleaf
  • Seether
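
The “fewest misspelled queries” idea can be sketched as a rate, so that sheer popularity doesn’t dominate; whether the post used raw counts or rates isn’t stated, so the rate is an assumption. This reuses the same hypothetical (query, corrected artist) pairs, computes the fraction of each artist’s queries that are not an exact match, and ignores artists below a minimum query count. The `min_queries` threshold is an assumed stand-in for “popular”, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative sketch; the function name, input format and threshold are assumptions.


def easiest_to_spell(corrections, min_queries=100):
    """Rank artists by the fraction of their queries that were misspelled.

    corrections: (normalized_query, corrected_artist) pairs.
    min_queries: assumed popularity cut-off; rarer artists are ignored."""
    total, misspelled = Counter(), Counter()
    for query, artist in corrections:
        total[artist] += 1
        if query != artist.lower():
            misspelled[artist] += 1
    rates = {artist: misspelled[artist] / n
             for artist, n in total.items() if n >= min_queries}
    return sorted(rates.items(), key=lambda item: item[1])


pairs = ([("muse", "Muse")] * 4
         + [("katy perry", "Katy Perry"), ("katie perry", "Katy Perry"),
            ("katy perri", "Katy Perry"), ("katy perry", "Katy Perry")]
         + [("u2", "U2")])
print(easiest_to_spell(pairs, min_queries=2))  # [('Muse', 0.0), ('Katy Perry', 0.5)]
```
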
Most confused artists
Artists that are most easily confused with one another include:
  • Bryan Adams – Ryan Adams
  • Underworld – Uverworld
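
The post doesn’t describe how confused pairs were found. One simple assumption is to flag artist names that are themselves within a small edit distance of each other: Bryan Adams and Ryan Adams are one deletion apart, and Underworld and Uverworld are two edits apart. The sketch below, with hypothetical function names, compares every pair of names, so it is only practical for a short list; it carries its own Levenshtein function so it runs standalone.

```python
from itertools import combinations

# Illustrative sketch; names and the pairwise approach are assumptions.


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance (inserts, deletes, replacements), two rows at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # replace or match
        previous = current
    return previous[-1]


def confusable_pairs(artists, max_distance=2):
    """All pairs of lowercased artist names within max_distance edits of each other.

    Compares every pair, so this is O(n^2) and only meant for a short list."""
    names = sorted({name.lower() for name in artists})
    return [(a, b) for a, b in combinations(names, 2)
            if levenshtein(a, b) <= max_distance]


print(confusable_pairs(["Bryan Adams", "Ryan Adams", "Underworld", "Uverworld", "Muse"]))
# [('bryan adams', 'ryan adams'), ('underworld', 'uverworld')]
```

A fixed threshold of 2 is generous for very short names (any two names of two characters are within 2 edits of each other), so a more careful version would likely scale the threshold with name length.
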
Wrapping up
Spelling correction for artist names is perhaps the least sexy job in the music industry; nevertheless, it is an important part of helping people connect with the music they are looking for. There is a large body of research on context-sensitive spelling correction that can be applied to this problem, but even very simple techniques like those described here can go a long way toward figuring out what someone really wants when they search for ‘Jastan Beebar’.


  #1 by Param Arunachalam on July 28, 2011 - 11:25 am

    Fascinating! Thanks for sharing. For Indian music, there is an additional twist. There are many languages in India, but because English is the most widely understood language among internet users, almost all Indian music services transliterate artist names, albums and songs into English. Therefore, there isn’t really a “correct” spelling of the transliterated English word. The algorithm you describe should still work, since it returns the best match.

