Talk Radio – control Rdio with the new Web Speech API

Control your radio with your mouth

My weekend hack at the Tufts Hackathon was a playlisting app that you can control with speech. It is called Talk Radio. The hack uses the new Web Speech API, which started shipping this week in Chrome 25, and it seemed like it would be fun to give it a spin.
This app lets you control your music player with your words. Try saying something like:

  • Play music by Carly Rae Jepsen
  • Play music like Weezer
  • Play some brutal death metal
  • Play some Christmas music
  • Play slow music by Beyoncé
  • Play fast music by Beyoncé
  • Play chill music in the style of smooth jazz
  • Play some screamo

Pro tip – the artist or genre should always be at the end of your utterance.
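
To show why the artist or genre works best at the end, here is a minimal sketch of how utterances like the ones above could be parsed. The post does not show the app's actual grammar, so the patterns, the `Command` type and `parse_utterance` are all hypothetical names made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    kind: str            # "artist", "similar" or "style"
    target: str          # artist or genre, taken from the end of the utterance
    mood: Optional[str]  # optional descriptor such as "slow", "fast" or "chill"


# Hypothetical grammar (an assumption, not the app's real one).
# Patterns are tried in order; each captures the trailing artist or genre.
_PREFIX = r"^play (?:some )?(?:(?P<mood>\w+) )?"
_PATTERNS = [
    ("style", re.compile(_PREFIX + r"music in the style of (?P<target>.+)$")),
    ("artist", re.compile(_PREFIX + r"music by (?P<target>.+)$")),
    ("similar", re.compile(_PREFIX + r"music like (?P<target>.+)$")),
    # Catch-all: "play some X" or "play some X music" is treated as a style.
    ("style", re.compile(r"^play (?:some )?(?P<target>.+?)(?: music)?$")),
]


def parse_utterance(text: str) -> Optional[Command]:
    """Turn recognized text into a Command, or None if it is not a play request."""
    normalized = " ".join(text.lower().split())
    for kind, pattern in _PATTERNS:
        match = pattern.match(normalized)
        if match:
            groups = match.groupdict()
            return Command(kind, groups["target"], groups.get("mood"))
    return None


print(parse_utterance("Play slow music by Beyoncé"))
print(parse_utterance("Play some brutal death metal"))
```

Because everything after the fixed phrase is treated as the name, the parser never has to guess where an artist like "Carly Rae Jepsen" ends.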

The hack is an exploration of how well an off-the-shelf large-vocabulary speech recognizer would work in the music domain. Music has lots of hard names like deadmau5, p!nk and !!!, and many domain-specific terms like ‘screamo’, ‘hip hop’ and ‘shoegaze’. I am actually quite surprised at how well this works. The Google speech recognizer does a good job of understanding most of the neologisms like ‘screamo’ and ‘shoegaze’, and does an excellent job of recognizing popular artist names like Jay-Z and Beyoncé. For unusual artist names, The Echo Nest artist search does a really good job of finding what you meant. So when the speech recognizer returns “play music by chick chick chick”, The Echo Nest artist search can turn “chick chick chick” into “!!!” with no problems. Similarly, when the speech recognizer returns “dead mouse”, The Echo Nest resolves it to ‘deadmau5’.

We can also field more general music queries. If a style query returns no results, it is re-submitted as a general artist-description query. This lets you find more esoteric music such as “big hair bands”.
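
The fallback described above can be sketched roughly as follows. This is only an illustration: the two search callables stand in for whichever Echo Nest calls the app actually makes (the post does not show them), and `find_artists`, the toy search functions and the placeholder artist names are all hypothetical.

```python
from typing import Callable, List

# A search function takes a query string and returns matching artist names.
SearchFn = Callable[[str], List[str]]


def find_artists(query: str,
                 search_by_style: SearchFn,
                 search_by_description: SearchFn) -> List[str]:
    """Try the query as a style first; if nothing comes back, retry it
    as a general artist description."""
    results = search_by_style(query)
    if results:
        return results
    return search_by_description(query)


# Toy stand-ins for the real searches (assumptions, with placeholder data).
def toy_style_search(query: str) -> List[str]:
    known_styles = {"screamo": ["Placeholder Artist A", "Placeholder Artist B"]}
    return known_styles.get(query, [])


def toy_description_search(query: str) -> List[str]:
    descriptions = {"big hair bands": ["Placeholder Artist C"]}
    return descriptions.get(query, [])


print(find_artists("screamo", toy_style_search, toy_description_search))
print(find_artists("big hair bands", toy_style_search, toy_description_search))
```

Passing the searches in as arguments keeps the fallback logic separate from any particular API, so it can be tried out without a network connection.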

Issues

You have to grant the app permission to access the microphone for every utterance. This can be alleviated in the near future after a few API issues are sorted out. Until then, the app is all Cancel or Allow. (And yes, it is incredibly annoying.) Update: this is all sorted now.

This hack was built at the Tufts Hackathon 2013. For me, it was half a hack day, with lots of time spent helping folks who had never used The Echo Nest APIs before and traveling in the snow. Still, it was fun to use the nifty new Web Speech API that just shipped this week in Chrome 25.
