Introducing Project Rosetta Stone

Here at The Echo Nest we want to make the world easier for music app developers. We want to solve as many as possible of the problems that developers face when writing music apps, so that they can focus on building cool stuff instead of worrying about the basic plumbing. One of those problems is ID translation. You may have a collection of music that is in one ID space (Musicbrainz, for instance) but want to use a music service (such as the Echo Nest’s Artist Similarity API) that uses a completely different ID space. Before you can use the service you have to translate your Musicbrainz IDs into Echo Nest IDs and make the similarity call; then, since the artist similarity call returns Echo Nest IDs, you have to map the IDs back into the Musicbrainz space. The mapping from one ID space to another takes time (perhaps even requiring another API method call to ‘search_artists’) and is a potential source of error. Mapping artist names can be tricky: artists like Duran Duran Duran, Various Artists (the electronic musician), DJ Donna Summer, and Nirvana (the 60’s UK band) will trip up even sophisticated name resolvers.

We hope to eliminate some of the trouble with mapping IDs with Project Rosetta Stone. Project Rosetta Stone is an update to the Echo Nest APIs to support non-Echo-Nest identifiers. The goal for Project Rosetta Stone is to allow a developer to use any music ID from any music API with the Echo Nest web services. For instance, if you have a Musicbrainz ID for Weezer, you can call any of the Echo Nest artist methods with the Musicbrainz ID and get results. Additionally, methods that return IDs can be told to return them in different ID spaces. So, for example, you can call artist.get_similar and specify that you want the similar artist results to include Musicbrainz artist IDs.

Dealing with the many different music ID formats

One of the issues we have to deal with when trying to support many ID spaces is that the IDs come in many shapes and sizes. Some IDs, like Echo Nest and Musicbrainz, are self-identifying URLs (self-identifying means that you can tell both the ID space and the type of the item being identified: artist, track, release, playlist, etc.), and some IDs (like Spotify) use self-identifying URNs. However, many ID spaces are not self-identifying; for instance, a Napster Artist ID is just a simple integer. Note also that many of the ID spaces have multiple renderings of IDs. Echo Nest has short form IDs (AR7BGWD1187FB59CCB and TR123412876434), Spotify has URL-form IDs (http://open.spotify.com/artist/6S58b0fr8TkWrEHOH4tRVu) and Musicbrainz IDs are often represented with just the UUID fragment (bd0303a-f026-416f-a2d2-1d6ad65ffd68). Spotify and Napster appear in these examples only to demonstrate the wide range of ID formats.

We want all of the ID types to be self-identifying. IDs that are already self-identifying can be used without change. However, non-self-identifying ID types need to be transformed into a URN-style syntax of the form: vendor:type:vendor-specific-id. So, for example, a Napster track ID would be of the form: ‘napster:track:12345678’
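To make the vendor:type:vendor-specific-id form concrete, here is a minimal Python sketch of building and splitting such IDs. The names `MusicId`, `to_urn` and `parse_urn` are hypothetical helpers made up for this illustration; they are not part of the Echo Nest API or any client library.

```python
from typing import NamedTuple


class MusicId(NamedTuple):
    """One identifier in the vendor:type:vendor-specific-id form."""
    vendor: str
    item_type: str
    vendor_id: str


def to_urn(vendor: str, item_type: str, vendor_id: str) -> str:
    """Join the three parts into a URN-style ID such as napster:track:12345678."""
    for part in (vendor, item_type):
        # The first two components act as delimiters, so they cannot contain a colon.
        if not part or ":" in part:
            raise ValueError(f"invalid ID component: {part!r}")
    if not vendor_id:
        raise ValueError("vendor-specific ID must not be empty")
    return f"{vendor}:{item_type}:{vendor_id}"


def parse_urn(urn: str) -> MusicId:
    """Split a URN-style ID; the vendor-specific part may itself contain colons."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a vendor:type:id identifier: {urn!r}")
    return MusicId(*parts)


print(to_urn("napster", "track", "12345678"))
print(parse_urn("musicbrainz:artist:6fe07aa5-fec0-4eca-a456-f29bff451b04"))
```

A bare integer such as a raw Napster ID fails `parse_urn`, which is the point: without the vendor and type prefix there is no way to tell which ID space it belongs to.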

What do we support now?

In this first release of Rosetta Stone we are supporting URN-style Musicbrainz IDs (support for Musicbrainz is probably one of the most requested enhancements to the Echo Nest APIs). This means that any Echo Nest API method that accepts or returns an Echo Nest ID can also take a Musicbrainz ID. For example, to get recent audio found on the web for Weezer, you could make the call with the URN form of the Musicbrainz ID for Weezer:

http://developer.echonest.com/api/get_audio
         ?api_key=5ZAOMB3BUR8QUN4PE
         &id=musicbrainz:artist:6fe07aa5-fec0-4eca-a456-f29bff451b04
         &rows=2&version=3
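If you would rather assemble the request in code than by hand, a sketch along these lines may help. It only builds the URL string and sends nothing; `build_call` is a hypothetical helper, `YOUR_API_KEY` is a placeholder, and the parameter names are simply the ones used in the examples in this post.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example calls in this post.
API_ROOT = "http://developer.echonest.com/api"


def build_call(method: str, api_key: str, **params: object) -> str:
    """Return the request URL for an Echo Nest API method (no request is sent)."""
    query = {"api_key": api_key, **params, "version": 3}
    # safe=":" keeps the colons in URN-style IDs readable instead of
    # percent-encoding them as %3A.
    return f"{API_ROOT}/{method}?{urlencode(query, safe=':')}"


url = build_call(
    "get_audio",
    api_key="YOUR_API_KEY",  # placeholder, substitute your own key
    id="musicbrainz:artist:6fe07aa5-fec0-4eca-a456-f29bff451b04",
    rows=2,
)
print(url)
```

Extra parameters, such as `bucket="id:musicbrainz"`, can be passed as further keyword arguments in the same way.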

For a call such as artist.get_similar, if you are using Musicbrainz IDs for input, it is likely that you’ll want your results in the form of Musicbrainz IDs. To do this, just add the bucket=id:musicbrainz parameter to indicate that you want Musicbrainz IDs included in the results:

http://developer.echonest.com/api/get_similar
               ?api_key=5ZAOMB3BUR8QUN4PE
               &id=musicbrainz:artist:6fe07aa5-fec0-4eca-a456-f29bff451b04
               &rows=10&version=3
               &bucket=id:musicbrainz

<similar>
    <artist>
          <name>Death Cab for Cutie</name>
          <id>music://id.echonest.com/~/AR/ARSPUJF1187B9A14B8</id>
          <id type="musicbrainz">musicbrainz:artist:0039c7ae-e1a7-4a7d-9b49-0cbc716821a6</id>
          <rank>1</rank>
    </artist>

<!-- more omitted -->

</similar>
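Once the response comes back, pulling the IDs out by ID space is straightforward. The sketch below parses the fragment shown above with Python's standard library. It assumes that an `<id>` element without a `type` attribute is the Echo Nest ID, and that a real response may wrap `<similar>` in further elements; `similar_artists` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample fragment copied from the response shown above; a real response
# may wrap it in additional elements.
SAMPLE = """
<similar>
    <artist>
        <name>Death Cab for Cutie</name>
        <id>music://id.echonest.com/~/AR/ARSPUJF1187B9A14B8</id>
        <id type="musicbrainz">musicbrainz:artist:0039c7ae-e1a7-4a7d-9b49-0cbc716821a6</id>
        <rank>1</rank>
    </artist>
</similar>
"""


def similar_artists(xml_text: str) -> list[dict]:
    """Collect name, rank, and IDs keyed by ID space for each <artist>."""
    root = ET.fromstring(xml_text.strip())
    artists = []
    for node in root.iter("artist"):
        # Assumption: an <id> without a type attribute is the Echo Nest ID.
        ids = {
            el.get("type", "echonest"): (el.text or "").strip()
            for el in node.findall("id")
        }
        artists.append({
            "name": node.findtext("name"),
            "rank": int(node.findtext("rank", default="0")),
            "ids": ids,
        })
    return artists


for artist in similar_artists(SAMPLE):
    print(artist["name"], artist["ids"].get("musicbrainz"))
```

Keying the IDs by ID space means the same code keeps working if more `bucket=id:...` spaces are requested later.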

Limiting results to a particular ID space

Sometimes you are working within a particular ID space and you only want to include items that are in that space. To support this, Rosetta Stone adds an idlimit parameter to some of the calls. If this is set to ‘Y’ then results are constrained to be within the given ID space. This means that if you want to guarantee that only Musicbrainz artists are returned from the get_top_hottt_artists call you can do so like this:

http://developer.echonest.com/api/get_top_hottt_artists
       ?api_key=5ZAOMB3BUR8QUN4PE
       &rows=20
       &version=3
       &bucket=id:musicbrainz
       &idlimit=Y
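One way to picture what idlimit does is as a filter over the results. The sketch below is a client-side illustration only, not how the service is implemented: `limit_to_id_space` is a hypothetical helper and the artist records are invented placeholder data.

```python
def limit_to_id_space(artists: list[dict], id_space: str) -> list[dict]:
    """Keep only the artists that carry an ID in the requested ID space."""
    return [a for a in artists if id_space in a.get("ids", {})]


# Invented placeholder data: the second artist has no Musicbrainz mapping.
artists = [
    {"name": "Artist A", "ids": {"echonest": "EN-1", "musicbrainz": "musicbrainz:artist:MB-1"}},
    {"name": "Artist B", "ids": {"echonest": "EN-2"}},
]

for artist in limit_to_id_space(artists, "musicbrainz"):
    print(artist["name"])
```

Filtering on the client like this can leave you with fewer results than the rows you asked for, which is a good reason to let the service apply the limit instead.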

What’s Next?

In this initial release of Rosetta Stone we’ve built the infrastructure for fast ID mapping. We are currently supporting mapping between Echo Nest Artist IDs and Musicbrainz IDs. We will be adding support for mapping at the track level soon. Keep an eye out, too, for the addition of commercial ID spaces that will allow easy mapping between Echo Nest IDs and those associated with commercial music service providers.

In the near future we’ll be rolling out Rosetta Stone support to the various clients (pyechonest and the Java client API).

As always, we love any feedback and suggestions to make writing music apps easier. So email me (paul@echonest.com) or leave a comment here.


This entry was posted on February 10, 2010, 3:45 pm and is filed under code, Music, The Echo Nest, web services.

#1 by Ben on February 11, 2010 - 12:06 pm

This is excellent news! Though I must admit I wish it was up during the Stockholm Music Hack Day. That said, Mike and I will do a bit of reconfiguring to use this call instead of the chained calls via pyMBZ…

#2 by Tristan on February 12, 2010 - 5:56 am

Very useful. Will you be opening up the ID mapping service as a separate thing or is it just supporting the ability to use non-Echonest IDs in Echonest API calls?

#3 by Paul on February 12, 2010 - 7:39 am

Tristan:

You’ll be able to use our get_profile call as a mapping service. Give the method an ID in any supported name space and ask it to return IDs in other name spaces.

Paul

#4 by Lucas Gonze on February 12, 2010 - 2:19 pm

Really good idea for a service. It makes sense for the Echonest mission, and it’s a valuable resource for many different companies. It makes zero sense for Myspace/Yahoo/ad infinitum to build the mapper for themselves.

#5 by JustSomeOldJoe on March 7, 2010 - 1:58 am

Finally someone in the online music biz gets a clue and works on defining an external ID mapping structure and capturing said mapping data. Should work well with well defined and structured data sources i.e. Musicbrainz. However, could be a bitch when it comes time to implement “commercial services.” Most of the commercial service back end implementations I’ve seen wouldn’t know consistency if it hit ’em upside the head.

It’s ridiculous that nothing exists for music akin to the Library of Congress LC# system used for written works.

Good luck to you!