One of the biggest pain points users have with the Echo Nest developer API is the track upload method. This method lets you upload a track for analysis, which can subsequently be retrieved by a number of other API method calls such as get_beats, get_key, get_loudness and so on. Unlike all of the other Echo Nest methods, track upload requires you to construct a multipart/form-data POST request. Since I get a lot of questions about track upload, I decided that I needed to code my own to get a full understanding of how to do it, so that I could (1) answer detailed questions about the process and (2) point to my code as an example of how to do it. I could have used a library (such as the Jakarta HTTP client library) to do the heavy lifting, but I wouldn’t have learned a thing, nor would I have some code to point people at. So I wrote some Java code (part of the forthcoming Java Client for the Echo Nest web services) that will do the upload.
You can take a look at this post method in its Google Code repository. The tricky bit about a multipart/form-data post is getting the multipart form boundaries just right. There’s a little dance one has to do with the proper carriage returns and linefeeds, double-dash prefixes and double-dash suffixes, and random boundary strings. Debugging can be a pain in the neck too, because if you get it wrong, typically the only diagnostic you get is a ‘500 error’, which just means something bad happened.
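To make the boundary dance concrete, here is a minimal sketch of how a multipart/form-data body can be laid out. It is not the code from the repository; the class name `MultipartSketch` and the form field names `api_key` and `file` are assumptions chosen for illustration, so check the API documentation for the real parameter names. Each part starts with two dashes plus the boundary, every line ends in CRLF, a blank line separates the part headers from the part content, and the final boundary gets a double-dash suffix as well.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class MultipartSketch {
    static final String CRLF = "\r\n";

    // A random string that is very unlikely to occur inside the payload.
    static String newBoundary() {
        return "----boundary" + UUID.randomUUID().toString().replace("-", "");
    }

    // Builds a body with one text field and one file field.
    // The field names "api_key" and "file" are illustrative assumptions.
    static byte[] buildBody(String boundary, String apiKey, String fileName, byte[] audio) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Text part: delimiter line, part headers, blank line, value.
        write(out, "--" + boundary + CRLF);
        write(out, "Content-Disposition: form-data; name=\"api_key\"" + CRLF);
        write(out, CRLF);
        write(out, apiKey + CRLF);

        // File part: same layout, plus a Content-Type header for the raw bytes.
        write(out, "--" + boundary + CRLF);
        write(out, "Content-Disposition: form-data; name=\"file\"; filename=\""
                + fileName + "\"" + CRLF);
        write(out, "Content-Type: application/octet-stream" + CRLF);
        write(out, CRLF);
        out.write(audio, 0, audio.length);
        write(out, CRLF);

        // Closing delimiter: double-dash prefix AND double-dash suffix.
        write(out, "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }
}
```

The same boundary string also has to appear in the request header, as `Content-Type: multipart/form-data; boundary=` followed by the boundary, without the leading double dash.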
Track upload can also be a pain in the neck because you need to wait 10 or 20 seconds for the upload to finish and for the track analysis to complete. That adds up quickly if you have thousands of tracks to analyze: 20 seconds times one thousand tracks is about five and a half hours. No one wants to wait that long to analyze a music collection. However, it is possible to short-circuit this process. You can skip the upload entirely if we have already analyzed your track of interest. To see whether an analysis of a track is already available, you can perform a query such as ‘get_duration’ using the MD5 hash of the audio file. If you get a result back, then we’ve already done the analysis, and you can skip the upload and just use the MD5 hash of your track as the ID for all of your queries. With all of the apps out there using the track analysis API (for instance, in just a week, donkdj has already analyzed over 30K tracks), our database of pre-cooked analyses is getting quite large. I suspect that soon you won’t need to upload most tracks (certainly not mainstream tracks). We will already have the data.
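Here is a rough sketch of that check-before-upload flow. The MD5 part uses the standard `java.security.MessageDigest` class; the `analysisExists` predicate is a hypothetical stand-in for whatever code issues the ‘get_duration’ query and reports whether a result came back, since the actual client call is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Predicate;

class Md5Sketch {
    // Returns the MD5 hash of the given bytes as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // True if the track still has to be uploaded. analysisExists is a
    // hypothetical hook that would query the API with the MD5 as the track ID.
    static boolean needsUpload(byte[] audio, Predicate<String> analysisExists) {
        return !analysisExists.test(md5Hex(audio));
    }

    public static void main(String[] args) {
        byte[] audio = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(audio));
        // Pretend the server has never seen this track.
        System.out.println(needsUpload(audio, md5 -> false));
    }
}
```

For a real file, the bytes could come from `java.nio.file.Files.readAllBytes`, with the hash computed over the whole file exactly as it sits on disk.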
#1 by Andy Baio on April 4, 2009 - 1:26 pm
How about providing a URL to a track instead?
#2 by plamere on April 4, 2009 - 1:29 pm
Good point, I should have mentioned in my blog post that our track upload supports both uploading from a file (as described) and uploading from a URL. Uploading from a URL is much easier and less of a burden.
#3 by Ed on April 5, 2009 - 2:07 pm
What is that quotation mark doing at the end of this line?
dos.writeBytes("Content-Type: application/octet-stream" + "\"" );
#4 by plamere on April 5, 2009 - 6:56 pm
That looks like a bug. Thanks!
#5 by Ben Bennett on April 9, 2009 - 2:13 pm
Unfortunately the MD5 hash bit is a little less useful since it appears you take the hash of the whole file… ID3v1, ID3v2, etc. tags and all. So anything that touches a tag or fiddles the track metadata will cause an MD5 change.
It would be better if people took the MD5 hash of just the MP3 payload (after stripping out everything else) and sent up the bare MP3s for analysis… But I suspect that is not what people are doing.
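For illustration, a rough sketch of the stripping step might look like the following. It only handles the two common cases, an ID3v2 tag at the start of the file and a 128-byte ID3v1 tag at the end, and the class name `Id3Strip` is made up for this example. This is not how The Echo Nest computes its hash; as noted above, the hash appears to be taken over the whole file.

```java
import java.util.Arrays;

class Id3Strip {
    // Returns the bytes between a leading ID3v2 tag and a trailing ID3v1 tag.
    static byte[] stripTags(byte[] data) {
        int start = 0;
        int end = data.length;

        // ID3v2: 10-byte header starting with "ID3"; bytes 6..9 hold the tag
        // size as a 28-bit "syncsafe" integer (7 useful bits per byte).
        if (data.length >= 10 && data[0] == 'I' && data[1] == 'D' && data[2] == '3') {
            int size = ((data[6] & 0x7f) << 21) | ((data[7] & 0x7f) << 14)
                     | ((data[8] & 0x7f) << 7) | (data[9] & 0x7f);
            // Flag bit 0x10 signals an extra 10-byte footer after the tag.
            int footer = (data[5] & 0x10) != 0 ? 10 : 0;
            start = Math.min(data.length, 10 + size + footer);
        }

        // ID3v1: the last 128 bytes of the file, starting with "TAG".
        if (end - start >= 128
                && data[end - 128] == 'T' && data[end - 127] == 'A' && data[end - 126] == 'G') {
            end -= 128;
        }
        return Arrays.copyOfRange(data, start, end);
    }
}
```

The MD5 would then be computed over the returned array instead of the raw file, so that editing a tag no longer changes the hash. Other metadata containers (APE tags, for example) are not covered by this sketch.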
#6 by Reid Draper on April 11, 2009 - 1:12 pm
Any plans to eventually be able to make an analysis request with metadata or a MusicBrainz ID? As Ben pointed out, slightly different metadata or a different compression for the MP3 will cause a different hash.
#7 by plamere on April 11, 2009 - 1:16 pm
Reid – yeah, we will be trying to improve things here. Stay tuned.