I’ve just pushed out a new version of my SXSW Artist Catalog that lets you add any artist to your SXSW schedule (via sched.org). Each artist now has a ‘schedule at sched.org’ link that takes you directly to the artist’s page at sched.org, where you can pick the event you are interested in and add it to your schedule. It is pretty handy.
By the way, the integration with sched.org could not have been easier. Taylor McKnight added a search URL of the form:
http://sxsw2009.sched.org/?searchword=DEVO
that brings you to the DEVO page at sched.org. Very nice.
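For anyone who wants to generate these links themselves, here is a minimal sketch of how one might build the search URL for an arbitrary artist name. The class and method names (SchedLinks, searchUrl) are made up for illustration, and I am assuming that sched.org accepts standard form-style URL encoding for names with spaces or punctuation; the only form confirmed above is the plain DEVO one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the catalog's actual code.
class SchedLinks {
    // Base of the search URL shown above.
    private static final String SEARCH_BASE = "http://sxsw2009.sched.org/?searchword=";

    // Builds a sched.org search link for an artist name.
    // Assumption: sched.org accepts form-encoded query values (spaces become '+').
    static String searchUrl(String artistName) {
        return SEARCH_BASE + URLEncoder.encode(artistName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("DEVO"));
    }
}
```

The two-argument URLEncoder.encode that takes a Charset needs Java 10 or later; on older JDKs the String-based overload does the same job.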
While adding the sched support, I also did a recrawl of all the artist info, so the data should be pretty fresh.
Thanks to Steve for fixing things for me after I had botched things up on the deploy, and thanks in general to Sun for continuing to host the catalog.
By the way, doing this update was a bit of a nightmare. The key data for the guide is the artist list that is crawled from the SXSW site, but the SXSW folks have recently changed the format of the artist list (spreading it out over multiple pages, adding more context, etc.). I didn’t want to have to rewrite the parsing code (when working on a spare-time project, just the thought of working with regular expressions makes me close the IDE and fire up Team Fortress 2). Luckily, I had anticipated this event: my SXSW crawler had diligently been creating archives of every SXSW crawl, so if they did change formats, I could fall back on a previous crawl without needing to work on the parser. I’m so smart. Except that I had a bug. Here’s the archive code:
public void createArchive(URL url) throws IOException {
    createArchiveDir();
    File file = new File(getArchiveName());
    if (!file.exists()) {
        URLConnection connection = url.openConnection();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()));
        PrintWriter out = new PrintWriter(getArchiveName());
        String line = null;
        try {
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } finally {
            in.close();
        }
    }
}
See the bug? Yep, I forgot to close the output file, which means the last buffered block of data was never flushed to disk. All of my many archive files were missing their final block, making them useless. My penance for this code-and-test sin was that I had to go and rewrite the SXSW parser to support the new format. But this turned out to be a good thing, since SXSW has been adding more artists. So this push has a fresh crawl, with the very latest artists and fresh data from sites like YouTube, Flickr, Last.fm and The Echo Nest. My bug makes more work for me, but a better catalog for you.
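For the record, here is one way the archive method could be written so that the output file always gets flushed and closed. This is a sketch rather than the code that actually went into the crawler: the class name CrawlArchiver and the archiveFile parameter are stand-ins for the createArchiveDir() and getArchiveName() helpers above, and decoding the page as UTF-8 is an assumption (the original used the platform default).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the crawler's archive code.
class CrawlArchiver {
    // Copies the page at url into archiveFile unless an archive already exists.
    static void createArchive(URL url, Path archiveFile) throws IOException {
        if (Files.exists(archiveFile)) {
            return; // keep the earlier crawl, as the original code did
        }
        if (archiveFile.getParent() != null) {
            Files.createDirectories(archiveFile.getParent());
        }
        // try-with-resources closes (and therefore flushes) both streams,
        // even if reading or writing throws part way through.
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(archiveFile, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
            // PrintWriter swallows IOExceptions, so check for them explicitly.
            if (out.checkError()) {
                throw new IOException("failed writing archive " + archiveFile);
            }
        }
    }
}
```

The checkError() call is there because PrintWriter never throws on a failed write; without it a full disk would produce the same kind of silently truncated archive.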
#1 by robert on March 1, 2009 - 12:17 pm
Now you need to add in sched’s list of most popular events to your charts for even more statacular stats! :)