I’m on my way to Outside Hacks – a hackathon tied in with the Outside Lands music festival. Since many hacks at the hackathon will be related to the festival, it is pretty important to have a machine-readable version of the festival’s artist lineup. However, I couldn’t find one online. Since I had an hour in the airport lounge, and the airport actually has decent WiFi, I thought I would try to be a good hacker citizen and generate an easily parseable lineup.
A little Python, some BeautifulSoup and a bit of Echo Nest Rosetta data later, I have an Outside Lands lineup JSON that includes links to artist pages, plus Echo Nest, Spotify and Rdio IDs. The JSON is hosted online at:
http://static.echonest.com/OutsideLands/lineup_2014.json
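For anyone consuming the file: each entry is an object with `artist`, `link` and `ids` keys, which is the shape the script in this post writes out. Here is a minimal sketch of reading it, run against a made-up inline sample rather than the real file. The artist names, links and ID values are placeholders, and `index_by_service` is a hypothetical helper, not part of the original script.

```python
import json

# Placeholder data in the same shape as lineup_2014.json.
# The names, links and IDs are invented for illustration only.
SAMPLE = """
[
    {
        "artist": "Example Band",
        "link": "http://example.com/example-band",
        "ids": {
            "echonest": "AREXAMPLE000000001",
            "spotify": "spotify:artist:EXAMPLE",
            "rdio": "r0001"
        }
    },
    {
        "artist": "Unmatched Act",
        "link": "http://example.com/unmatched-act",
        "ids": {}
    }
]
"""


def index_by_service(lineup, service):
    """Map a service's ID to the artist name, skipping artists without one."""
    index = {}
    for entry in lineup:
        service_id = entry.get('ids', {}).get(service)
        if service_id:  # 'ids' may be empty, or the value may be null
            index[service_id] = entry['artist']
    return index


lineup = json.loads(SAMPLE)
spotify_index = index_by_service(lineup, 'spotify')
```

Note that `ids` can be empty when the artist search finds no match, so it is worth guarding against missing keys as above.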
Here’s the code that generated it:
import sys
from bs4 import BeautifulSoup
import json
import pyen

en = pyen.Pyen()

def get_fid(artist, idspace):
    # Return the artist's foreign ID in the given catalog, or None if absent
    if 'foreign_ids' in artist and len(artist['foreign_ids']) > 0:
        for fids in artist['foreign_ids']:
            if fids['catalog'] == idspace:
                return fids['foreign_id']
    return None

def en_artist_lookup(name):
    # Search the Echo Nest by name and take the top match, if any
    response = en.get('artist/search', name=name,
        bucket=['id:spotify', 'id:rdio-US'])
    artists = response['artists']
    if len(artists) > 0:
        artist = artists[0]
        print artist
        enid = artist['id']
        spid = get_fid(artist, 'spotify')
        rdio = get_fid(artist, 'rdio-US')
        if rdio:
            # Keep only the last segment of the Rdio foreign ID
            rdio = rdio.split(':')[2]
        print name, '/', artist['name'], enid, spid, rdio
        ids = {
            'echonest':enid,
            'spotify':spid,
            'rdio':rdio
        }
        return ids
    return {}

if __name__ == '__main__':
    lineup = []
    # The saved lineup page is passed as the first command-line argument
    f = open(sys.argv[1])
    html_doc = f.read()
    soup = BeautifulSoup(html_doc)
    # Each artist on the lineup page is a link with class "band"
    for a in soup.find_all('a', class_='band'):
        name = a.text.strip()
        ids = en_artist_lookup(name)
        artist = {
            'artist' : name,
            'link' : a['href'],
            'ids' : ids
        }
        lineup.append(artist)
    out = open('lineup_2014.json', 'w')
    print >> out, json.dumps(lineup, indent=4)
    out.close()
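To make the ID handling concrete: with the `id:spotify` and `id:rdio-US` buckets, each artist comes back with a `foreign_ids` list, and the script keeps the whole Spotify ID but only the last segment of the Rdio one. Here is a small offline sketch of that step, assuming foreign IDs of the form `catalog:type:id`. The artist record and its values are invented for illustration, not real API output.

```python
def get_fid(artist, idspace):
    """Return the foreign ID for the given catalog, or None if absent."""
    for fid in artist.get('foreign_ids', []):
        if fid['catalog'] == idspace:
            return fid['foreign_id']
    return None


# Hypothetical artist record, shaped like one artist/search result.
artist = {
    'id': 'AREXAMPLE000000001',
    'name': 'Example Band',
    'foreign_ids': [
        {'catalog': 'spotify', 'foreign_id': 'spotify:artist:EXAMPLE'},
        {'catalog': 'rdio-US', 'foreign_id': 'rdio-US:artist:r0001'},
    ],
}

spid = get_fid(artist, 'spotify')   # kept as the full foreign ID
rdio = get_fid(artist, 'rdio-US')
if rdio:
    # Assumes three colon-separated segments; keep only the trailing one
    rdio = rdio.split(':')[2]
```

An artist with no `foreign_ids` at all simply yields `None` for both services, which is why the published JSON can contain null IDs.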
It’s about time to get on the plane. If you can think of other interesting data to add to the lineup JSON, let me know and I’ll try to add it before the hackathon.