Last week, I ventured to Gothenburg, Sweden to participate in the Way Out West Hack 2, a music-oriented hackathon associated with the Way Out West Music Festival.
I was, of course, representing and supporting The Echo Nest API during the hack, but I also put together my own Echo Nest-based hack: The Saddest Stylophone. The hack creates an auto accompaniment for just about any song played on the Stylophone – an analog synthesizer toy created in the 60s that you play with a stylus.
Two hacking pivots on the way …

The road to the Saddest Stylophone was by no means a straight line. In fact, when I arrived at #wowhack2 I had a very different hack in mind. But within the first hour it became clear that the WiFi at the event was going to be sketchy at best, and that it would be very slow going for any hack (including the one I had planned) that needed zippy access to the web, so I shelved that idea for another hack day.

The next idea was to see if I could use the Echo Nest analysis data to convert any song to an 8-bit chiptune version. This is not new ground: Brian McFee had a go at this back at the 2012 MIT Music Hack Day. I thought it would be interesting to try a different approach and use an off-the-shelf 8-bit software synth and the Echo Nest pitch data. My intention was to use a JavaScript sound engine called jsfx to generate the audio. It seemed like a pretty straightforward way to create authentic 8-bit sounds. In small doses jsfx worked great, but when I started to create sequences of overlapping sounds my browser would crash. Every time. After spending a few hours trying to get jsfx to work reliably, I had to abandon it. It just wasn’t designed to generate lots of short, overlapping, simultaneous sounds, so I spent some time looking for another synthesizer.

I finally settled on timbre.js. Timbre.js seemed like a fully featured synth. Anyone with a Csound background would be comfortable creating sounds with it. It did not take long before I was generating tones that tracked the melody and chord changes of a song. My plan was to create a set of tone generators and dynamically control the dynamics envelope based upon the Echo Nest segment data. This is when I hit my next roadblock. The timbre.js docs are pretty good, but I just couldn’t find out how to dynamically adjust parameters such as the ADSR table. I’m sure there’s a way to do it, but when there are only 12 hours left in a 24-hour hackathon, the two hours spent looking through JS library source seemed like forever, and I began to think that I’d never figure out how to get fine-grained control over the synth. I was pretty happy with how well I was able to track a song and play along with it, but without ADSR control or even simple control over dynamics the output sounded pretty crappy. In fact I hadn’t heard anything that sounded so bad since I heard @skattyadz play a tune on his Stylophone at the Midem Music Hack Day earlier this year.

That thought turned out to be the best observation I had during the hackathon. I could hide all of my troubles getting a good-sounding output by declaring that my hack was a Stylophone simulator. Just like a Stylophone, my app would not be capable of playing multiple tones at once, it would not have complex changes in dynamics, it would only have a one-and-a-half-octave range, and it would not even have a pleasing tone. All I’d need to do would be to convincingly track a melody or harmonic line in a song and I’d be successful. And so, after my third pivot, I finally had a hack that I felt I’d be able to finish in time for the demo session without embarrassing myself. I was quite pleased with the results.
How does it work?

The Saddest Stylophone takes advantage of the Echo Nest detailed analysis. The analysis provides detailed information about a song: it includes where all the bars and beats are, along with a very detailed map of the segments of the song. Segments are typically small, somewhat homogeneous audio snippets, corresponding to musical events (like a strummed chord on the guitar or a brass hit from the band).
A single segment contains detailed information on the pitch, timbre, and loudness. For pitch it contains a vector of 12 floating point values that correspond to the amount of energy at each of the notes in the 12-note western scale. Here’s a graphic representation of a single segment:
This graphic shows the pitch vector, the timbre vector, the loudness, confidence and duration of a segment.
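To make the pitch vector concrete, here is a minimal sketch of a single segment and of picking its strongest pitch bin. The segment is made-up data, the field names (`start`, `duration`, `confidence`, `pitches`) are assumed to follow the analysis format described above, and the sketch assumes that bin 0 corresponds to C. The helper `strongestPitch` is hypothetical, not code from the hack.

```javascript
// A made-up segment for illustration. Field names are assumed to follow
// the analysis format described above.
const segment = {
  start: 12.34,     // seconds from the start of the track
  duration: 0.25,   // seconds
  confidence: 0.8,  // 0..1
  // One energy value per note of the 12-note scale (assumption: bin 0 = C).
  pitches: [0.1, 0.05, 1.0, 0.2, 0.1, 0.05, 0.1, 0.3, 0.1, 0.05, 0.1, 0.2],
};

const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                    'F#', 'G', 'G#', 'A', 'A#', 'B'];

// Hypothetical helper: index of the bin with the most energy.
function strongestPitch(pitches) {
  let best = 0;
  for (let i = 1; i < pitches.length; i++) {
    if (pitches[i] > pitches[best]) best = i;
  }
  return best;
}

const bin = strongestPitch(segment.pitches);
console.log(NOTE_NAMES[bin]); // prints "D"
```

In this example the third bin carries the most energy, so the segment would be treated as a D.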
The Saddest Stylophone only uses the pitch, duration and confidence data from each segment. First, it filters segments to combine short, low-confidence segments with higher-confidence segments. Next it filters out segments that don’t have a predominant frequency component in the pitch vector. Then for each surviving segment, it picks the strongest of the 12 pitch bins and maps that pitch to a note on the Stylophone.

Since the Stylophone supports an octave and a half (20 notes), we need to map 12 notes onto 20 notes. We do this by unfolding the 12 bins: when possible, we keep the jump from one note to the next smaller than half an octave. For example, if between segment one and segment two we would jump 8 notes higher, we check whether we could instead jump 4 notes lower (which would be an octave below segment two’s note) while still remaining within the Stylophone range. If so, we replace the long upward jump with the shorter downward jump.

The result of this is a list of notes and timings mapped onto the 20 notes of the Stylophone. We then map each note onto the proper frequency and key position. The rest is just playing the note via timbre.js at the proper time, in sync with the original audio track, and animating the stylus using Raphael.
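The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the hack’s actual code: the thresholds, the choice of A3 as the lowest key, the "nearest key" reading of the unfolding rule, and all function names are hypothetical. It also assumes pitch bin 0 is C and leaves out playback and animation.

```javascript
const KEY_COUNT = 20;        // Stylophone keys, per the text
const LOWEST_MIDI = 57;      // assumption: lowest key is A3 (220 Hz)
const MIN_DURATION = 0.1;    // assumption: shorter than this (seconds) is "short"
const MIN_CONFIDENCE = 0.3;  // assumption: below this is "low confidence"
const MIN_RATIO = 1.5;       // assumption: strongest bin must beat the runner-up by this factor

// Step 1: fold short, low-confidence segments into the segment before them.
// A weak segment at the very start has nothing to merge into, so it is kept.
function mergeWeakSegments(segments) {
  const merged = [];
  for (const seg of segments) {
    const weak = seg.duration < MIN_DURATION && seg.confidence < MIN_CONFIDENCE;
    if (weak && merged.length > 0) {
      merged[merged.length - 1].duration += seg.duration;
    } else {
      merged.push({ ...seg });
    }
  }
  return merged;
}

// Step 2: strongest pitch bin, or -1 when no bin clearly dominates.
function dominantPitch(pitches) {
  const order = pitches.map((_, i) => i).sort((a, b) => pitches[b] - pitches[a]);
  const [first, second] = order;
  const dominates = pitches[first] > 0 && pitches[first] >= MIN_RATIO * pitches[second];
  return dominates ? first : -1;
}

// Step 3: pick the in-range key that sounds this pitch class and lies closest
// to the previous key; the first note simply takes the lowest matching key.
function nearestKey(pitchClass, prevKey) {
  let choice = -1;
  for (let k = 0; k < KEY_COUNT; k++) {
    if ((LOWEST_MIDI + k) % 12 !== pitchClass) continue;
    if (choice < 0) choice = k;
    else if (prevKey !== null && Math.abs(k - prevKey) < Math.abs(choice - prevKey)) choice = k;
  }
  return choice;
}

// Step 4: key position to frequency in Hz (equal temperament, A4 = 440 Hz).
function keyToFrequency(key) {
  return 440 * Math.pow(2, (LOWEST_MIDI + key - 69) / 12);
}

// Whole pipeline: segments in, list of timed notes out.
function toNotes(segments) {
  const notes = [];
  let prevKey = null;
  for (const seg of mergeWeakSegments(segments)) {
    const pitchClass = dominantPitch(seg.pitches);
    if (pitchClass < 0) continue; // no predominant pitch: skip the segment
    const key = nearestKey(pitchClass, prevKey);
    notes.push({ start: seg.start, duration: seg.duration, key, frequency: keyToFrequency(key) });
    prevKey = key;
  }
  return notes;
}
```

With these assumptions, an E followed by a C reproduces the example in the text: the C an octave up would be a jump of 8 keys, so the sketch picks the C 4 keys below instead. Choosing the nearest in-range key is just one simple way to keep jumps under half an octave when the range allows it.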
I’ve upgraded the app to include an Under the hood selection that, when clicked, opens up a visualization that shows the detailed info for a segment, so you can follow along and see how each segment is mapped onto a note. You can interact with the visualization, stepping through the segments, and auditioning and visualizing them.
That’s the story of the Saddest Stylophone. It was not the hack I thought I was going to make when I got to #wowhack, but I was pleased with the result. When the Saddest Stylophone plays well, it really can make any song sound sadder and more pathetic. It’s a win. I’m not the only one who thinks so: wired.co.uk listed it as one of the five best hacks at the hackathon.
Give it a try at Saddest Stylophone.