Understanding Features and Distance Functions for Music Sequence Alignment – Ozgur Izmirli and Roger Dannenberg
ABSTRACT: We investigate the problem of matching symbolic representations directly to audio-based representations for applications that use data from both domains. One such application is score alignment, which aligns a sequence of audio frames with a sequence of score frames using features such as chroma vectors and distance functions such as Euclidean distance. Good representations are critical, yet current systems rely on ad hoc constructions such as the chromagram, which nonetheless have been shown to work quite well. We investigate ways to learn chromagram-like representations that optimize the classification of “matching” vs. “non-matching” frame pairs of audio and MIDI. New representations learned automatically from examples not only perform better than the chromagram representation but also reveal interesting projection structures that differ distinctly from the traditional chromagram.
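The baseline the abstract describes (chroma features, compared frame by frame with Euclidean distance, then aligned) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' system: `chroma_from_spectrum`, `euclidean`, and `dtw_path` are hypothetical helpers, each spectral bin is simply folded to its nearest pitch class relative to A440, and the alignment step uses textbook dynamic time warping (DTW), which the abstract does not name.

```python
import math


def chroma_from_spectrum(magnitudes, freqs, ref_hz=440.0):
    """Fold one magnitude spectrum into a unit-norm 12-bin chroma vector.

    Each bin's magnitude is added to the pitch class nearest its frequency
    (index 0 = C, 9 = A at ref_hz). Bins at or below 0 Hz are skipped.
    """
    chroma = [0.0] * 12
    for mag, f in zip(magnitudes, freqs):
        if f <= 0:
            continue
        midi = 69 + 12 * math.log2(f / ref_hz)  # fractional MIDI note number
        chroma[round(midi) % 12] += mag
    norm = math.sqrt(sum(c * c for c in chroma))
    return [c / norm for c in chroma] if norm > 0 else chroma


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(seq_a, seq_b, dist=euclidean):
    """Textbook dynamic time warping between two feature sequences.

    Returns (total cost, alignment path as a list of (i, j) frame pairs).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_path([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]])` returns a total cost of 0.0 with the path `[(0, 0), (0, 1), (1, 2), (2, 3)]`, absorbing the repeated first frame. Because the aligner only sees features and a distance, swapping in a different representation leaves the alignment step untouched, which is why the representation is the variable worth studying.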
Roger and Ozgur present a method for learning features for score alignment. Instead of the traditional chromagram, they use a feature that is a learned projection of the audio spectrum. Results show that the learned features outperform chroma at distinguishing matching from non-matching audio/MIDI frame pairs.
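The summary does not say how the projection is learned, so the following is only one plausible sketch, not the authors' method. It assumes a linear projection `W` of the audio spectrum, trained by logistic regression so that the squared Euclidean distance between a projected audio frame and a MIDI-derived pitch-class vector predicts “match” vs. “non-match”. The names `train_projection`, `match_probability`, and `toy_pairs`, the toy data, the zero initialization, and plain full-batch gradient descent are all our own illustrative choices.

```python
import numpy as np


def train_projection(audio, midi, labels, lr=0.5, epochs=1000):
    """Fit a linear projection W and bias b by logistic regression on the
    pair score s = b - ||W a - m||^2 (higher score = more likely a match).

    audio:  (N, n_bins) audio spectrum frames a
    midi:   (N, n_out) symbolic frame vectors m derived from MIDI
    labels: (N,) 1.0 for matching pairs, 0.0 for non-matching pairs
    """
    W = np.zeros((midi.shape[1], audio.shape[1]))
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        diff = audio @ W.T - midi              # rows are W a - m
        s = b - np.sum(diff ** 2, axis=1)      # pair scores
        p = 1.0 / (1.0 + np.exp(-s))           # predicted match probabilities
        g = (p - labels) / n                   # d(mean log-loss)/ds per pair
        grad_W = -2.0 * (g[:, None] * diff).T @ audio
        W -= lr * grad_W
        b -= lr * np.sum(g)                    # ds/db = 1
    return W, b


def match_probability(W, b, a, m):
    """Probability that audio frame a and symbolic frame m match."""
    diff = W @ a - m
    return 1.0 / (1.0 + np.exp(-(b - diff @ diff)))


def toy_pairs():
    """Hypothetical toy data: 24-bin 'spectra' spanning two octaves, where
    bins j and j + 12 share pitch class j; MIDI frames are one-hot pitch
    classes. Each matching pair is balanced by one non-matching pair."""
    eye = np.eye(12)
    audio, midi, labels = [], [], []
    for q in range(12):
        a = np.zeros(24)
        a[q] = a[q + 12] = 0.5
        for shift in range(1, 12):
            audio += [a, a]
            midi += [eye[q], eye[(q + shift) % 12]]
            labels += [1.0, 0.0]
    return np.array(audio), np.array(midi), np.array(labels)


if __name__ == "__main__":
    audio, midi, labels = toy_pairs()
    W, b = train_projection(audio, midi, labels)
    # Row 8 pairs a pitch-class-0 frame with pitch class 0; row 9 with pitch class 5.
    print(match_probability(W, b, audio[8], midi[8]))  # should be high
    print(match_probability(W, b, audio[9], midi[9]))  # should be low
```

On this toy data pitch class is the only structure present, so the learned `W` should come out chroma-like (positive weights on same-pitch-class bins, negative elsewhere). With real spectra nothing constrains it to that form, which is where a learned projection can depart from the chromagram.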