ISMIR Oral Session 2 – Tempo and Rhythm

Session chair: Anssi Klapuri

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS

By Matthias Gruhne, Christian Dittmar, and Daniel Gaertner

Matthias described their approach to computing beat histograms, similar to the techniques used by Burred, Gouyon, Foote, and Tzanetakis. Problem: a beat histogram cannot be used directly as a feature because it depends on tempo, so similar rhythms played at different tempos end up far apart in a Euclidean space. Challenge: reduce the tempo dependence.

Solution: a logarithmic transformation. See the figure:

[Figure from ismir2009-proceedings.pdf, page 186 of 775]

This leads to a histogram with a tempo-independent part that can be separated from the tempo-dependent part: on a logarithmic lag axis, a change of tempo shifts the histogram instead of stretching it. The tempo-independent part can then be compared in a Euclidean space to find similar rhythms.
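
To make the idea concrete, here is a minimal sketch (not the authors' code) of why a logarithmic lag axis helps. Changing the tempo by a factor c turns a lag τ into τ/c, and log(τ/c) = log τ - log c, so on a log-lag axis the histogram only shifts. As a stand-in for the "tempo-independent part", the sketch takes the magnitude of the DFT over the log-lag axis, which ignores shifts; that choice, the lag range, the bin count, and the function names `log_lag_histogram` and `tempo_invariant_part` are my assumptions, not details from the paper.

```python
import numpy as np

def log_lag_histogram(lags, hist, n_bins=128, min_lag=0.2, max_lag=4.0):
    """Resample a beat histogram from a linear lag axis (seconds) onto a
    logarithmically spaced lag axis. On this axis the same rhythm played
    at a different tempo is shifted rather than stretched."""
    log_grid = np.geomspace(min_lag, max_lag, n_bins)
    return log_grid, np.interp(log_grid, lags, hist)

def tempo_invariant_part(log_hist):
    """Magnitude of the DFT over the log-lag axis. It ignores (circular)
    shifts, i.e. the tempo-dependent position, and keeps the pattern's shape.
    Illustrative stand-in only (assumption); the paper may separate the
    two parts differently."""
    return np.abs(np.fft.rfft(log_hist))
```

The raw log-lag histograms of a rhythm and a faster version of it are far apart in Euclidean distance, while their DFT magnitudes are nearly identical. This holds exactly only when the tempo ratio is a whole number of log-lag bins and nothing falls off the ends of the lag range, so real data will be noisier.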

Evaluation: results went from 20% to 70%, and from 66% to 69% (a significance test is needed here, I think).

USING SOURCE SEPARATION TO IMPROVE TEMPO DETECTION

By Parag Chordia and Alex Rae – presented by George Tzanetakis

It's unusual for George to present Parag and Alex's work. Anssi suggests that we use the wisdom of the crowd to answer the questions.

Motivation: Tempo detection is often unreliable for complex music.

Humans often resolve rhythms by entraining to a rhythmically regular part.

Idea: separate the music into components; some components may be more reliable for tempo tracking than the full mix.

Method:

  1. Source separation
  2. Track the tempo of each source
  3. Decide the global tempo by either:
    1. Picking the source with the most regular structure, or
    2. Looking for a common tempo across all sources/layers

Here’s the system:

[Figure from ismir2009-proceedings.pdf, page 193 of 775]

PLCA (Probabilistic Latent Component Analysis) is a source separation method. Issues: the number of components must be specified in advance, so distinct sources may be merged, or one source may be split across multiple layers.
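
For readers unfamiliar with PLCA, here is a minimal sketch of its two-dimensional form fitted by expectation-maximization, followed by a split of the spectrogram into one layer per component. It assumes a non-negative magnitude spectrogram `V` (frequency x time). The names `plca` and `plca_layers`, the random initialisation, and the fixed iteration count are assumptions, not details of the system in the talk. Note that `n_components` has to be passed in, which is exactly the issue raised above.

```python
import numpy as np

def plca(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """2-D PLCA of a non-negative matrix V (frequency x time), fitted by EM.

    Model: P(f, t) = sum_z P(z) P(f|z) P(t|z). The number of latent
    components z must be fixed in advance.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pz = np.full(n_components, 1.0 / n_components)         # P(z)
    Pf = rng.random((F, n_components))
    Pf /= Pf.sum(axis=0)                                    # P(f|z), columns sum to 1
    Pt = rng.random((n_components, T))
    Pt /= Pt.sum(axis=1, keepdims=True)                     # P(t|z), rows sum to 1
    for _ in range(n_iter):
        R = (Pf * Pz) @ Pt + eps                            # current model of P(f, t)
        Q = V / R
        # E- and M-step combined: expected counts sum_t V(f,t) P(z|f,t)
        # and sum_f V(f,t) P(z|f,t), computed without storing P(z|f,t).
        A = Pf * Pz * (Q @ Pt.T)                            # F x K
        B = Pt * Pz[:, None] * (Pf.T @ Q)                   # K x T
        counts_z = A.sum(axis=0)
        Pf = A / (counts_z + eps)
        Pt = B / (B.sum(axis=1, keepdims=True) + eps)
        Pz = counts_z / counts_z.sum()
    return Pz, Pf, Pt

def plca_layers(V, Pz, Pf, Pt, eps=1e-12):
    """Split V into one layer per component: V(f, t) * P(z | f, t)."""
    joint = Pz[:, None, None] * Pf.T[:, :, None] * Pt[:, None, :]   # K x F x T
    return V * joint / (joint.sum(axis=0) + eps)
```

Because each layer is V weighted by the posterior P(z | f, t), the layers always sum back to V. Each layer could then be tempo-tracked on its own; the Q&A notes that 8 components were specified.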

Autocorrelation is used for tempo detection. Regular sources produce higher autocorrelation peaks.
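
A hedged sketch of autocorrelation-based tempo detection, assuming each separated layer has already been reduced to an onset-strength envelope sampled at `fps` frames per second (that step is not shown). The height of the normalized autocorrelation peak serves as the regularity score, so the first global-tempo strategy, picking the most regular layer, becomes an argmax over layers. The function names, the 40 to 240 BPM search range, and the normalization are my assumptions.

```python
import numpy as np

def autocorr_tempo(envelope, fps, bpm_range=(40.0, 240.0)):
    """Estimate tempo (BPM) from an onset-strength envelope by autocorrelation.

    Returns (bpm, peak), where peak is the normalized autocorrelation at the
    chosen lag; a more regular layer produces a higher peak.
    """
    x = np.asarray(envelope, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]     # lags 0 .. N-1
    ac = ac / (ac[0] + 1e-12)                              # normalize so lag 0 == 1
    min_lag = int(np.ceil(60.0 * fps / bpm_range[1]))     # fastest tempo -> shortest lag
    max_lag = min(int(np.floor(60.0 * fps / bpm_range[0])), len(ac) - 1)
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * fps / lag, float(ac[lag])

def most_regular_layer(envelopes, fps):
    """Strategy 1: use the tempo of the layer with the highest autocorrelation peak."""
    results = [autocorr_tempo(env, fps) for env in envelopes]
    best = max(range(len(results)), key=lambda i: results[i][1])
    return best, results[best][0]
```

For example, a layer with a pulse every 50 frames at 100 frames per second gives 120 BPM with a peak close to 1, while a noise-like layer gives a low peak and loses the comparison.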

Another approach treats this as a machine learning problem, specifically a supervised learning problem.

Global tempo using clustering: merge all tempo candidates from all layers into a single vector. Each candidate is supported by the other candidates that fall within a 5% tolerance of it, or of 0.5x and 2x its value, giving a peak histogram that shows the confidence for each tempo.
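
A minimal sketch of that clustering step as described: pool the candidates, then score each distinct tempo by the candidates within ±5% of it, of half of it, or of double it. Counting half- and double-tempo matches at full weight, and the optional per-candidate `weights` (for example autocorrelation peak heights), are my assumptions; the function names are hypothetical.

```python
import numpy as np

def tempo_confidence(candidates, weights=None, tol=0.05):
    """Score each tempo candidate (BPM) by the total weight of candidates that
    agree with it within +/- tol (relative), counting half- and double-tempo
    matches too. Returns (distinct tempos, confidence per tempo)."""
    c = np.asarray(candidates, dtype=float)
    w = np.ones_like(c) if weights is None else np.asarray(weights, dtype=float)
    tempos = np.unique(c)
    conf = np.zeros_like(tempos)
    for i, t in enumerate(tempos):
        for factor in (1.0, 0.5, 2.0):                  # same, half, and double tempo
            target = t * factor
            match = np.abs(c - target) <= tol * target
            conf[i] += w[match].sum()
    return tempos, conf

def global_tempo(candidates, weights=None, tol=0.05):
    """Pick the tempo with the highest confidence (first one on ties)."""
    tempos, conf = tempo_confidence(candidates, weights, tol)
    return float(tempos[np.argmax(conf)])
```

For candidates [120, 121, 60, 240, 90] from different layers, 120 BPM collects support from 121 (within 5%), 60 (half), and 240 (double), so it wins over the isolated 90 BPM estimate.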

Evaluation

Accuracy:
MIREX06: 0.50
This system: 0.60

Question: How many sources were specified for PLCA? Answer: 8. George thinks the exact number doesn't matter much.

Question: Other papers have found that similar techniques show no improvement on larger datasets.

A MID-LEVEL REPRESENTATION FOR CAPTURING DOMINANT TEMPO AND PULSE INFORMATION IN MUSIC RECORDINGS

By Peter Grosche and Meinard Müller

Example: a waltz in which the downbeat is not much stronger than beats 2 and 3. Onsets are hard to find in the energy curves. Instead:

  1. Compute a spectrogram
  2. Log compression of the spectrogram
  3. Derivative
  4. Accumulation

This yields a novelty curve, which can be used for onset detection. However, downbeats are missing from it, so how do we beat-track this? Compute a tempogram, which is a spectrogram of the novelty curve. At each time position this yields a periodicity kernel, a windowed sinusoid matching the locally dominant tempo. All kernels are combined into a single curve, which is half-wave rectified to give a predominant local pulse (PLP) curve. The PLP curve is dynamic, but it can be constrained to track at the bar, beat, or tatum level.
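
The whole chain can be sketched in a few lines of NumPy. This is a simplified illustration in the spirit of the description, not Grosche and Müller's implementation: the Hann windows, the compression constant `gamma`, the 5-second kernel, subtracting the global mean instead of a local average, and the names `novelty_curve` and `plp_curve` are all assumptions. Restricting `bpm_range` is how this sketch constrains the PLP curve to a particular pulse level.

```python
import numpy as np

def novelty_curve(x, sr, n_fft=1024, hop=400, gamma=100.0):
    """Steps 1-4: spectrogram, log compression, time derivative, accumulation."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))          # 1. magnitude spectrogram
    log_mag = np.log1p(gamma * mag)                    # 2. log compression
    diff = np.diff(log_mag, axis=0)                    # 3. derivative over time
    nov = np.maximum(diff, 0.0).sum(axis=1)            # 4. keep increases, sum over bins
    return nov - nov.mean(), sr / hop                  # curve and its frame rate (Hz)

def plp_curve(nov, feature_rate, bpm_range=(40, 240), win_sec=5.0):
    """Tempogram -> one windowed-sinusoid kernel per frame -> overlap-add ->
    half-wave rectification. Returns the PLP curve and the tempo (BPM)
    chosen at each frame."""
    half = int(round(win_sec * feature_rate)) // 2
    n = np.arange(-half, half + 1)                     # frame offsets within a window
    win = np.hanning(len(n))
    bpms = np.arange(bpm_range[0], bpm_range[1] + 1)
    omegas = bpms / 60.0 / feature_rate                # tempo in cycles per frame
    basis = np.exp(-2j * np.pi * np.outer(omegas, n)) * win   # windowed Fourier basis
    padded = np.pad(nov, half)
    plp = np.zeros(len(padded))
    chosen = np.empty(len(nov))
    for t in range(len(nov)):
        coeffs = basis @ padded[t:t + len(n)]          # tempogram column at frame t
        k = int(np.argmax(np.abs(coeffs)))             # locally dominant tempo
        phase = np.angle(coeffs[k])
        # Periodicity kernel: windowed cosine aligned with the local pulse.
        plp[t:t + len(n)] += win * np.cos(2 * np.pi * omegas[k] * n + phase)
        chosen[t] = bpms[k]
    return np.maximum(plp[half:half + len(nov)], 0.0), chosen
```

On a click every 0.5 s, constraining `bpm_range` to 60 to 180 BPM makes the sketch lock onto 120 BPM, and the PLP peaks fall one beat apart. With a wider range, the 240 BPM harmonic of the click train competes, which is one way to see why tightly constraining the tempo helps.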

[Figure from ismir2009-proceedings.pdf, page 201 of 775]

Issues: PLP tends to fill in gaps, which is not always appropriate. It had trouble with Borodin's String Quartet No. 2, but it works much better when the tempo is tightly constrained.

This was a very good talk. Meinard presented many examples, including ones where the system did not work well.

Question: Can it run in real time? Answer: The kernels are currently 4 to 6 seconds long, so with a latency of 4 to 6 seconds it should work in an online scenario.

Question: How does this differ from DTW on the tempogram? Answer: It is not connected to DTW in any way.

Question: How important is the hop size? Answer: Not very, since a sliding window is used.

