Posts Tagged ismir2009

10 Awesome things about ISMIR 2009

ISMIR 2009 is over – but it will not be soon forgotten.  It was a wonderful event, with seemingly flawless execution.  Some of my favorite things about the conference this year:

  1. The proceedings – distributed on a USB stick hidden in a pen that has a laser! And the battery for the laser recharges when you plug the USB stick into your computer.  How awesome is that!?  (The printed version is very nice too, but it doesn’t have a laser).
  2. The hotel – very luxurious while at the same time, very affordable.  I had a wonderful view of Kobe, two very comfortable beds and a toilet with more controls than the dashboard on my first car.
  3. The presentation room – very comfortable with tables for those sitting towards the front, great audio and video and plenty of power and wireless for all.
  4. The banquet – held in the most beautiful room in the world with very exciting Taiko drumming as entertainment.
  5. The details – it seems like the organizing team paid attention to every little detail and request – from the taped numbers on the floor (so the 30 folks giving their 30-second pitches during poster madness would know just where to stand), to the signs on the coffeepots telling you that the coffee was being made, to the signs on the train to the conference center welcoming us to ISMIR 2009.  It seems like no detail was left to chance.
  6. The food – our stomachs were kept quite happy – with sweet breads and pastries every morning,  bento boxes for lunch, and coffee, juices, waters, and the  mysterious beverage ‘black’ that I didn’t dare to try. My absolute favorite meal was the box lunch during the tutorial day – it was a box with a string – when you are ready to eat you give the string a sharp tug – wait a few minutes for the magic to do its job and then you open the box and eat a piping hot bowl of noodles and vegetables.  Almost as cool as the laser-augmented proceedings.
  7. The city – Kobe is a really interesting city – I spent a few days walking around and was fascinated by it all. I really felt like I was walking around in the future.  It was extremely clean, and the people were very polite, friendly and always willing to help.  Going into some parts of town was sensory overload – the colors, sounds, smells and sights were overwhelming – it was really fun.
  8. the Keynote – music making robots – what more is there to say.
  9. The Program – the quality of papers was very high – there were some outstanding posters and oral presentations.  Much thanks to George and Keiji for organizing the reviews to create a great program. (More on my favorite posters and papers in an upcoming post)
  10. f(mir) – The student-organized workshop looked at what MIR research would look like in 10, 20 or even 50 years (basically after I’m dead and gone). The presentations in this workshop were quite provocative – well done students!

I write this post as I sit in the airport in Osaka waiting for my flight home.  I’m tired, but very energized to explore the many new ideas that I encountered at the conference. It was a great week.  I want to extend my personal thanks to Professor Fujinaga and Professor Goto and the rest of the conference committee for putting together a wonderful week.

Masataka and Ichiro at the conference table



ISMIR – MIREX Panel Discussion

Stephen Downie presents the MIREX session

Statistics for 2009:

  • 26 tasks
  • 138 participants
  • 289 evaluation runs

Results are now published.

This year, new datasets:

  • Mazurkas
  • MIR 1K
  • Bach Chorales
  • Chord and Segmentation datasets
  • Mood dataset
  • Tag-a-Tune

Evalutron 6K – Human evaluations – this year, 50 graders / 7500 possible grading events.

What’s Next?

Issues about MIREX

  • Rein in the parameter explosion
  • Not rigorously tested algorithms
  • Hard-coded parameters, path-separators, etc
  • Poorly specified data inputs/outputs
  • Dynamically linked libraries
  • Windows submissions
  • Pre-compiled Matlab/MEX Submissions
  • The ‘graduation’ problem – Andreas and Cameron will be gone in summer.

Long discussion, with people opining about tests and data.  Ben Fields made a particularly good point about making MIREX better reflect real systems that draw upon web resources.



Leave a comment

ISMIR Poster Madness #3


Leave a comment

ISMIR Oral Session 4 – Music Recommendation and playlisting


Session Chair:  Douglas Turnbull


by Kazuyoshi Yoshii and Masataka Goto

  • Unexpected encounters with unknown songs are increasingly important.
  • Want accurate and diversified recommendations.
  • Use a probabilistic approach suited to dealing with the uncertainty of rating histories.
  • Compares collaborative filtering vs. content-based filtering vs. their hybrid system.

Approach: Use pLSI to create a 3-way aspect model – user-song-feature – where the unobservable category captures genre, tempo, vocal age, popularity, etc.  In pLSI, typical patterns are given by relationships between users, songs and a limited number of topics.  Drawbacks: pLSI needs discrete features, and multinomial distributions are assumed.  To deal with this, they formulate a continuous pLSI using Gaussian mixture models, which can handle continuous distributions.  Drawbacks of continuous pLSI: the local-minimum problem and the hub problem – popular songs get recommended too often because of the hubs.  Their remedies: Gaussian parameter tying, which reduces the number of free parameters (only the mixture weights vary), and artist-based song clustering – train an artist-based model and update it to a song-based model by an incremental training method (from 2007).
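To make the 3-way aspect model concrete, here is a minimal sketch of its discrete pLSI form, trained with EM. This is only the basic idea – the paper’s continuous, GMM-based variant, the parameter tying and the artist clustering are all omitted, and the sizes and counts below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
U, S, F, Z = 4, 5, 3, 2          # users, songs, feature bins, latent topics (toy sizes)
N = rng.integers(0, 5, size=(U, S, F)).astype(float)   # observation counts

# random initialisation of the model parameters
Pz = np.full(Z, 1.0 / Z)
Pu = rng.random((U, Z)); Pu /= Pu.sum(axis=0)
Ps = rng.random((S, Z)); Ps /= Ps.sum(axis=0)
Pf = rng.random((F, Z)); Pf /= Pf.sum(axis=0)

for _ in range(50):
    # E-step: responsibility q(z | u, s, f)
    joint = np.einsum('z,uz,sz,fz->usfz', Pz, Pu, Ps, Pf)
    q = joint / joint.sum(-1, keepdims=True)
    # M-step: re-estimate parameters from expected counts
    Nq = N[..., None] * q
    Pz = Nq.sum((0, 1, 2)); Pz /= Pz.sum()
    Pu = Nq.sum((1, 2)); Pu /= Pu.sum(axis=0)
    Ps = Nq.sum((0, 2)); Ps /= Ps.sum(axis=0)
    Pf = Nq.sum((0, 1)); Pf /= Pf.sum(axis=0)

# predicted affinity of each user for each song, marginalising the features
score = np.einsum('z,uz,sz->us', Pz, Pu, Ps)
```

The hub problem the talk mentions shows up here too: a few songs with large memberships in many topics dominate `score`, which is what the parameter-tying trick is meant to tame.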

Here’s the system model:

ismir2009-proceedings.pdf (page 347 of 775)

Evaluation: They found that using the  techniques to adjust model complexity significantly improved the accuracy of recommendations and that the second technique could also reduce hubness.


François Maillet, Douglas Eck, Guillaume Desjardins, Paul Lamere

This paper presents an approach to generating steerable playlists. They first demonstrate a method for learning song transition probabilities from audio features extracted from songs played in professional radio station playlists and then show that by using this learnt similarity function as a prior, they are able to generate steerable playlists by choosing the next song to play not simply based on that prior, but on a tag cloud that the user is able to manipulate to express the high-level characteristics of the music he wishes to listen to.

  • Learn a similarity space from commercial radio station playlists
  • Generate steerable playlists

François defines a playlist. Data sources: Radio Paradise and another service’s API; 7 million tracks.

Problem: they had positive examples but no explicit set of negative examples, so negatives were chosen at random.

Learning the song space: Trained a binary classifier to determine if a song sequence is real.

Features: timbre, rhythm/danceability, loudness
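The training setup above can be sketched as follows. Everything here is a hypothetical stand-in: I represent a candidate transition by the absolute difference of the two songs' feature vectors and use plain logistic regression, rather than the data and learners of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 6, 500   # feature dims per pair, examples per class (both invented)

# Synthetic stand-in data: real radio transitions tend to join similar
# songs (small feature differences); random pairs do not.
X_pos = np.abs(0.1 * rng.normal(size=(n, D)))                       # real transitions
X_neg = np.abs(rng.normal(size=(n, D)) - rng.normal(size=(n, D)))   # random negatives
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Plain logistic regression trained by gradient descent: learn to score
# a song pair as "plausible next song" vs "random pairing".
w, b = np.zeros(D), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted transition probability
    grad = p - y
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

accuracy = ((p > 0.5) == y).mean()
```

The learned score can then serve as the prior over next-song choices that the tag cloud re-weights at playback time.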


ismir2009-proceedings.pdf (page 358 of 775)


Klaas Bosteels, Elias Pampalk, Etienne Kerre

Abstract: In this paper, we analyse and evaluate several heuristics for adding songs to a dynamically generated playlist. We explain how radio logs can be used for evaluating such heuristics, and show that formalizing the heuristics using fuzzy set theory simplifies the analysis. More concretely, we verify previous results by means of a large scale evaluation based on 1.26 million listening patterns extracted from radio logs, and explain why some heuristics perform better than others by analysing their formal definitions and conducting additional evaluations.


  • Dynamic playlist generation
  • Formalization using fuzzy sets. Sets of accepted songs and sets of rejected songs
  • Why are the last two songs not counted as accepted? To make sure the listener is still paying attention?
  • Interesting observation that the thing that matters most is membership in the fuzzy set of rejected songs. Why? Inconsistent skipping behavior.
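One plausible way to formalize such a heuristic with fuzzy sets is sketched below. The songs and membership values are invented, and the scoring rule (fuzzy AND of "accepted" with the complement of "rejected") is just one of the heuristic families the paper compares:

```python
# Fuzzy memberships in [0, 1], e.g. derived from similarity to the songs
# the listener accepted or skipped so far (toy values here).
accept_sim = {"song_a": 0.9, "song_b": 0.6, "song_c": 0.8}
reject_sim = {"song_a": 0.7, "song_b": 0.1, "song_c": 0.3}

def pick_next(candidates, accept, reject):
    # Heuristic: a good next song belongs to the fuzzy set of accepted
    # songs AND to the complement of the rejected set.
    # Standard fuzzy AND = min, fuzzy complement = 1 - membership.
    def score(song):
        return min(accept[song], 1.0 - reject[song])
    return max(candidates, key=score)

next_song = pick_next(accept_sim.keys(), accept_sim, reject_sim)  # "song_c"
```

Note how the rejected set dominates: `song_a` has the highest accept membership but loses because of its high reject membership, matching the talk's observation that rejection matters most.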

ismir2009-proceedings.pdf (page 361 of 775)


Luke Barrington, Reid Oda, Gert Lanckriet

Abstract: Genius is a popular commercial music recommender system that is based on collaborative filtering of huge amounts of user data. To understand the aspects of music similarity that collaborative filtering can capture, we compare Genius to two canonical music recommender systems: one based purely on artist similarity, the other purely on similarity of acoustic content. We evaluate this comparison with a user study of 185 subjects. Overall, Genius produces the best recommendations. We demonstrate that collaborative filtering can actually capture similarities between the acoustic content of songs. However, when evaluators can see the names of the recommended songs and artists, we find that artist similarity can account for the performance of Genius. A system that combines these musical cues could generate music recommendations that are as good as Genius, even when collaborative filtering data is unavailable.

Great talk, lots of things to think about.

ismir2009-proceedings.pdf (page 370 of 775)


Leave a comment

ISMIR Day 1 Posters

Click for slide show


Lots of very interesting posters, you can see some of my favorites in this Flickr slide show.


Leave a comment

ISMIR Oral Session 3 – Musical Instrument Recognition and Multipitch Detection

Session Chair: Juan Pablo Bello


By Ferdinand Fuhrmann, Martín Haro, Perfecto Herrera

  • Automatic recognition of music instruments
  • Polyphonic music
  • Predominant instruments

Research Questions

  • Scale existing methods to highly polyphonic music
  • Generalize with respect to the instruments used
  • Model temporal information for recognition


  • Unified framework
  • Pitched and unpitched …
  • (more goals, but I couldn’t keep up)

Neat presentation of a survey of related work, plotted along a simple-vs-complex axis.

Ferdinand was going too fast for me (or perhaps jetlag was kicking in), so I include the conclusion from his paper here to summarize the work:

Conclusions: In this paper we addressed three open gaps in automatic recognition of instruments from polyphonic audio. First we showed that by providing extensive, well designed datasets, statistical models are scalable to commercially available polyphonic music. Second, to account for instrument generality, we presented a consistent methodology for the recognition of 11 pitched and 3 percussive instruments in the main western genres classical, jazz and pop/rock. Finally, we examined the importance and modeling accuracy of temporal characteristics in combination with statistical models. Thereby we showed that modelling the temporal behaviour of raw audio features improves recognition performance, even though a detailed modelling is not possible. Results showed an average classification accuracy of 63% and 78% for the pitched and percussive recognition task, respectively. Although no complete system was presented, the developed algorithms could be easily incorporated into a robust recognition tool, able to index unseen data or label query songs according to the instrumentation.

ismir2009-proceedings.pdf (page 332 of 775)


by Toni Heittola, Anssi Klapuri and Tuomas Virtanen

Quick summary: A novel approach to musical instrument recognition in polyphonic audio signals by using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases modeled as a product of excitations and filters. The excitations are restricted to harmonic spectra and their fundamental frequencies are estimated in advance using a multipitch estimator, whereas the filters are restricted to have smooth frequency responses by modeling them as a sum of elementary functions on the Mel-frequency scale. The pitch and timbre information are used in organizing individual notes into sound sources. The method is evaluated with polyphonic signals, randomly generated from 19 instrument classes.

Separate the mixture into its sources, typically using non-negative matrix factorization.  Problem: each pitch needs its own basis function, leading to many functions.  The system overview:

ismir2009-proceedings.pdf (page 336 of 775)
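The NMF step at the heart of this can be sketched with the classic multiplicative updates. This is plain NMF only – the paper's source-filter constraints (harmonic excitations, smooth Mel-scale filters, multipitch-informed fundamentals) are omitted, and the toy spectrogram is random:

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.abs(rng.normal(size=(64, 40)))   # toy magnitude spectrogram (freq x time)
R = 4                                   # number of spectral bases (assumed)
W = np.abs(rng.normal(size=(64, R)))    # spectral bases, one per source/pitch
H = np.abs(rng.normal(size=(R, 40)))    # time-varying gains per basis

err0 = np.linalg.norm(V - W @ H)

# Lee & Seung multiplicative updates for the Euclidean cost ||V - W H||^2;
# they preserve non-negativity and never increase the cost.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

err = np.linalg.norm(V - W @ H)
```

The "each pitch needs its own function" problem is visible here: with unconstrained `W`, every distinct pitch demands another column, which is exactly what the paper's excitation-filter factorization avoids.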

The Examples are very interesting:


by Zhiyao Duan, Jinyu Han and Bryan Pardo

A novel system for multipitch tracking, i.e. estimating the pitch trajectory of each monophonic source in a mixture of harmonic sounds.  Current systems are not robust: because they use local time-frequency information, they tend to generate only short pitch trajectories.  This system has two stages: multi-pitch estimation and pitch trajectory formation. In the first stage, they model spectral peaks and non-peak regions to estimate pitches and polyphony in each single frame. In the second stage, pitch trajectories are clustered subject to constraints: global timbre consistency and local time-frequency locality.

Here’s the system overview:

ismir2009-proceedings.pdf (page 342 of 775)

Good talk and paper. Nice results.


Leave a comment

ISMIR Poster Madness part 2

Poster madness! Version 2 – even faster this time. I can’t keep up

  1. Singing Pitch Extraction – Taiwan
  2. Usability Evaluation of Visualization interfaces for content-based music retrieval – looks really cool! 3D
  3. Music Paste – concatenating music clips based on chroma and rhythm features
  4. Musical bass-line pattern clustering and its application to audio genre classification
  5. Detecting cover sets – looks nice – visualization – MTG
  6. Using Musical Structure to enhance automatic chord transcription –
  7. Visualizing Musical Structure from performance gesture – motion
  8. From low-level to song-level percussion descriptors of polyphonic music
  9. MTG – Query by symbolic example – use a DNA/Blast type approach
  10. sten – web-based approach to determine the origin of an artist – visualizations
  11. XML-format for any kind of time related symbolic data
  12. Erik Schmidt – FPGA feature extraction. MIR for devices
  13. Accelerating QBH – another hardware solution – 160 times faster
  14. Learning to control a reverberator using subjective perceptual descriptors –  more boomy
  15. Interactive GTTM Analyzer –
  16. Estimating the error distribution of a tap sequence without ground truth – Roger Dannenberg
  17. Cory McKay – ACE XML – Standard formats for features, metadata, labels  and class ontologies
  18. An efficient multi-resolution spectral transform for music analysis
  19. Evaluation of multiple F0 estimation and tracking systems

BTW – Oscar informs me that this is not the first ever poster madness – there was one in Barcelona


Leave a comment

ISMIR Oral Session 2 – Tempo and Rhythm

Session chair: Anssi Klapuri


By Matthias Gruhne, Christian Dittmar, and Daniel Gaertner

Matthias described their approach to generating beat histograms, similar to those used by Burred, Gouyon, Foote and Tzanetakis. Problem: the beat histogram cannot be used directly as a feature because of tempo dependency – similar rhythms appear far apart in a Euclidean space because of it. Challenge: reduce the tempo dependence.

Solution: logarithmic Transformation.  See the figure:

ismir2009-proceedings.pdf (page 186 of 775)

This leads to a histogram with a tempo independent part which can be separated from the tempo dependent part.  This tempo independent part can then be used in a Euclidean space to find similar rhythms.
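The key trick – a tempo change scales the lag axis, so on a logarithmic lag axis it becomes a pure shift – can be checked on a toy autocorrelation curve. The curve, bin counts and periods below are all invented for illustration:

```python
import numpy as np

lags = np.arange(1, 513).astype(float)    # linear lag axis, in frames

def toy_autocorr(period):
    # stand-in for a beat histogram / autocorrelation with peaks at
    # multiples of the beat period
    return 1.0 + np.cos(2 * np.pi * lags / period)

slow = toy_autocorr(120.0)   # one rhythm
fast = toy_autocorr(60.0)    # the same rhythm played twice as fast

# resample both onto a logarithmic lag axis (64 bins per octave)
bpo = 64
log_lags = 2.0 ** (np.arange(9 * bpo + 1) / bpo)   # lags 1 .. 512
slow_log = np.interp(log_lags, lags, slow)
fast_log = np.interp(log_lags, lags, fast)

# On the log axis the tempo doubling is a shift by exactly one octave
# (64 bins), so the shifted curves agree up to interpolation error.
shift_error = np.max(np.abs(fast_log[:-bpo] - slow_log[bpo:]))
```

Once tempo is just a translation, the tempo-dependent offset can be separated out and the remaining shape compared in Euclidean space, which is the separation the figure illustrates.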

Evaluation: results improved from 20% to 70% in one experiment, and from 66% to 69% in another.  (Needs a significance test here, I think.)


By Parag Chordia and Alex Rae – presented by George Tzanetakis

Unusual – George is presenting Parag and Alex’s work in their absence.  Anssi suggests that we can use the wisdom of the crowd to answer the questions.

Motivation: Tempo detection is often unreliable for complex music.

Humans often resolve rhythms by entraining to a rhythmically regular part.

Idea: Separate music into components, some components may be more reliable.


  1. Source separation
  2. track tempo for each source
  3. decide global tempo by either:
    1. Pick one with most regular structure
    2. Look for common tempo across all sources/layers

Here’s the system:

ismir2009-proceedings.pdf (page 193 of 775)

PLCA (Probabilistic Latent Component Analysis) is a source separation method.  Issues: the number of components needs to be specified in advance, and it could merge sources or split one source into multiple layers.

Autocorrelation is used for tempo detection.  Regular sources will have higher peaks.

Other approach – a machine learning approach – a supervised learning problem

Global tempo using clustering – merge all tempo candidates into a single vector, grouping candidates within a 5% tolerance (along with their 0.5x and 2x octaves), to give a peak histogram showing the confidence for each tempo.
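A rough sketch of that clustering step, using a simple greedy merge. The octave down-weighting is my own assumption (the paper builds a peak histogram; exactly how folded votes are weighted isn't captured in my notes):

```python
def global_tempo(candidates, tol=0.05, octave_weight=0.5):
    """candidates: (tempo_bpm, confidence) pairs, one or more per separated layer.
    octave_weight < 1 down-weights the folded 0.5x/2x votes (an assumption)."""
    scores = {}
    for bpm, conf in candidates:
        for mult in (0.5, 1.0, 2.0):          # fold in octave-related tempi
            ref = bpm * mult
            vote = conf if mult == 1.0 else octave_weight * conf
            for c in scores:                   # merge within the 5% tolerance
                if abs(c - ref) / c <= tol:
                    scores[c] += vote
                    break
            else:
                scores[ref] = vote
    return max(scores, key=scores.get)

# layers voting for 120 BPM, with one octave error and a little jitter
print(global_tempo([(120, 0.9), (60, 0.5), (121, 0.4)]))  # prints 120.0
```

The octave folding is what lets a layer tracked at half tempo still reinforce the correct global tempo instead of splitting the vote.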


MIREX06: 0.50
THIS   : 0.60

Question: How many sources were specified to PLCA? Answer: 8. George thinks it doesn’t matter too much.

Question: Other papers show that similar techniques do not show improvement for larger datasets


By Peter Grosche and Meinard Müller

Example – a waltz – where the downbeat is not too strong compared to beats 2 & 3.   It is hard to find onsets in the energy curves.  Instead, use:

  1. Create a spectrogram
  2. Log compression of the spectrogram
  3. Derivative
  4. Accumulation

This yields a novelty curve, which can be used for onset detection.  But downbeats are missing – how to beat-track this? Compute a tempogram – a spectrogram of the novelty curve.  This yields a periodicity kernel; all the kernels are combined and rectified to obtain a single predominant local pulse (PLP) curve. The PLP curve is dynamic but can be constrained to track at the bar, beat or tatum level.
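Steps 1-4 and the tempogram idea can be sketched on a synthetic click track. All the parameters here (sample rate, hop size, tempo grid) are arbitrary toy choices, not the paper's, and the kernel-combination/PLP stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
sr, hop, win = 8000, 256, 512          # toy analysis parameters
x = np.zeros(sr * 4)                   # 4 s click track at 120 BPM
x[::sr // 2] = 1.0                     # a click every 0.5 s
x += 0.01 * rng.normal(size=x.size)    # a little noise

# 1. magnitude spectrogram
frames = np.lib.stride_tricks.sliding_window_view(x, win)[::hop]
S = np.abs(np.fft.rfft(frames * np.hanning(win), axis=1))

# 2. logarithmic compression
C = np.log1p(100.0 * S)

# 3. discrete derivative, half-wave rectified (keep energy increases only)
d = np.maximum(np.diff(C, axis=0), 0.0)

# 4. accumulate over frequency -> novelty curve
novelty = d.sum(axis=1)
novelty -= novelty.mean()

# tempogram idea: project the novelty curve onto complex sinusoids for a
# grid of candidate tempi; the strongest response gives the local tempo
frame_rate = sr / hop
t = np.arange(novelty.size) / frame_rate
tempi = np.arange(40, 201)             # BPM grid
strength = [abs(novelty @ np.exp(-2j * np.pi * (bpm / 60.0) * t)) for bpm in tempi]
best_bpm = tempi[int(np.argmax(strength))]
```

The log compression in step 2 is what makes the weak beats of the waltz example visible: it boosts small energy changes that a raw energy curve would miss.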

ismir2009-proceedings.pdf (page 201 of 775)

Issues: PLP likes to fill in the gaps – which is not always appropriate.  Trouble with the Borodin String Quartet No. 2. But when tempo is tightly constrained, it works much better.

This was a very good talk. Meinard presented lots of examples including examples where the system did not work well.

Question:  Realtime? Currently kernels are 4 to 6 seconds. With a latency of 4 to 6 seconds it should work in an online scenario.

Question: How different is this from DTW on the tempogram?  It is not connected to DTW in any way.

Question: How important is the hopsize? Not that important since a sliding window is used.


Leave a comment

ISMIR Keynote – Ten Years of Ismir

Ten years of ISMIR – Reflections on challenges and opportunities

Three founding fathers of ISMIR: J. Stephen Downie, Donald Byrd and Tim Crawford

Prehistoric background: Tim Crawford described the early challenges in finding music and the difficulties of using computers in the early days.  Stephen talks about his early days as a flutist, his personal challenges in finding music from incipits, and the strong bias for retrieval and evaluation instilled by his advisor.

In 1999, Don Byrd (working with Bruce Croft at UMass) and Tim (at King’s College London) got a grant to look at music information retrieval.

1999 – two conferences: ACM SIGIR and ACM Digital Libraries. Stephen organized an exploratory workshop on music information retrieval – that’s where Stephen and Don met and proposed ISMIR.  13 August 1999 – ISMIR was born (in a bar in Berkeley, CA).

ISMIR timeline:

  • 1999 – MIR Workshop
  • 2000 – First ISMIR – Plymouth MA – 88 Attendees, 11 different countries, 9 invited talks, 10 papers, 16 posters.  Highlights: Beth Logan presents MFCCs, Tzanetakis and Cook: Marsyas, Foote: Arthur paper
  • 2002 – switch symposium to conference – to make it easier to get funding
  • Growing collaborations – 50% of all papers have 3 or 4 authors
  • 2009 – ISMIR becomes the International Society for Music Information Retrieval

Evaluation History

  • 1999 – TREC-like evaluation proposed
  • 2001 Bloomington meeting – manifesto for content providers to supply data
  • 2002 / 2003 – funding from the Mellon Foundation
  • 2004 – Barcelona – MTG created the audio description contest
  • 2005 – First MIREX
  • MIREX Breakdown
    • 469 algorithm runs
    • 129 – train/test machine learning tests
    • 139 search tasks
    • 22 unique tasks
    • 16 tasks in audio domain
    • 3 hybrid tasks
    • No symbolic tasks in 2009

ISMIR: External success factors – audio compression, the growth of online audio, standards like MPEG-7, and the ‘Google for music’ bubble.

ISMIR: Internal success factors – communication resources (the mailing list and collected proceedings), diversity of backgrounds on the steering committee, quality programme chairs and committees, a policy of inclusiveness – not premised on high rejection rates, with multiple avenues for presentation – and general support for the Audio Description Contest and MIREX.

Five Key Challenges for ISMIR

  1. Embracing users – engage more with potential user communities (performing musicians, film makers, musicologists, sound archivists, music educators and music enthusiasts of all types)
  2. Digging deeper into music itself – find the ‘music’ within the signal, move beyond simple timbral approaches, move beyond single features to create hybrid musically principled features, develop a deeper understanding of what features mean musically, and build hybrid symbolic/audio systems.
  3. Increasing musical diversity – widen our horizons beyond western popular music
  4. Rebalancing our music portfolio – use audio, symbolic, and (catalog) metadata together
  5. Developing comprehensive MIR systems – work towards complete, usable, scalable systems, even if they are not perfect.  In the text IR world, prototype systems have been pivotal (SMART, Managing Gigabytes, Terrier).

The Grand Challenge:  Complete Systems

  • Something for people to use
  • Engage with our potential user-community
  • Become more aware of how humans hear music, listen to it, respond to it and think about it.
  • A new discipline of music informatics based on higher-level (human) queries rather than low-level feature matching


  • Need to find a way to encourage and reward development and improvement – to move things to the next level. The problem: it is hard to publish something that builds on previous work but has no novel contribution.
  • Academic vs. industrial priorities
  • Music retrieval vs. multimedia retrieval – we have a lot to learn from conferences like ACM MM.

IMPACT! is the new academic rock’n’roll – to get funding you must show impact, which is perhaps more important than publishing.


Leave a comment

Live from ISMIR

This week I’m attending ISMIR – the 10th International Society for Music Information Retrieval Conference, being held in Kobe, Japan.  At this conference researchers gather to advance the state of the art in music information retrieval.  It is a varied bunch, including librarians, musicologists, and experts in signal processing, machine learning, text IR, visualization and HCI.  I’ll be trying to blog the various talks and poster sessions throughout the conference (but at some point the jetlag will kick in, making it hard for me to think, let alone type).  It’s 9 AM – the keynote is starting…

Opening Remarks

Masataka and Ichiro give the opening remarks. First some stats:

  • 286 attendees from 28 countries
  • 212 submissions from 29 countries
  • 123 papers (58%) accepted
  • 214 reviewers





Leave a comment