ISMIR 2009 – The Future of MIR

This year ISMIR concludes with the 1st Workshop on the Future of MIR.  The workshop is organized by students who are indeed the future of MIR.


09:00-10:00 Special Session: 1st Workshop on the Future of MIR

The PDF files of the papers in this special session are available at the f(MIR) official website.

Welcome and Introduction to the f(MIR) workshop: Thierry Bertin-Mahieux

MIR, where we are, where we are going

Session Chair: Amélie Anglade, Program Chair of f(MIR)

Meaningful Music Retrieval

Frans Wiering – [pdf]

Wiering-fmir.pdf (page 2 of 3)

Notes

  • Some unfortunate tendencies: an anatomical view of music – a dead body on which we perform autopsies; time is the loser. Traditional production-oriented/
  • Measure of similarity: relevance, surprise
  • Few interesting applications for end-users
  • Bad fit to present-day musicological themes
  • We are in the world of ‘pure applied research’ – no true interdisciplinarity between music domain knowledge and computer science.
  • Music is meaningful (and the underlying personal motivation of most MIR researchers).
  • Meaning in musicology – traditionally a taboo subject
  • Subjectivity: an individual’s disposition to engage in social and cultural interactions
  • Meaning generation process – we have a long-term memory for music
  • Can musical meaning provide the ‘big story line’ for MIR?

The Discipline Formerly Known As MIR

Perfecto Herrera, Joan Serrà, Cyril Laurier, Enric Guaus, Emilia Gómez and Xavier Serra

Intro: Our exploration is not a science-fiction essay. We do not try to imagine how music will be conceptualized, experienced and mediated by our yet-to-come research, technological achievements and music gizmos. Alternatively, we reflect on how the discipline should evolve to become consolidated as such, in order it may get an effective future instead of becoming, after a promising start, just a “would-be” discipline. Our vision addresses different aspects: the discipline’s object of study, the employed methodologies, social and cultural impacts (which are out of this long abstract because of space restrictions), and we finish with some (maybe) disturbing issues that could be taken as partial and biased guidelines for future research.

Herrera-fmir.pdf (page 2 of 3)

Notes: One motivation for advancing MIR – more banquets!

  • MIR is no more about retrieval than computer science is about computers
  • Music Information Retrieval – it’s too narrow
  • Music Information or Information about Music?
  • Interested in the interaction with music information
  • We should be asking more profound questions
    • music
    • content treasures in short musical excerpts, tracks, performances, etc.
    • context
  • music understanding systems
  • Most metadata will be generated in the creation / production phase (hmm... I don’t necessarily agree; all the good metadata (tags, who likes what) is based on context and use, which is post hoc)
  • Instead of automatic analysis – build systems to help humans help humans
  • Music like water? Or music as dog!!! – a friend, a companion
  • Personalization, Findability
  • Music Turing test

Good, provocative talk

Oral Session 2: Potential future MIR applications

Session Chair: Jason Hockman (McGill University), Program Chair of f(MIR)

Machine Listening to Percussion: Current Approaches and Future Directions – [pdf]

Michael Ward

Abstract: approaches have been taken to detect and classify percussive events within music signals for a variety of purposes with differing and converging aims. In this paper an overview of those technologies is presented and a discussion of the issues still to overcome and future possibilities in the field are presented. Finally a system capable of monitoring a student drummer is envisaged which draws together current approaches and future work in the field.

Notes:

  • Challenges: onset detection of isolated drum strokes
  • Onset detection and classification of overlapping drum sounds
  • Onset detection and classification in the presence of other instruments
  • Variability in percussive sounds. Dozens of criteria affect the sounds produced (strike velocity, angle, position, etc.)
  • Future Research Areas
    • Extension of recognition to include the wide variety of strokes (open hh, half-open hh, hh foot splash, etc.)

MIR When All Recordings Are Gone: Recommending Live Music in Real-Time –  [pdf]

Marco Lüthy and Jean-Julien Aucouturier

Recommending live and short-lived events. Bandsintown, Songkick, Gigulate … pay attention to this paper.

Aucouturier-fmir.pdf (page 3 of 3)

Notes:

  • Recommendation for live music in real-time
  • Coldplay -> free album when you get a ticket to a Coldplay concert – give away the music
  • NIN -> USB keys left in the toilets, holding a file with a strange recording; an FFT of the sounds showed a phone number and GPS coordinates, which turned into a treasure hunt leading to a NIN concert.
  • Komuso Tokugawa – an avatar for a musician in Second Life. Plays in Second Life, twitters concert announcements (“playing wake for Les Paul in 3 minutes”)
  • ‘How do we get there in time?’
  • JJ walked through how to implement a recommender system in Second Life
  • Implicit preference inferred from how long your avatar listens to a concert (Nicole Yankelovich at Sun Labs should look at this stuff)
  • Great talk by JJ – full of energy – neat ideas. Good work.

 

Poster Session

  • Global Access to Ethnic Music: The Next Big Challenge?
    Olmo Cornelis, Dirk Moelants and Marc Leman
  • The Future of Music IR: How Do You Know When a Problem Is Solved?
    Eric Nichols and Donald Byrd


ISMIR 2009 – The Industry Panel

On Thursday I participated in the ISMIR industrial panel.  Eight members of industry talked about the issues and challenges that they face in industry.  I had a good time on the panel, the panelists were all on target and very thoughtful, and there were great questions from the audience.  I’m happy, too, that the IRC channel offered a place for people to vent without the session turning into an SXSW-style riot.

Justin Donaldson kept good notes on the panel and has posted them on his blog: ISMIR 2009 Industry Panel


Taiko at the ISMIR 2009 Banquet

During the ISMIR Banquet (held in the most beautiful place in the world, the Kobe Kachoen) we were entertained by the Maturishu, a Taiko performance group.  They were just fantastic.

ISMIR Oral Session 7 – Harmonic & Melodic Similarity and Summarization

10:30-12:30 Oral Session (OS7) – Harmonic & Melodic Similarity and Summarization

Session Chair: Emilia Gómez (Universitat Pompeu Fabra, Spain)

Abstract: Content-based music retrieval requires to define a similarity measure between music documents. In this paper, we propose a novel similarity measure between melodic content, as represented in symbolic notation, that takes into account musicological aspects on the structural function of the melodic elements. The approach is based on the representation of a collection of music scores with a graph structure, where terminal nodes directly describe the music content, internal nodes represent its incremental generalization, and arcs denote the relationships among them. The similarity between two melodies can be computed by analyzing the graph structure and finding the shortest path between the corresponding nodes inside the graph. Preliminary results in terms of music similarity are presented using a small test collection.
Notes:
  • high-level music dimensions are not reliably computed from audio
  • musicologists are more interested in scores
  • results with symbolic formats can be a reference for audio-based approaches
  • melodic similarity is not a solved problem
  • Overview of the approach:
ismir2009-proceedings.pdf (page 552 of 775)
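To make the shortest-path idea from the abstract concrete, here is a minimal sketch in Python. The toy graph, its node names and the `shortest_path_length` helper are all hypothetical; the paper builds its graph from real scores and may weight the arcs, which this unweighted breadth-first search ignores.

```python
from collections import deque


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of arcs on the shortest path, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy graph: melodies are terminal nodes, "g1"/"g2"/"root"
# stand in for progressively more general internal nodes.
toy_graph = {
    "melody_a": ["g1"],
    "melody_b": ["g1"],
    "melody_c": ["g2"],
    "g1": ["melody_a", "melody_b", "root"],
    "g2": ["melody_c", "root"],
    "root": ["g1", "g2"],
}

# A shorter path means the melodies share a more specific generalization.
print(shortest_path_length(toy_graph, "melody_a", "melody_b"))  # 2
print(shortest_path_length(toy_graph, "melody_a", "melody_c"))  # 4
```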
(OS7-2) Modeling Harmonic Similarity Using a Generative Grammar of Tonal Harmony
by Bas de Haas, Martin Rohrmeier, Remco Veltkamp and Frans Wiering
Abstract: In this paper we investigate a new approach to the similarity of tonal harmony. We create a fully functional remodeling of an earlier version of Rohrmeier’s grammar of harmony. With this grammar an automatic harmonic analysis of a sequence of symbolic chord labels is obtained in the form of a parse tree. The harmonic similarity is determined by finding and examining the largest labeled common embeddable subtree (LLCES) of two parse trees. For the calculation of the LLCES a new O(min(n, m)nm) time algorithm is presented, where n and m are the sizes of the trees. For the analysis of the LLCES we propose six distance measures that exploit several structural characteristics of the Combined LLCES. We demonstrate in a retrieval experiment that at least one of these new methods significantly outperforms a baseline string matching approach and thereby show that using additional musical knowledge from music cognitive and music theoretic models actually helps improving retrieval performance.
Notes: Harmonic similarity based on chord sequence similarities.  Good for cover songs, plagiarism, improvised music.  Use a generative model of tonal harmony and compare parse trees.
  • Extract chord labels from audio and symbolic data (not the research focus)
  • Not all info is in the data.  Need a grammatical model of tonal harmony
ismir2009-proceedings.pdf (page 561 of 775)
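The abstract only says that the baseline is a string matching approach. As an assumed stand-in for that kind of baseline, here is a plain Levenshtein edit distance over chord labels. The chord sequences are invented, and this is not the LLCES method itself.

```python
def chord_edit_distance(a, b):
    """Levenshtein distance between two sequences of chord labels."""
    prev = list(range(len(b) + 1))
    for i, chord_a in enumerate(a, start=1):
        curr = [i]
        for j, chord_b in enumerate(b, start=1):
            cost = 0 if chord_a == chord_b else 1
            curr.append(min(prev[j] + 1,          # delete chord_a
                            curr[j - 1] + 1,      # insert chord_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Invented example: the "cover" swaps one chord.
song = ["C", "Am", "F", "G", "C"]
cover = ["C", "Am", "Dm", "G", "C"]
print(chord_edit_distance(song, cover))  # 1
```

A flat string distance like this treats every chord change as equally costly, which is exactly the kind of musical knowledge the grammar-based approach tries to add.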
Christopher Raphael
Abstract: A method for expressive melody synthesis is presented seeking to capture the structural and prosodic (stress, direction, and grouping) elements of musical interpretation. The interpretation of melody is represented through a hierarchical structural decomposition and a note-level prosodic annotation. An audio performance of the melody is constructed using the time-evolving frequency and intensity functions. A method is presented that transforms the expressive annotation into the frequency and intensity functions, thus giving the audio performance. In this framework, the problem of expressive rendering is cast as estimation of structural decomposition and the prosodic annotation. Examples are presented on a dataset of around 50 folk-like melodies, realized both from hand-marked and estimated annotations.
Notes: More interested in continuous instruments (theremin is the minimal possible instrument).  He’s trying to represent and estimate interpretation itself rather than mapping score into performance decisions.
Taxonomy of expression
  • Conveying musical structure (slowing down at boundaries for example)
  • Prosody ( stress, direction, grouping) – the heart of the matter
  • Musical Affect (happy, sad, etc) – not easy, so ignores this one
How to represent musical expression: passing tones, stress, receding.

 

This was  a very good talk.

ismir2009-proceedings.pdf (page 565 of 775)
Abstract: This paper focuses on automatic extraction of acoustic chord sequences from a musical piece. Standard and factored language models are analyzed in terms of applicability to the chord recognition task. Pitch class profile vectors that represent harmonic information are extracted from the given audio signal. The resulting chord sequence is obtained by running a Viterbi decoder on trained hidden Markov models and subsequent lattice rescoring, applying the language model weight. We performed several experiments using the proposed technique. Results obtained on 175 manually-labeled songs provided an increase in accuracy of about 2%.
ismir2009-proceedings.pdf (page 573 of 775)
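The decoding step mentioned in the abstract is a Viterbi pass over a hidden Markov model. A minimal sketch follows, assuming a toy two-chord model with made-up probabilities and pre-quantized observation symbols instead of pitch class profile vectors. The lattice rescoring with a language model is not shown.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely state sequence for a non-empty list of observations."""
    # best[t][s] = best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in table.items()}


# Hypothetical model: two chords, observations "c" / "g" are toy symbols.
states = ["C", "G"]
log_start = logs({"C": 0.5, "G": 0.5})
log_trans = {"C": logs({"C": 0.8, "G": 0.2}), "G": logs({"C": 0.2, "G": 0.8})}
log_emit = {"C": logs({"c": 0.9, "g": 0.1}), "G": logs({"c": 0.1, "g": 0.9})}

print(viterbi(["c", "c", "g", "g"], states, log_start, log_trans, log_emit))
# ['C', 'C', 'G', 'G']
```

With sticky transitions like these, a single stray observation tends to be smoothed over rather than decoded as a chord change.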
Abstract: Methods for spectral analysis of audio signals and their graphical display are widespread. However, assessing music and audio in the visual domain involves a number of challenges in the translation between auditory images into mental or symbolically represented concepts. This paper presents a spectral analysis method that exists entirely in the auditory domain, and results in an auditory presentation of a spectrum. It aims to strip a segment of audio signal of its temporal content, resulting in a quasi-stationary signal that possesses a similar spectrum to the original signal. The method is extended and applied for the purpose of music summarisation.
Notes: Statistical sonification; listening to spectra can be aided by an audio steady state.  A time-domain algorithm.  (TJ should do this at the Echo Nest!)
ismir2009-proceedings.pdf (page 577 of 775)
Abstract: The Cover Song Retrieval (CSR) problem has received considerable attention in the MIREX 2006-2008 evaluation sessions. While the reported performance figures provide a general idea about the strengths of the submitted systems, it is not clear what actually causes the reported performance of a certain system. In other words, the question arises whether some system component design choices are more critical for a system’s performance results than others. In order to obtain a better understanding of the performance of current CSR approaches and to give recommendations for future research in the field of CSR, we designed and performed a comparative study involving system component design approaches from the best-performing systems in MIREX 2006 and 2007. The datasets used for evaluation were carefully chosen to cover the broad spectrum of the cover song domain, while still providing designated test cases. While the choice of the dissimilarity assessment method was found to cause the largest CSR performance boost and very good retrieval results were obtained on classical opus retrieval cases, results obtained on a new test case, involving recordings originating from different microphone sets, point out new challenges in optimizing the feature representation step.
A very thorough, interesting talk.
ismir2009-proceedings.pdf (page 585 of 775)


ISMIR Keynote – Wind instrument-playing humanoid robots

What’s not to love!?!? Robots and Music! This was a great talk.

Wind instrument-playing humanoid robots
Atsuo Takanishi

Some history of robots:

Wabot-2 – early music playing robot


Wabian-2 – walking robots

Emotional Robots

Kobian: Emotional humanoid robot

Voice Producing Robots

Music Performance Robots

(Compare)


Google’s new music search

YouTube - Google Music Search Feature

The news wires are abuzz with Google’s new music search feature.  The new Google feature will allow users to search for an artist, song, album or lyric and get a music result that will include album art and a ‘play’ button that will let you listen to the music.  MySpace and Lala will be serving up the music and you’ll be able to play any song in full just once.  The music results will also include links to Pandora, imeem and Rhapsody.  Lyrics search is provided by Gracenote.

Here’s the video announcement:

It’s about time that Google starts to include the ability to listen to search results – this will help. It’s pretty cool, but I don’t think it changes the music discovery game too much. Search is not discovery.

Update: The Register is particularly unimpressed: “Trying to forcefeed punters a lousy service is a bad idea, amplified by the assumption that if Facebook and Google are the feeding tube, we’ll suck it up.”


The SQL Join is destroying music

Brian Whitman, one of the founders of the Echo Nest, gave a provocative talk last week at Music and Bits.  Some excerpts:

Useless MIR Problems:

  • Genre Identification – “Countless PhDs on this useless task. Trying to teach a computer a marketing construct”

Hard but interesting MIR Problems:

  • Finding the saddest song in the world
  • Predicting Pitchfork and All Music Guide ratings
  • Predicting the gender of a listener based upon their music taste

On Recommendation:

  • “The best music experience is still very manual… I am still reading about music, not using a recommender.”
  • “If we only used collaborative filtering to discover music, the popular artists would eat the unknowns alive.”
  • “The SQL Join is destroying music”

Brian’s notes on the talk are on his blog.  The slides are online here. Highly recommended:


ISMIR Oral Session 6 – Similarity

Oral Session 6  – Similarity

Chair: Roger Dannenberg

ON RHYTHM AND GENERAL MUSIC SIMILARITY

Tim Pohle, Dominik Schnitzer, Markus Schedl, Peter Knees and Gerhard Widmer

Paper: pdf

Abstract: The contribution of this paper is threefold:
First, we propose modifications to Fluctuation Patterns [14]. The resulting descriptors are evaluated in the task of rhythm similarity computation on the “Ballroom Dancers” collection. Second, we show that by combining these rhythmic descriptors with a timbral component, results for rhythm similarity computation are improved beyond the level obtained when using the rhythm descriptor component alone. Third, we present one “unified” algorithm with fixed parameter set. This algorithm is evaluated on three different music collections. We conclude from these evaluations that the computed similarities reflect relevant aspects both of rhythm similarity and of general music similarity. The performance can be improved by tuning parameters of the “unified” algorithm to the specific task (rhythm similarity / general music similarity) and the specific collection, respectively.

Notes:

  • B&O recommender used OFAI
  • Nice results

ismir2009-proceedings.pdf (page 537 of 775)

GROUPING RECORDED MUSIC BY STRUCTURAL SIMILARITY

Juan Pablo Bello

Paper: PDF

Abstract: This paper introduces a method for the organization of recorded music according to structural similarity. It uses the Normalized Compression Distance (NCD) to measure the pairwise similarity between songs, represented using beat-synchronous self-similarity matrices. The approach is evaluated on its ability to cluster a collection into groups of performances of the same musical work. Tests are aimed at finding the combination of system parameters that improve clustering, and at highlighting the benefits and shortcomings of the proposed method. Results show that structural similarities can be well characterized by this approach, given consistency in beat tracking and overall song structure.

Notes:

  • Normalized Compression Distance (NCD): a universal distance metric.
  • Experimental setup – all classical music
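NCD is easy to try out. The sketch below uses the standard formula NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. It assumes zlib as the compressor and random byte strings as stand-ins for serialized self-similarity matrices; both choices are for illustration only and are not the paper's setup.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data standing in for two performances of one work and an unrelated piece.
rng = random.Random(0)
performance_1 = bytes(rng.getrandbits(8) for _ in range(2000))
altered = bytearray(performance_1)
for i in range(0, len(altered), 200):
    altered[i] ^= 0xFF  # flip a few bytes to mimic small differences
performance_2 = bytes(altered)
unrelated = bytes(rng.getrandbits(8) for _ in range(2000))

print(ncd(performance_1, performance_2))  # small: the compressor reuses shared structure
print(ncd(performance_1, unrelated))      # close to 1: almost nothing is shared
```

The distance is only as good as the compressor, so in practice the choice of compressor and input representation matters a great deal.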

ismir2009-proceedings.pdf (page 544 of 775)

A FILTER-AND-REFINE INDEXING METHOD FOR FAST SIMILARITY SEARCH IN MILLIONS OF MUSIC TRACKS

Dominik Schnitzer, Arthur Flexer, Gerhard Widmer

Paper: PDF

ABSTRACT We present a filter-and-refine method to speed up acoustic audio similarity queries which use the Kullback-Leibler divergence as similarity measure. The proposed method rescales the divergence and uses a modified FastMap [1] implementation to accelerate nearest-neighbor queries. The search for similar music pieces is accelerated by a factor of 10-30 compared to a linear scan but still offers high recall values (relative to a linear scan) of 95-99%. We show how the proposed method can be used to query several million songs for their acoustic neighbors very fast while producing almost the same results that a linear scan over the whole database would return. We present a working prototype implementation which is able to process similarity queries on a 2.5 million songs collection in about half a second on a standard CPU.

Notes: Gaussian similarity features can be expensive.
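The general filter-and-refine pattern can be sketched in a few lines. This toy version assumes each track is modeled as a diagonal-covariance Gaussian (a list of means and a list of variances) and uses squared distance between the means as the cheap filter. That filter is a hypothetical stand-in: the paper uses a rescaled divergence with a modified FastMap embedding, which is not reproduced here.

```python
import math


def kl_diag(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(
        math.log(b / a) + (a + (m - n) ** 2) / b - 1.0
        for m, a, n, b in zip(mp, vp, mq, vq)
    )


def symmetric_kl(p, q):
    """Symmetrized KL divergence, the (expensive) exact measure."""
    return kl_diag(p, q) + kl_diag(q, p)


def cheap_distance(p, q):
    """Hypothetical cheap proxy: squared distance between the means only."""
    return sum((m - n) ** 2 for m, n in zip(p[0], q[0]))


def filter_and_refine(query, tracks, k=2, n_candidates=3):
    # Filter: keep the n_candidates closest tracks under the cheap proxy.
    candidates = sorted(
        tracks, key=lambda name: cheap_distance(query, tracks[name])
    )[:n_candidates]
    # Refine: rerank only those candidates with the exact divergence.
    return sorted(
        candidates, key=lambda name: symmetric_kl(query, tracks[name])
    )[:k]


# Invented toy collection.
query = ([0.0, 0.0], [1.0, 1.0])
tracks = {
    "a": ([0.1, 0.0], [1.0, 1.0]),
    "b": ([0.2, 0.1], [4.0, 4.0]),
    "c": ([1.0, 1.0], [1.0, 1.0]),
    "d": ([5.0, 5.0], [1.0, 1.0]),
    "e": ([9.0, 9.0], [1.0, 1.0]),
}
print(filter_and_refine(query, tracks))  # ['a', 'c']
```

In this toy data the proxy ranks "b" ahead of "c", but the refine step reverses them because "b" has a very different variance. The exact divergence is computed for three tracks instead of five, which is where the speed-up comes from at scale.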

ismir2009-proceedings.pdf (page 549 of 775)

Leave a comment

ISMIR – MIREX Panel Discussion

Stephen Downie presents the MIREX session

Statistics for 2009:

  • 26 tasks
  • 138 participants
  • 289 evaluation runs

Results are now published: http://music-ir.org/r/09results

This year, new datasets:

  • Mazurkas
  • MIR 1K
  • Bach Chorales
  • Chord and Segmentation datasets
  • Mood dataset
  • Tag-a-Tune

Evalutron 6K – Human evaluations – this year, 50 graders / 7500 possible grading events.

What’s Next?

Issues about MIREX

  • Rein in the parameter explosion
  • Not rigorously tested algorithms
  • Hard-coded parameters, path-separators, etc
  • Poorly specified data inputs/outputs
  • Dynamically linked libraries
  • Windows submissions
  • Pre-compiled Matlab/MEX Submissions
  • The ‘graduation’ problem – Andreas and Cameron will be gone in summer.

Long discussion with people opining about tests, data.  Ben Fields had a particularly good point about trying to make MIREX  better reflect real systems that draw upon web resources.

 


ISMIR Oral Session 5 – Tags

Oral Session 5 – Tags

Session Chair: Paul Lamere

I’m the session chair for this session, so I can’t keep notes. So instead I offer the abstracts.

TAG INTEGRATED MULTI-LABEL MUSIC STYLE CLASSIFICATION WITH HYPERGRAPH

Fei Wang, Xin Wang, Bo Shao, Tao Li, Mitsunori Ogihara

Abstract: Automatic music style classification is an important, but challenging problem in music information retrieval. It has a number of applications, such as indexing of and searching in musical databases. Traditional music style classification approaches usually assume that each piece of music has a unique style and they make use of the music contents to construct a classifier for classifying each piece into its unique style. However, in reality, a piece may match more than one, even several different styles. Also, in this modern Web 2.0 era, it is easy to get a hold of additional, indirect information (e.g., music tags) about music. This paper proposes a multi-label music style classification approach, called Hypergraph integrated Support Vector Machine (HiSVM), which can integrate both music contents and music tags for automatic music style classification. Experimental results based on a real world data set are presented to demonstrate the effectiveness of the method.

ismir2009-proceedings.pdf (page 372 of 775)

EASY AS CBA: A SIMPLE PROBABILISTIC MODEL FOR TAGGING MUSIC

Matthew D. Hoffman, David M. Blei, Perry R. Cook

ABSTRACT Many songs in large music databases are not labeled with semantic tags that could help users sort out the songs they want to listen to from those they do not. If the words that apply to a song can be predicted from audio, then those predictions can be used both to automatically annotate a song with tags, allowing users to get a sense of what qualities characterize a song at a glance. Automatic tag prediction can also drive retrieval by allowing users to search for the songs most strongly characterized by a particular word. We present a probabilistic model that learns to predict the probability that a word applies to a song from audio. Our model is simple to implement, fast to train, predicts tags for new songs quickly, and achieves state-of-the-art performance on annotation and retrieval tasks.

ismir2009-proceedings.pdf (page 381 of 775)

USING ARTIST SIMILARITY TO PROPAGATE SEMANTIC INFORMATION

Joon Hee Kim, Brian Tomasik, Douglas Turnbull

ABSTRACT Tags are useful text-based labels that encode semantic information about music (instrumentation, genres, emotions, geographic origins). While there are a number of ways to collect and generate tags, there is generally a data sparsity problem in which very few songs and artists have been accurately annotated with a sufficiently large set of relevant tags. We explore the idea of tag propagation to help alleviate the data sparsity problem. Tag propagation, originally proposed by Sordo et al., involves annotating a novel artist with tags that have been frequently associated with other similar artists. In this paper, we explore four approaches for computing artists similarity based on different sources of music information (user preference data, social tags, web documents, and audio content). We compare these approaches in terms of their ability to accurately propagate three different types of tags (genres, acoustic descriptors, social tags). We find that the approach based on collaborative filtering performs best. This is somewhat surprising considering that it is the only approach that is not explicitly based on notions of semantic similarity. We also find that tag propagation based on content-based music analysis results in relatively poor performance.
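Tag propagation as described in the abstract can be sketched very simply: count how often each tag appears among the similar artists and keep the most frequent ones. The artist names, tags and the `propagate_tags` helper below are invented for illustration, and how the similar artists are found (the actual subject of the paper) is assumed to be given.

```python
from collections import Counter


def propagate_tags(similar_artists, artist_tags, top_n=2):
    """Tag a novel artist with the tags most often attached to similar artists."""
    counts = Counter()
    for artist in similar_artists:
        # Count each tag at most once per artist.
        counts.update(set(artist_tags.get(artist, ())))
    # Sort by frequency, then alphabetically so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Invented example data.
artist_tags = {
    "Artist A": ["rock", "guitar", "90s"],
    "Artist B": ["rock", "indie"],
    "Artist C": ["rock", "guitar", "live"],
}
print(propagate_tags(["Artist A", "Artist B", "Artist C"], artist_tags))
# ['rock', 'guitar']
```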

ismir2009-proceedings.pdf (page 387 of 775)

MUSIC MOOD REPRESENTATIONS FROM SOCIAL TAGS

Cyril Laurier, Mohamed Sordo, Joan Serra, Perfecto Herrera

ABSTRACT This paper presents findings about mood representations. We aim to analyze how do people tag music by mood, to create representations based on this data and to study the agreement between experts and a large community. For this purpose, we create a semantic mood space from last.fm tags using Latent Semantic Analysis. With an unsupervised clustering approach, we derive from this space an ideal categorical representation. We compare our community based semantic space with expert representations from Hevner and the clusters from the MIREX Audio Mood Classification task. Using dimensional reduction with a Self-Organizing Map, we obtain a 2D representation that we compare with the dimensional model from Russell. We present as well a tree diagram of the mood tags obtained with a hierarchical clustering approach. All these results show a consistency between the community and the experts as well as some limitations of current expert models. This study demonstrates a particular relevancy of the basic emotions model with four mood clusters that can be summarized as: happy, sad, angry and tender. This outcome can help to create better ground truth and to provide more realistic mood classification algorithms. Furthermore, this method can be applied to other types of representations to build better computational models.

EVALUATION OF ALGORITHMS USING GAMES: THE CASE OF MUSIC TAGGING

Edith Law, Kris West, Michael Mandel, Mert Bay, J. Stephen Downie

Abstract Search by keyword is an extremely popular method for retrieving music. To support this, novel algorithms that automatically tag music are being developed. The conventional way to evaluate audio tagging algorithms is to compute measures of agreement between the output and the ground truth set. In this work, we introduce a new method for evaluating audio tagging algorithms on a large scale by collecting set-level judgments from players of a human computation game called TagATune. We present the design and preliminary results of an experiment comparing five algorithms using this new evaluation metric, and contrast the results with those obtained by applying several conventional agreement-based evaluation metrics.

ismir2009-proceedings.pdf (page 400 of 775)
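For reference, the conventional agreement-based evaluation that the abstract contrasts with the game can be as simple as precision, recall and F1 between predicted tags and ground truth. The sketch below assumes those three metrics and uses invented tags; the paper does not say here which agreement metrics it applies.

```python
def tag_agreement(predicted, truth):
    """Precision, recall and F1 between a predicted tag set and ground truth."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Invented example: two of three predicted tags are in the ground truth.
precision, recall, f1 = tag_agreement(
    ["rock", "guitar", "female vocals"],
    ["rock", "guitar", "loud", "fast"],
)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.5 0.571
```

Metrics like these penalize a tag that is reasonable but missing from the ground truth, which is part of the motivation for asking human players instead.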
