ABSTRACT – The automated extraction of tempo and beat information from music recordings is a challenging task. Especially in the case of expressive performances, current beat tracking approaches still have significant problems accurately capturing local tempo deviations and beat positions. In this paper, we introduce a novel evaluation framework for detecting critical passages in a piece of music that are prone to tracking errors. Our idea is to look for consistencies in the beat tracking results over multiple performances of the same underlying piece. As another contribution, we further classify the critical passages by specifying musical properties of certain beats that frequently evoke tracking errors. Finally, considering three conceptually different beat tracking procedures, we conduct a case study on the basis of a challenging test set that consists of a variety of piano performances of Chopin Mazurkas. Our experimental results not only make the limitations of state-of-the-art beat trackers explicit but also deepen the understanding of the underlying music material.
The Audio processing Library for Flash affords music-IR researchers the opportunity to generate rich, interactive, real-time music-IR driven applications. The various levels of complexity and control as well as the capability to execute analysis and synthesis simultaneously provide a means to generate unique programs that integrate content-based retrieval of audio features. We have demonstrated the versatility and usefulness of ALF through the variety of applications described in this paper. As interest in music driven applications intensifies, it is our goal to enable the community of developers and researchers in music-IR and related fields to generate interactive web-based media.
ABSTRACT – Music21 is an object-oriented toolkit for analyzing, searching, and transforming music in symbolic (score-based) forms. The modular approach of the project allows musicians and researchers to write simple scripts rapidly and reuse them in other projects. The toolkit aims to provide powerful software tools integrated with sophisticated musical knowledge to both musicians with little programming experience (especially musicologists) and to programmers with only modest music theory skills.
Music21 looks to be a pretty neat toolkit for analyzing and manipulating symbolic music. It’s like Echo Nest Remix for MIDI. The blog has lots more info: music21 blog. You can get the toolkit here: music21
ABSTRACT - Humans tend to organize perceived information into hierarchies and structures, a principle that also applies to music. Even musically untrained listeners unconsciously analyze and segment music with regard to various musical aspects, for example, identifying recurrent themes or detecting temporal boundaries between contrasting musical parts. This paper gives an overview of state-of-the-art methods for computational music structure analysis, where the general goal is to divide an audio recording into temporal segments corresponding to musical parts and to group these segments into musically meaningful categories. There are many different criteria for segmenting and structuring music audio. In particular, one can identify three conceptually different approaches, which we refer to as repetition-based, novelty-based, and homogeneity-based approaches. Furthermore, one has to account for different musical dimensions such as melody, harmony, rhythm, and timbre. In our state-of-the-art report, we address these different issues in the context of music structure analysis, while discussing and categorizing the most relevant and recent articles in this field.
This presentation is an overview of the music structure analysis problem, and the methods proposed for solving it. The methods have been divided into three categories: novelty-based approaches, homogeneity-based approaches, and repetition-based approaches. The comparison of different methods has been problematic because of the differing goals, but current evaluations suggest that none of the approaches is clearly superior at this time, and that there is still room for considerable improvements.
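To make the novelty-based category a bit more concrete, here is a minimal sketch of one common technique: sliding a checkerboard kernel along the diagonal of a self-similarity matrix and treating the peak as a segment boundary. This is my own illustration, not code from the talk or the paper; the function names, the toy two-dimensional "features" and the `half_width` parameter are all assumptions.

```python
def similarity(a, b):
    """Map squared Euclidean distance to a similarity in (0, 1]."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d2)


def novelty_curve(features, half_width=2):
    """Slide a checkerboard kernel along the diagonal of the self-similarity matrix."""
    n = len(features)
    ssm = [[similarity(features[i], features[j]) for j in range(n)] for i in range(n)]
    curve = [0.0] * n
    for t in range(half_width, n - half_width):
        score = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                # +1 when both frames lie on the same side of the candidate
                # boundary t, -1 when they lie on opposite sides
                sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                score += sign * ssm[t + di][t + dj]
        curve[t] = score
    return curve


def pick_boundary(curve):
    """Return the frame index with the highest novelty score."""
    return max(range(len(curve)), key=lambda t: curve[t])


# Toy input (hypothetical): six frames of one "section" followed by six of another
frames = [[0.0, 1.0]] * 6 + [[1.0, 0.0]] * 6
curve = novelty_curve(frames)
boundary = pick_boundary(curve)
print(boundary)
```

On real audio the frames would be something like chroma or timbre vectors, and you would pick several peaks rather than a single maximum. Homogeneity-based and repetition-based methods look at the same matrix but search for blocks and off-diagonal stripes instead.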
Notes from the ISMIR business meeting – this is a meeting with the board of ISMIR.
Officers
President: J. Stephen Downie, University of Illinois at Urbana-Champaign, USA
Treasurer: George Tzanetakis, University of Victoria, Canada
Secretary: Jin Ha Lee, University of Illinois at Urbana-Champaign, USA
President-elect: Tim Crawford, Goldsmiths College, University of London, UK
Member-at-large: Doug Eck, University of Montreal, Canada
Member-at-large: Masataka Goto, National Institute of Advanced Industrial Science and Technology, Japan
Member-at-large: Meinard Mueller, Max-Planck-Institut für Informatik, Germany
Stephen reviewed the roles of the various officers and duties of the various committees. He reminded us that one does not need to be on the board to serve on a subcommittee.
Publication Issues
website redesign
Other communities hardly know about ISMIR, and we want to help them become aware of our research. One way is to build more links to them, for example by joining committees in other communities.
Hosting Issue – will formalize documentation, location planning, site selection.
Name change? There was a nifty debate around the meaning of ISMIR. There was a proposal to change it to ‘International Society for Music Informatics Research’. I recommend, given Doug’s comments about Youtube from this morning, that we change the name to: ‘International Society for Movie Informatics Research’
Review Process: Good discussion about the review process – we want paper bidding and double-blind reviews. Helps avoid gender bias.
Doug snuck in the secret word ‘youtube’ too, just for those hanging out on IRC.
musicmetric tracks 3 areas: Social networks, network analysis (influential fans), text via focused crawlers, p2p networks
MeeMix – music recommendation, artist radio, artist similarity, playlists. Pandora-like human analysis on 150K songs – then they learn these tags with machine learning. Look at which features best predict the tags. Important question is ‘what is important for the listeners’. Their aim is to find the best parameters for taste prediction.
Google – goal is to organize the world’s information. Doug would like to see an open API for companies to collaborate
Rebecca is the moderator.
What do you think is the next big thing? How is tech going to change things in the near future?
Doug (Google) thinks that ‘music recommendation is solved’ – he’s excited about the cellphone. Also excited about programs like ChucK that make it easier for people to create music (nice pandering to the moderator, Doug!)
Ricardo (MeeMix) – the laid back position is the future – reach the specific taste of a user. Personalized advertisements.
Greg (MusicMetric) – Cloud-based services will help us understand what people want, which will lead to playlisting, recommendation, novel players.
Martin (RjDJ) – Thinks that the phone is really exciting – having all this power in the phone lets you do neat things. He’s excited about how people will be able to create music – using sensory inputs, ambient audio.
How will tech revolutionize music?
Doug – being able to collaborate with Arcade Fire online
Martin – musically illiterate should be able to make music
Ricardo – we can help new artists reach the right fans
Greg – services for helping artists, merchandising, ticket sales etc.
What are the most interesting problems or technical questions?
Greg – interested in understanding the behavior of the fans, especially those on P2P networks. Huge amount of geographic-specific listener data
Ricardo – more research around taste and recommendation
Doug – a rant – he had a paper rejected because the paper had something to do with music generation.
Rebecca – has a MIR for music Google group: MIR4Music
Martin – engineering: increase performance in portable devices – research: how to extract music features from music cheaply
Ricardo – drumming style is hard to extract – but actually not that important for taste prediction
How would you characterize the relationship between biz and academia?
Greg – there is lots of ‘advanced research’ in academia, while in industry they look at much more applied problems
Doug – suggests that the leader of an academic lab is key to bridging the gap between biz and academia. Grad students should be active in looking for internships in industry to get a better understanding of what is needed in industry. It is all about getting grad students jobs in industry.
Audience Q/A
What tools can we create to help producers of music? – Answer: Youtube. Martin talks about understanding how people use music creation tools. Doug: “Don’t build things that people don’t want.” – to do this you need to try things on real data.
Hmmm … only one audience q/a. sigh …
Good panel, lots of interesting ideas. Here is the future of music:
MIR at Google: Strategies for Scaling to Large Music Datasets Using Ranking and Auditory Sparse-Code Representations Douglas Eck (Google) (Invited speaker) - There’s no paper associated with this talk.
Machine Listening / Audio analysis – Dick Lyon and Samy Bengio
Main strength:
Scalable algorithms
When they do work, they use large sets (like all audio on Youtube, or all audio on the web)
ABSTRACT – Most MIR systems are specifically designed for one application and one cultural context and suffer from the semantic gap between the data and the application. Advances in the theory of Bayesian language and information processing enable the vision of a versatile, meaningful and accurate MIR system integrating all levels of information. We propose a roadmap to collectively achieve this vision.
Wants to increase versatility of MIR systems across different types of music. Systems adopt a fixed expert viewpoint (musicologist, musician). Have limited accuracy due to general pattern recognition techniques applied to a bag of features.
Emmanuel wants to build an overarching scalable MIR system that successfully deals with the challenge of scalable unsupervised methods and refocuses MIR on symbolic methods. This is the core roadmap of VERSAMUS.
The aim of VERSAMUS is to investigate, design and validate such representations in the framework of Bayesian data analysis, which provides a rigorous way of combining separate feature models in a modular fashion. Tasks to be addressed include the design of a versatile model structure, of a library of feature models and of efficient algorithms for parameter inference and model selection. Efforts will also be dedicated towards the development of a shared modular software platform and a shared corpus of multi-feature annotated music which will be reusable by both partners in the future and eventually disseminated.
ABSTRACT - The hypothesis of the paper is that the domain of Natural Languages Processing (NLP) resembles current research in music so one could benefit from this by employing NLP techniques to music. In this paper the similarity between both domains is described. The levels of NLP are listed with pointers to respective tasks within the research of computational music. A brief introduction to history of NLP enables locating music research in this history. Possible directions of research in music, assuming its affinity to NLP, are introduced. Current research in generational and statistical music modeling is compared to similar NLP theories. The paper is concluded with guidelines for music research and information retrieval.
Notes: The speaker points out the similarities and differences between NLP and MIR.
Some differences:
Most people are illiterate (i.e. can’t read/write music)
Much more complex representation
Limited space of all possible pieces (not sure I agree, the argument is that anyone can generate text/speech, but not so much for music)
History of NLP
Grammars, Chomsky, Turing Test
Period of optimism: automatic translation – but failed
Data mining and statistical methods. Large corpora, Brown, WordNet
Semantics defined by statistics
Algorithms vs. Data: Algorithms don’t matter much, it is all about the data. More data is better.
Comparing Music Objects: similar to the Text Translation problem
ABSTRACT - Dynamic Time Warping (DTW) is used to find alignments between two related streams of information and can be used to link data, recognise patterns or find similarities. Typically, DTW requires the complete series of both input streams in advance and has quadratic time and space requirements. As such DTW is unsuitable for real-time applications and is inefficient for aligning long sequences. We present Windowed Time Warping (WTW), a variation on DTW that, by dividing the path into a series of DTW windows and making use of path cost estimation, achieves alignments with an accuracy and efficiency superior to other leading modifications and with the capability of synchronising in real-time. We demonstrate this method in a score following application. Evaluation of the WTW score following system found 97.0% of audio note onsets were correctly aligned within 2000 ms of the known time. Results also show reductions in execution times over state-of-the-art efficient DTW modifications.
Idea: Frame window features – (sub dtw frames). Each path can be calculated sequentially, so less history needs to be retained which is important for performance.
Works in linear time like previous systems, but with the smaller history it can work entirely in memory, so it avoids the problem of needing to store the history on disk. Nice demo of real-time time warping.
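For reference, here is a minimal sketch of the plain, textbook DTW that WTW is a variation on. It is not the authors' windowed method: it needs both complete sequences up front and fills the full quadratic cost table, which is exactly the limitation the abstract describes. The function name and the toy pitch sequences are my own assumptions.

```python
def dtw(a, b):
    """Return (total cost, alignment path) for two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j];
    # this (n+1) x (m+1) table is the quadratic time and space requirement
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the alignment path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance in both sequences
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy example (hypothetical): a "score" and a slower "performance" of it
score = [1, 2, 3, 4]
performance = [1, 1, 2, 3, 3, 4]
total, path = dtw(score, performance)
print(total, path)
```

As I understood the talk, the windowed idea is to run something like this only over a small window at a time and chain the windows along the path, so that only a short history has to be kept.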