Archive for category ismir

f(MIR) industrial panel

  • Douglas Eck (Google)
  • Greg Mead (Musicmetric)
  • Martin Roth (RjDj)
  • Ricardo Tarrasch (MeeMix)
  • moderator: Rebecca Fiebrink (Princeton)

  • RjDj – music-making apps on devices like iPhones
  • Musicmetric – tracks three areas: social networks (network analysis to find influential fans), text via focused crawlers, and P2P networks
  • MeeMix – music recommendation, artist radio, artist similarity, playlists.  Pandora-like human analysis on 150K songs – then they learn these tags with machine learning.  They look at which features best predict the tags.  The important question is ‘what is important for the listeners’.  Their aim is to find the best parameters for taste prediction.
  • Google – goal is to organize the world’s information.  Doug would like to see an open API for companies to collaborate

Rebecca is the moderator.

What do you think is the next big thing? How is tech going to change things in the near future?

  • Doug (Google) thinks that ‘music recommendation is solved’ – he’s excited about the cellphone.  Also excited about programs like ChucK that make it easier for people to create music (nice pandering to the moderator, Doug!)
  • Ricardo (MeeMix) – the laid-back position is the future – reach the specific taste of a user.  Personalized advertisements.
  • Greg (Musicmetric) – Cloud-based services will help us understand what people want, which will lead to playlisting, recommendation, novel players.
  • Martin (RjDj) – Thinks that the phone is really exciting – having all this power in the phone lets you do neat things.  He’s excited about how people will be able to create music – using sensory inputs, ambient audio.

How will tech revolutionize music?

  • Doug – being able to collaborate with Arcade Fire online
  • Martin – the musically illiterate should be able to make music
  • Ricardo – we can help new artists reach the right fans
  • Greg – services for helping artists, merchandising, ticket sales etc.

What are the most interesting problems or technical questions?

  • Greg – interested in understanding the behavior of the fans, especially those on P2P networks. Huge amount of geographic-specific listener data
  • Ricardo – more research around taste and recommendation
  • Doug – a rant – he had a paper rejected because the paper had something to do with music generation.
  • Rebecca – has a ‘MIR for music’ Google group: MIR4Music
  • Martin – engineering: increase performance on portable devices – research: how to extract music features from music cheaply
  • Ricardo – drumming style is hard to extract – but actually not that important for taste prediction

How would you characterize the relationship between biz and academia?

  • Greg – there is lots of ‘advanced research’ in academia, while in industry they look at much more applied problems
  • Doug – suggests that the leader of an academic lab is key to bridging the gap between biz and academia.  Grad students should be active in looking for internships in industry to get a better understanding of what is needed in industry.  It is all about getting grad students jobs in industry.

Audience Q/A

  • What tools can we create to help producers of music? – Answer: YouTube. Martin talks about understanding how people use music creation tools.  Doug: “Don’t build things that people don’t want.” – to do this you need to try things on real data.

Hmmm … only one audience q/a.  sigh …

Good panel, lots of interesting ideas.  Here is the future of music:


MIR at Google: Strategies for Scaling to Large Music Datasets Using Ranking and Auditory Sparse-Code Representations

Douglas Eck (Google) (Invited speaker) – There’s no paper associated with this talk.

Machine Listening / Audio analysis – Dick Lyon and Samy Bengio

Main strength:

  • Scalable algorithms
    • When they do work, they use large sets (like all audio on YouTube, or all audio on the web)
  • Sparse high-dimensional representations
    • 15 numbers to describe a track
  • Auditory / Cochlear Modeling
  • Autotagging at YouTube
  • Retrieval, annotation, ranking, recommendation

Collaboration Opportunities

  • Faculty research awards
  • Google visiting faculty program
  • Student internships
  • Google summer of code
  • Research Infrastructure

The Future of MIR is already here

  • The next generation of listeners is using YouTube – because of its on-demand nature
  • YouTube – 2 billion views a day
  • Content ID scans over 100 years of video every day

The bar is already set very high

  • Current online recommendation is pretty good
  • Doug wants to close the loop between music making and music listening

What would you like Google to give back to MIR?


A Roadmap Towards Versatile MIR

Emmanuel Vincent, Stanislaw A. Raczyński, Nobutaka Ono and Shigeki Sagayama

ABSTRACT – Most MIR systems are specifically designed for one application and one cultural context and suffer from the semantic gap between the data and the application. Advances in the theory of Bayesian language and information processing enable the vision of a versatile, meaningful and accurate MIR system integrating all levels of information. We propose a roadmap to collectively achieve this vision.

Wants to increase the versatility of MIR systems across different types of music.  Current systems adopt a fixed expert viewpoint (musicologist, musician) and have limited accuracy due to general pattern recognition techniques applied to a bag of features.

Emmanuel wants to build an overarching scalable MIR system that successfully deals with the challenge of scalable unsupervised methods and refocuses MIR on symbolic methods.  This is the core roadmap of VERSAMUS.

The aim of VERSAMUS is to investigate, design and validate such representations in the framework of Bayesian data analysis, which provides a rigorous way of combining separate feature models in a modular fashion. Tasks to be addressed include the design of a versatile model structure, of a library of feature models and of efficient algorithms for parameter inference and model selection. Efforts will also be dedicated towards the development of a shared modular software platform and a shared corpus of multi-feature annotated music which will be reusable by both partners in the future and eventually disseminated.


Predicting Development of Research in Music Based on Parallels with Natural Language Processing

It is the f(MIR) workshop – The Future of MIR – What will MIR be like in 5 or 20 years?

This is the f(MIR) session.  Always a highlight at ISMIR


Jacek Wołkowicz and Vlado Kešelj

ABSTRACT – The hypothesis of the paper is that the domain of Natural Languages Processing (NLP) resembles current research in music so one could benefit from this by employing NLP techniques to music. In this paper the similarity between both domains is described. The levels of NLP are listed with pointers to respective tasks within the research of computational music. A brief introduction to history of NLP enables locating music research in this history. Possible directions of research in music, assuming its affinity to NLP, are introduced. Current research in generational and statistical music modeling is compared to similar NLP theories. The paper is concluded with guidelines for music research and information retrieval.

Notes: The speaker points out the similarities and differences between NLP and MIR.

Some differences:

  • Most people are musically illiterate (i.e. can’t read or write music)
  • Much more complex representation
  • Limited space of all possible pieces (not sure I agree, the argument is that anyone can generate text/speech, but not so much for music)

History of NLP

  • Grammars, Chomsky, Turing Test
  • Period of optimism: automatic translation – but failed
  • Data mining and statistical methods. Large corpora: Brown Corpus, WordNet
  • Semantics defined by statistics

Algorithms vs. Data:  Algorithms don’t matter much, it is all about the data. More data is better.

Comparing Music Objects: similar to the Text Translation problem

What needs to be done:

  • Web crawling companies need to give MIR more data
  • Convince publishers to annotate data
  • Collect parallel data (MIDI / audio)


Accurate Real-time Windowed Time Warping

Robert Macrae and Simon Dixon

ABSTRACT – Dynamic Time Warping (DTW) is used to find alignments between two related streams of information and can be used to link data, recognise patterns or find similarities. Typically, DTW requires the complete series of both input streams in advance and has quadratic time and space requirements. As such DTW is unsuitable for real-time applications and is inefficient for aligning long sequences. We present Windowed Time Warping (WTW), a variation on DTW that, by dividing the path into a series of DTW windows and making use of path cost estimation, achieves alignments with an accuracy and efficiency superior to other leading modifications and with the capability of synchronising in real-time. We demonstrate this method in a score following application. Evaluation of the WTW score following system found 97.0% of audio note onsets were correctly aligned within 2000 ms of the known time. Results also show reductions in execution times over state-of-the-art efficient DTW modifications.

Idea: windowed frames of features (sub-DTW windows).  Each window's path can be calculated sequentially, so less history needs to be retained, which is important for performance.

Works in linear time like previous systems, but with the smaller history it can work entirely in memory, so it avoids the problem of needing to store the history on disk. Nice demo of real-time time warping.
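To make the windowing idea concrete, here is a rough sketch in Python. It is not the authors' WTW algorithm (their path cost estimation for steering windows is not reproduced); it only shows plain DTW on small windows chained together, so that only a window-sized cost matrix is in memory at any time. The names `dtw_path` and `windowed_alignment`, and the `window` and `hop` parameters, are my own assumptions for illustration.

```python
from math import inf


def dtw_path(a, b):
    """Classic DTW on two numeric sequences. Returns (total_cost, path)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cost of aligning a[:i+1] with b[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best
    # Backtrack from the end to the start along cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def windowed_alignment(a, b, window=4, hop=2):
    """Chain small DTW windows (simplified; no path cost estimation)."""
    path = [(0, 0)]
    i, j = 0, 0
    while i < len(a) - 1 or j < len(b) - 1:
        # Only a window-sized slice of each stream is needed at a time.
        _, sub_path = dtw_path(a[i:i + window], b[j:j + window])
        # Commit the first `hop` steps, then start the next window there.
        for di, dj in sub_path[1:hop + 1]:
            path.append((i + di, j + dj))
        i, j = path[-1]
    return path
```

Because each window forces its sub-path to end in the window's far corner, this naive chaining can drift when the two streams diverge a lot in tempo; committing only the first `hop` steps of each window softens that, but it is a simplification of what the paper describes.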


A Multi-pass Algorithm for Accurate Audio-to-Score Alignment

Bernhard Niedermayer and Gerhard Widmer

ABSTRACT – Most current audio-to-score alignment algorithms work on the level of score time frames; i.e., they cannot differentiate between several notes occurring at the same discrete time within the score. This level of accuracy is sufficient for a variety of applications. However, for those that deal with, for example, musical expression analysis such micro timings might also be of interest. Therefore, we propose a method that estimates the onset times of individual notes in a post-processing step. Based on the initial alignment and a feature obtained by matrix factorization, those notes for which the confidence in the alignment is high are chosen as anchor notes. The remaining notes in between are revised, taking into account the additional information about these anchors and the temporal relations given by the score. We show that this method clearly outperforms a reference method that uses the same features but does not differentiate between anchor and non-anchor notes.

The main contribution is the introduction of an expectation strength function modeling the expected onset time of a note between two anchors. Although results are encouraging, there are specific circumstances where the algorithm fails, e.g., when the temporal displacement of notes is large.
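Here is one way the anchor idea could look in code. This is a guess at the general shape, not the paper's actual function: I assume the expected onset is a linear interpolation between the two anchors and that the expectation strength is a Gaussian bump around it. The function names, the `(time, salience)` candidate format and the `sigma` value are all hypothetical.

```python
import math


def expected_onset(score_time, left_anchor, right_anchor):
    """Interpolate a note's audio time between two anchors.

    Each anchor is a (score_time, audio_time) pair.
    """
    (s0, t0), (s1, t1) = left_anchor, right_anchor
    if s1 == s0:
        return t0
    frac = (score_time - s0) / (s1 - s0)
    return t0 + frac * (t1 - t0)


def expectation_strength(candidate_time, expected_time, sigma=0.05):
    """Assumed Gaussian weight: 1.0 at the expected time, decaying with distance."""
    z = (candidate_time - expected_time) / sigma
    return math.exp(-0.5 * z * z)


def refine_onset(candidates, score_time, left_anchor, right_anchor, sigma=0.05):
    """Pick the candidate (time, salience) with the best weighted salience."""
    expected = expected_onset(score_time, left_anchor, right_anchor)
    best = max(
        candidates,
        key=lambda c: c[1] * expectation_strength(c[0], expected, sigma),
    )
    return best[0]
```

With anchors at score times 1.0 and 2.0 mapped to audio times 2.0 s and 4.0 s, a note at score time 1.5 is expected at 3.0 s, so a weak candidate at 3.02 s beats a strong one at 2.5 s. The same weighting also shows why large displacements hurt: a correctly detected onset far from the expected time gets a weight close to zero.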


Understanding Features and Distance Functions for Music Sequence Alignment

Ozgur Izmirli and Roger Dannenberg

ABSTRACT – We investigate the problem of matching symbolic representations directly to audio-based representations for applications that use data from both domains. One such application is score alignment, which aligns a sequence of frames based on features such as chroma vectors and distance functions such as Euclidean distance. Good representations are critical, yet current systems use ad hoc constructions such as the chromagram that have been shown to work quite well. We investigate ways to learn chromagram-like representations that optimize the classification of “matching” vs. “non-matching” frame pairs of audio and MIDI. New representations learned automatically from examples not only perform better than the chromagram representation but they also reveal interesting projection structures that differ distinctly from the traditional chromagram.

Roger and Ozgur present a method for learning features for score alignment.  They replace the traditional chromagram feature with a feature that is a learned projection of the audio spectrum.  Results show that the new features work better than chroma.
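For reference, the traditional chroma baseline can be written as a fixed projection matrix followed by a Euclidean distance, which makes it easy to see what a learned projection would replace. The sketch below is only the baseline, not the authors' learned features, and it assumes each frame is already an energy vector indexed by MIDI pitch (real audio would need a spectral front end first). All function names are hypothetical.

```python
import math


def chroma_projection(num_pitches=128):
    """Fixed 12 x num_pitches matrix folding MIDI pitches into pitch classes."""
    return [
        [1.0 if pitch % 12 == pc else 0.0 for pitch in range(num_pitches)]
        for pc in range(12)
    ]


def project(matrix, frame):
    """Apply a projection matrix to a per-pitch energy vector."""
    return [sum(w * x for w, x in zip(row, frame)) for row in matrix]


def normalize(vec):
    """Scale to unit length so that loudness does not affect the distance."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def frame_from_pitches(pitches, num_pitches=128):
    """Toy frame: unit energy at each sounding MIDI pitch."""
    frame = [0.0] * num_pitches
    for p in pitches:
        frame[p] = 1.0
    return frame


def chroma_distance(frame_a, frame_b, matrix):
    a = normalize(project(matrix, frame_a))
    b = normalize(project(matrix, frame_b))
    return euclidean(a, b)
```

A C major chord at pitches 60, 64, 67 and the same chord an octave lower fold to the same chroma vector, so their distance is 0, while a D major chord lands on different pitch classes. In the learned version, the 0/1 entries of the matrix would instead be weights fitted on matching vs. non-matching audio/MIDI frame pairs.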
