Paul


I'm the Director of Developer Community at The Echo Nest, a research-focused music intelligence startup that provides music information services to developers and partners through a data mining and machine listening platform. I am especially interested in hybrid music recommenders and using visualizations to aid music discovery.

MIREX 2010

Dr. Downie gave a summary of MIREX 2010 – the evaluation track for MIR (it is like TREC for MIR).  Results are here:  MIREX-2010 results

Matthias Mauch makes a case for improving chord recognition by making the MIREX tasks harder. He’d like to gently increase the difficulty.


Improving the Generation of Ground Truths Based on Partially Ordered Lists

Julián Urbano, Mónica Marrero, Diego Martín and Juan Lloréns

abstract: Ground truths based on partially ordered lists have been used for some years now to evaluate the effectiveness of Music Information Retrieval systems, especially in tasks related to symbolic melodic similarity. However, there has been practically no meta-evaluation to measure or improve the correctness of these evaluations. In this paper we revise the methodology used to generate these ground truths and disclose some issues that need to be addressed. In particular, we focus on the arrangement and aggregation of the relevant results, and show that it is not possible to ensure completely consistent lists. We develop a measure of consistency based on Average Dynamic Recall and propose several alternatives to arrange the lists, all of which prove to be more consistent than the original method. The results of the MIREX 2005 evaluation are revisited using these alternative ground truths.

Partially ordered lists, the current approach for evaluating tasks like melodic similarity, may not be the best way to evaluate:

  • They are expensive
  • They produce some odd results
  • They are hard to replicate
  • They leave out relevant results
  • Inconsistencies among the expert evaluations are not treated properly

The authors propose alternate aggregation rules:

  • All: a new group is started if the pivot incipit is significantly different from every incipit in the current group. This should lead to larger groups.
  • Any: a new group is started if the pivot incipit is significantly different from any incipit in the current group. This should lead to smaller groups.
  • Prev: a new group is started if the pivot incipit is significantly different from the previous one.
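The three arrangement rules can be sketched as follows; the `differs(a, b)` predicate is a hypothetical stand-in for the paper's statistical test of significant difference between incipits:

```python
def group_incipits(incipits, differs, rule="all"):
    """Partition a ranked list of incipits into relevance groups.

    A new group is started when the pivot incipit is 'significantly
    different' according to the chosen rule:
      - "all":  different from every incipit in the current group
      - "any":  different from at least one incipit in the current group
      - "prev": different from the previous incipit only
    """
    if not incipits:
        return []
    groups = [[incipits[0]]]
    for pivot in incipits[1:]:
        current = groups[-1]
        if rule == "all":
            start_new = all(differs(pivot, x) for x in current)
        elif rule == "any":
            start_new = any(differs(pivot, x) for x in current)
        elif rule == "prev":
            start_new = differs(pivot, current[-1])
        else:
            raise ValueError("unknown rule: %s" % rule)
        if start_new:
            groups.append([pivot])
        else:
            current.append(pivot)
    return groups
```

Since "all" requires difference from every member before splitting, it splits less often and yields larger groups than "any", as the post notes.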

The authors applied the new ranking system to MIREX 2005, lowering measured performance and changing the rankings of several systems.


A Cartesian Ensemble of Feature Subspace Classifiers for Music Categorization


Thomas Lidy, Rudolf Mayer, Andreas Rauber, Pedro J. Ponce de León, Antonio Pertusa, and Jose Manuel Iñesta

Abstract: We present a cartesian ensemble classification system that is based on the principle of late fusion and feature subspaces. These feature subspaces describe different aspects of the same data set. The framework is built on the Weka machine learning toolkit and able to combine arbitrary feature sets and learning schemes. In our scenario, we use it for the ensemble classification of multiple feature sets from the audio and symbolic domains. We present an extensive set of experiments in the context of music genre classification, based on numerous Music IR benchmark datasets, and evaluate a set of combination/voting rules. The results show that the approach is superior to the best choice of a single algorithm on a single feature set. Moreover, it also releases the user from making this choice explicitly.

The system is an ensemble classifier built on top of Weka.

Results were presented across different datasets, classifiers and feature sets.

Execution times were about 10 seconds per song, so rather slow for large collections.

The ensemble approach delivered superior results through adding a reasonable amount of feature sets and classifiers.  However, they did not discover a combination rule that always outperforms all the others.
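A minimal sketch of the Cartesian late-fusion idea, assuming trained models that expose a `predict` method; the simple majority vote shown here is only one of the combination rules the paper evaluates:

```python
from collections import Counter

def cartesian_ensemble_predict(feature_sets, classifiers, example):
    """Late fusion over the Cartesian product of feature sets and
    classifiers: every (feature set, classifier) pair casts a vote,
    and the majority label wins.

    feature_sets -- dict mapping a name to a feature-extractor callable
    classifiers  -- list of trained models with predict(features) -> label
    """
    votes = Counter()
    for extract in feature_sets.values():
        for model in classifiers:
            votes[model.predict(extract(example))] += 1
    # most_common(1) returns [(label, count)]; take the winning label
    return votes.most_common(1)[0][0]
```

With F feature sets and C classifiers this casts F × C votes per example, which is why adding either dimension can help even when no single pair is best.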


ON THE APPLICABILITY OF PEER-TO-PEER DATA IN MUSIC INFORMATION RETRIEVAL RESEARCH

Noam Koenigstein, Yuval Shavitt, Ela Weinsberg, and Udi Weinsberg

Abstract: Peer-to-Peer (p2p) networks are being increasingly adopted as an invaluable resource for various music information retrieval (MIR) tasks, including music similarity, recommendation and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts regarding the ability to actually extract sufficiently accurate information.

This paper evaluates the applicability of using data originating from p2p networks for MIR research, focusing on partial crawling, inherent noise and localization of songs and search queries. These aspects are quantified using songs collected from the Gnutella p2p network. We show that the power-law nature of the network makes it relatively easy to capture an accurate view of the mainstream using relatively little effort. However, some applications, like trend prediction, mandate collection of the data from the “long tail”, hence a much more exhaustive crawl is needed. Furthermore, we present techniques for overcoming noise originating from user generated content and for filtering non-informative data, while minimizing information loss.

Observation – CF systems tend to outperform content-based systems until you get into the long tail – so to improve CF systems, you need more long-tail data. This work explores how to get it by mining p2p networks.

P2P systems have some problems: privacy concerns, data collection is hard, high user churn, very noisy data, sparsity, and some users delete content from shared folders right away.

P2P mining: shared folders are useful for similarity, while search queries are useful for trends.

Lots of p2p challenges and steps – getting IP addresses for p2p nodes, filtering out non-musical content, geo-identification, anonymization.

Dealing with sparsity: 1.2 million users, but on average only about one data point per artist/song relation. The graphs shown plot song popularity in shared folders; they use this data to help filter out non-typical users.

Identifying songs: use the file hash – but of course many songs exist in many different digital copies – so they also look at the (noisy) metadata.
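A toy illustration of the two resolution strategies; the normalization rules here are my own guesses at the kind of cleaning involved, not the paper's actual method:

```python
import re

def normalize_metadata(artist, title):
    """Collapse noisy user-entered metadata to a canonical key:
    lowercase, strip bracketed junk like '(live)' or '[192kbps]',
    and squeeze runs of non-alphanumeric characters to one space."""
    def clean(s):
        s = re.sub(r"[\(\[].*?[\)\]]", "", s.lower())
        return re.sub(r"[^a-z0-9]+", " ", s).strip()
    return (clean(artist), clean(title))

def count_unique_songs(files, by="metadata"):
    """Count distinct songs in a crawl, resolving either by file hash
    (misses re-encoded copies) or by normalized metadata."""
    if by == "hash":
        return len({f["hash"] for f in files})
    return len({normalize_metadata(f["artist"], f["title"]) for f in files})
```

Two re-encodings of the same track have different hashes but usually collapse to one metadata key, which is why metadata resolution saturates the song count with a much smaller crawl.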

Songs Discovery Rate

Once you reach about 1/3 of the network you’ve found most of the tracks if you use metadata for resolving.  If you use the hashes, you need to crawl 70% of the network.

Using shared folders for similarity

There’s a preferential attachment model for popular songs.

Conclusion: P2P data is a good source of long-tail data, but dealing with the noise is hard. The data is especially good for building similarity models localized to countries. A good talk from someone with lots of experience with p2p systems.


LOOKING THROUGH THE “GLASS CEILING”: A CONCEPTUAL FRAMEWORK FOR THE PROBLEMS OF SPECTRAL SIMILARITY

Alexandros Nanopoulos, Ioannis Karydis, Miloš Radovanović, and Mirjana Ivanović

Abstract: Spectral similarity measures have been shown to exhibit good performance in several Music Information Retrieval (MIR) applications. They are also known, however, to possess several undesirable properties, namely allowing the existence of hub songs (songs which frequently appear in nearest neighbor lists of other songs), “orphans” (songs which practically never appear), and difficulties in distinguishing the farthest from the nearest neighbor due to the concentration effect caused by high dimensionality of data space. In this paper we develop a conceptual framework that allows connecting all three undesired properties. We show that hubs and “orphans” are expected to appear in high-dimensional data spaces, and relate the cause of their appearance with the concentration property of distance/similarity measures. We verify our conclusions on real music data, examining groups of frames generated by Gaussian Mixture Models (GMMs), considering two similarity measures: Earth Mover’s Distance (EMD) in combination with Kullback-Leibler (KL) divergence, and Monte Carlo (MC) sampling. The proposed framework can be useful to MIR researchers to address problems of spectral similarity, understand their fundamental origins, and thus be able to develop more robust methods for their remedy.

The problems are mainly due to the high-dimensional vector space – hubs and orphans are to be expected there. So, let’s look at how to deal with high dimensionality.

One problem: in Euclidean space, as we get into higher dimensions it becomes harder to distinguish the farthest neighbor from the nearest.

This is a natural result of high dimensionality and leads to the problem of hubs and orphans.

Another way of looking at this is to plot the ratio between the standard deviation and the mean of the neighbor distances as a function of dimensionality:
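The concentration effect is easy to reproduce numerically; this is a self-contained illustration with random points, not the paper's GMM-based experiment:

```python
import numpy as np

def distance_concentration(dim, n_points=200, seed=0):
    """Return std/mean of all pairwise Euclidean distances between
    uniform random points.  The ratio shrinks as dimensionality grows,
    so the farthest neighbor gets relatively closer to the nearest."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_points, dim))
    diffs = x[:, None, :] - x[None, :, :]       # (n, n, dim)
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    d = d[np.triu_indices(n_points, k=1)]       # unique pairs only
    return d.std() / d.mean()
```

With 200 points the ratio drops from roughly one half in 2 dimensions to a few percent in 256, which is exactly the regime where hubs and orphans emerge.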

Conclusion – high dimensionality is responsible for problems of hubs, orphans and the concentration effect.

This was an interesting talk and has lots of potential impact on spectral similarity.


MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEW

Youngmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon G. Morton, Patrick Richardson, Jeffrey Scott, Jacquelin A. Speck, and Douglas Turnbull (pdf)

Youngmoo presents on mood

From the paper: Recognizing musical mood remains a challenging problem primarily due to the inherent ambiguities of human emotions. Though research on this topic is not as mature as some other Music-IR tasks, it is clear that rapid progress is being made. In the past 5 years, the performance of automated systems for music emotion recognition using a wide range of annotated and content-based features (and multi-modal feature combinations) have advanced significantly. As with many Music-IR tasks open problems remain at all levels, from emotional representations and annotation methods to feature selection and machine learning.

While significant advances have been made, the most accurate systems thus far achieve predictions through large-scale machine learning algorithms operating on vast feature sets, sometimes spanning multiple domains, applied to relatively short musical selections. Oftentimes, this approach reveals little in terms of the underlying forces driving the perception of musical emotion (e.g., varying contributions of features) and, in particular, how emotions in music change over time. In the future, we anticipate further collaborations between Music-IR researchers, psychologists, and neuroscientists, which may lead to a greater understanding of not only mood within music, but human emotions in general. Furthermore, it is clear that individuals perceive emotions within music differently. Given the multiple existing approaches for modeling the ambiguities of musical mood, a truly personalized system would likely need to incorporate some level of individual profiling to adjust its predictions.

This paper has provided a broad survey of the state of the art, highlighting many promising directions for further research. As attention to this problem increases, it is our hope that the progress of this research will continue to accelerate in the near future.

My notes:

MIREX performance on mood classification has held steady for the last few years. Most mood classification systems in MIREX are just adapted genre classifiers.

Categorical vs. dimensional

Categorical: MIREX classifies mood into five clusters.

Dimensional: the ever-popular valence-arousal space, sometimes called the Thayer mood model.
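A toy sketch of the dimensional view; the quadrant labels below are a common informal reading of the valence-arousal plane, not the MIREX clusters:

```python
def va_quadrant(valence, arousal):
    """Map a (valence, arousal) point in [-1, 1]^2 to an informal
    mood quadrant label: valence is positive/negative affect,
    arousal is energy level."""
    if arousal >= 0:
        return "happy/excited" if valence >= 0 else "angry/anxious"
    return "calm/content" if valence >= 0 else "sad/depressed"
```

Dimensional systems regress continuous (valence, arousal) values rather than picking a category, which is what makes tracking emotion over time possible.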


Typical emotion classification system

Ground Truth

A big challenge is coming up with ground truth for training a recognition system. Last.fm tags, GWAP, AMG labels and web documents are common sources.

Lyrics – using lyrics alone has not been too successful for mood classification.

Content-based methods – typical features for mood:

Youngmoo’s latest work (with Erik Schmidt) is showing the distribution and change of emotion over time.

Hybrid systems

  • Audio + Lyrics – some to high improvement
  • Audio + Tags – good improvement
  • Audio + Images – using album art to derive associations to mood

Conclusions – mood recognition hasn’t improved much in recent years, probably because most systems are not really designed specifically for mood.

This was a great overview of the state-of-the-art. I’d be interested in hearing a much longer version of this talk.  The paper and the references will be a great resource for anyone who’s interested in pursuing mood classification.


Solving Misheard Lyric Search Queries ….

Solving Misheard Lyric Search Queries  using a Probabilistic Model of Speech Sounds

Hussein Hirjee and Daniel G. Brown

People often use lyrics to find songs – and they often get the words wrong. One example, for Nirvana: “Don’t walk on guns, burn your friends”. Approach: use phonetic similarity. They adapt BLAST from DNA sequence matching to the problem; lyrics are represented as phoneme sequences scored with a similarity matrix.

For training, they get data from ‘misheard lyrics’ sites like KissThisGuy.com. They align the misheard lyrics with the real ones to build a model of frequently misheard phonemes. They tested with KissThisGuy.com misheard lyrics, scored with five different models.
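The alignment idea can be sketched as a Needleman-Wunsch-style global alignment over phoneme tokens; the substitution scores would come from their learned mishearing model, stubbed here as a plain match/mismatch function:

```python
def align_score(query, target, sub_score, gap=-2):
    """Global alignment score between two phoneme sequences, in the
    BLAST/Needleman-Wunsch spirit.  sub_score(a, b) should reward
    likely mishearings and penalize unlikely ones; the gap penalty
    covers dropped or inserted phonemes."""
    n, m = len(query), len(target)
    # dp[i][j] = best score aligning query[:i] with target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub_score(query[i - 1], target[j - 1]),
                dp[i - 1][j] + gap,  # phoneme dropped
                dp[i][j - 1] + gap,  # phoneme inserted
            )
    return dp[n][m]
```

Ranking candidate lyric lines by this score is what lets a misheard query still land near the true song.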

Evaluation: Mean Reciprocal Rank and Hit Rate by Rank. The approach compared well with previous techniques. Still, 17% of lyrics are not identified – some are just bad queries, and short queries are a source of errors. They also looked at phoneme confusion, in particular confusions caused by singing.

Future work: look at phoneme trigrams, and build a web site. A questioner suggested that they create a mondegreen generator.

A good presentation on an interesting, fun problem area.


APPROXIMATE NOTE TRANSCRIPTION FOR THE IMPROVED IDENTIFICATION OF DIFFICULT CHORDS

APPROXIMATE NOTE TRANSCRIPTION FOR THE IMPROVED IDENTIFICATION OF DIFFICULT CHORDS  – Matthias Mauch and Simon Dixon

They present a new chroma extraction method that uses a non-negative least squares (NNLS) algorithm for prior approximate note transcription. Twelve different chroma methods were tested for chord transcription accuracy on popular music, using an existing high-level probabilistic model. The NNLS chroma features achieved top results of 80% accuracy, significantly exceeding the state of the art.

We have shown that the positive influence of the approximate transcription is particularly strong on chords whose harmonic structure causes ambiguities, and whose identification is therefore difficult in approaches without prior approximate transcription. The identification of these difficult chord types was substantially increased by up to twelve percentage points in the methods using NNLS transcription.

Matthias is an enthusiastic presenter who did not hesitate to jump onto the piano to demonstrate ‘difficult chords’.  Very nice presentation.


Locating Tune Changes and Providing a Semantic Labelling of Sets of Irish Traditional Tunes

Locating Tune Changes and Providing a Semantic Labelling of Sets of Irish Traditional Tunes by Cillian Kelly (pdf)

Abstract – An approach is presented which provides the tune change locations within a set of Irish Traditional tunes. Also provided are semantic labels for each part of each tune within the set. A set in Irish Traditional music is a number of individual tunes played segue. Each of the tunes in the set are made up of structural segments called parts. Musical variation is a prominent characteristic of this genre. However, a certain set of notes known as ‘set accented tones’ are considered impervious to musical variation. Chroma information is extracted at ‘set accented tone’ locations within the music. The resulting chroma vectors are grouped to represent the parts of the music. The parts are then compared with one another to form a part similarity matrix. Unit kernels which represent the possible structures of an Irish Traditional tune are matched with the part similarity matrix to determine the tune change locations and semantic part labels.

This looks to be a very hard problem to solve.


Identifying Repeated Patterns in Music …

I am at ISMIR this week, blogging sessions and papers that I find interesting.

Identifying Repeated Patterns in Music using Sparse Convolutive Non-Negative Matrix Factorization – Ron Weiss, Juan Bello  (pdf)

Problem: looking at repetition in music – verse, chorus, repeated motifs. Can one identify high-level and short-term structure simultaneously from audio? Lots of math in this one.

Ron describes an unsupervised, data-driven method for automatically identifying repeated patterns in music by analyzing a feature matrix using a variant of sparse convolutive non-negative matrix factorization. They utilize sparsity constraints to automatically identify the number of patterns and their lengths, parameters that would normally need to be fixed in advance. The proposed analysis is applied to beat-synchronous chromagrams in order to concurrently extract repeated harmonic motifs and their locations within a song. They show how this analysis can be used for long-term structure segmentation, resulting in an algorithm that is competitive with other state-of-the-art segmentation algorithms based on hidden Markov models and self-similarity matrices.
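As a much-simplified stand-in for the paper's sparse convolutive variant, plain NMF with Lee-Seung multiplicative updates shows the factorization at the heart of the approach:

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Factor a non-negative matrix V into W @ H (both non-negative)
    using Lee-Seung multiplicative updates for squared error.  The
    paper's variant adds time convolution and sparsity penalties on
    top of this basic scheme."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-9  # avoid division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Applied to a beat-synchronous chromagram (12 pitch classes × beats), the columns of W act as harmonic templates and the rows of H as their activations in time; repeated activations reveal the song's structure.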

One particular application is riff identification for music thumbnailing. Another is structure segmentation (verse, chorus, bridge, etc.).

The code is open-sourced here:  http://ronw.github.com/siplca-segmentation/

This was a really interesting presentation, with great examples. Excellent work.  This one should be a candidate for best paper IMHO.
