LOOKING THROUGH THE “GLASS CEILING”: A CONCEPTUAL FRAMEWORK FOR THE PROBLEMS OF SPECTRAL SIMILARITY
Alexandros Nanopoulos
Ioannis Karydis, Miloš Radovanović, Mirjana Ivanović
Abstract: Spectral similarity measures have been shown to exhibit good performance in several Music Information Retrieval (MIR) applications. They are also known, however, to possess several undesirable properties, namely allowing the existence of hub songs (songs which frequently appear in nearest neighbor lists of other songs), “orphans” (songs which practically never appear), and difficulties in distinguishing the farthest from the nearest neighbor due to the concentration effect caused by high dimensionality of data space. In this paper we develop a conceptual framework that allows connecting all three undesired properties. We show that hubs and “orphans” are expected to appear in high-dimensional data spaces, and relate the cause of their appearance with the concentration property of distance/similarity measures. We verify our conclusions on real music data, examining groups of frames generated by Gaussian Mixture Models (GMMs), considering two similarity measures: Earth Mover’s Distance (EMD) in combination with Kullback-Leibler (KL) divergence, and Monte Carlo (MC) sampling. The proposed framework can be useful to MIR researchers to address problems of spectral similarity, understand their fundamental origins, and thus be able to develop more robust methods for their remedy.
My notes:
The problem is mainly due to the high-dimensional vector space, so problems like hubs and orphans are expected. So let’s look at how to deal with this problem of high dimensionality.
One problem: in Euclidean space, as we get into higher dimensions, it becomes harder to distinguish between the farthest and the nearest neighbor.
This is a natural result of high dimensionality and leads to the problem of hubs and orphans.
Another way of looking at this is to plot the ratio between the standard deviation and the mean of the neighbor distances as a function of dimensionality:
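You can see this for yourself with a minimal sketch of my own (not the authors’ experiment): sample i.i.d. random points in increasing dimensions and watch the relative spread of pairwise distances collapse.

    # Sketch (mine, not the paper's): the concentration effect for
    # Euclidean distances between i.i.d. uniform random points.
    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        points = rng.uniform(size=(500, d))
        # distances from one query point to all the others
        dists = np.linalg.norm(points[1:] - points[0], axis=1)
        ratio = dists.std() / dists.mean()
        print(f"dim={d:5d}  std/mean of distances = {ratio:.3f}")

As the ratio approaches zero, the nearest and farthest neighbors become nearly indistinguishable – exactly the concentration effect the paper connects to hubs and orphans.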
Conclusion – high dimensionality is responsible for problems of hubs, orphans and the concentration effect.
This was an interesting talk and has lots of potential impact on spectral similarity.
MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEW
Posted by Paul in ismir, music information retrieval, research on August 11, 2010
Youngmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon G. Morton, Patrick Richardson, Jeffrey Scott, Jacquelin A. Speck, and Douglas Turnbull (pdf)
From the paper: Recognizing musical mood remains a challenging problem primarily due to the inherent ambiguities of human emotions. Though research on this topic is not as mature as some other Music-IR tasks, it is clear that rapid progress is being made. In the past 5 years, the performance of automated systems for music emotion recognition using a wide range of annotated and content-based features (and multi-modal feature combinations) has advanced significantly. As with many Music-IR tasks, open problems remain at all levels, from emotional representations and annotation methods to feature selection and machine learning.
While significant advances have been made, the most accurate systems thus far achieve predictions through large-scale machine learning algorithms operating on vast feature sets, sometimes spanning multiple domains, applied to relatively short musical selections. Oftentimes, this approach reveals little in terms of the underlying forces driving the perception of musical emotion (e.g., varying contributions of features) and, in particular, how emotions in music change over time. In the future, we anticipate further collaborations between Music-IR researchers, psychologists, and neuroscientists, which may lead to a greater understanding of not only mood within music, but human emotions in general. Furthermore, it is clear that individuals perceive emotions within music differently. Given the multiple existing approaches for modeling the ambiguities of musical mood, a truly personalized system would likely need to incorporate some level of individual profiling to adjust its predictions.
This paper has provided a broad survey of the state of the art, highlighting many promising directions for further research. As attention to this problem increases, it is our hope that the progress of this research will continue to accelerate in the near future.
My notes:
MIREX performance on mood classification has held steady for the last few years. Most mood classification systems in MIREX are just adapted genre classifiers.
Categorical vs. dimensional
Categorical: MIREX classifies mood into 5 clusters.
Dimensional: the ever-popular Valence-Arousal space, sometimes called the Thayer mood model.
Typical emotion classification system
Ground Truth
A big challenge is coming up with ground truth for training a recognition system. Last.fm tags, GWAP, AMG labels, and web documents are common sources.
Lyrics – using lyrics alone has not been too successful for mood classification.
Content-based methods – typical features for mood (a hedged sketch follows).
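As an illustration only (my sketch, assuming librosa; the talk’s actual feature list isn’t reproduced here), a typical content-based feature vector might combine timbre, brightness, and tempo:

    # Hypothetical sketch of common content-based features for mood
    # classification (not the specific set from the talk).
    import librosa
    import numpy as np

    y, sr = librosa.load("song.mp3")  # "song.mp3" is a placeholder path
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # timbre
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # brightness
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)             # rhythm / arousal cue
    features = np.hstack([mfcc.mean(axis=1), centroid.mean(), tempo])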
Youngmoo’s latest work (with Erik Schmidt) is showing the distribution and change of emotion over time.
Hybrid systems
- Audio + Lyrics – modest to high improvement
- Audio + Tags – good improvement
- Audio + Images – using album art to derive mood associations
Conclusions – Mood recognition hasn’t improved much in recent years – probably because most systems are not really designed specifically for mood.
This was a great overview of the state-of-the-art. I’d be interested in hearing a much longer version of this talk. The paper and the references will be a great resource for anyone who’s interested in pursuing mood classification.
Solving Misheard Lyric Search Queries ….
Posted by Paul in ismir, music information retrieval, research on August 10, 2010
Solving Misheard Lyric Search Queries using a Probabilistic Model of Speech Sounds
Hussein Hirjee and Daniel G. Brown
People often use lyrics to find songs – and they get them wrong. An example: Nirvana heard as “Don’t walk on guns, burn your friends”. Approach: use phonetic similarity. They adapt BLAST, from DNA sequence matching, to the problem. Lyrics are represented as sequences of phonemes, compared using a phoneme similarity matrix.
For training, they get data from ‘misheard lyrics’ sites like KissThisGuy.com. They align the misheard with the real lyrics – to build a model of frequently misheard phonemes. They tested with KissThisGuy.com misheard lyrics. Scored with 5 different models.
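A minimal sketch of the training idea as I understand it (BLAST-style log-odds from aligned phoneme pairs; the phoneme labels and counts below are invented for illustration):

    # Sketch of BLAST-style log-odds scoring from aligned phoneme pairs.
    from collections import Counter
    import math

    # (heard, actual) phoneme pairs from aligned misheard/correct lyrics
    aligned_pairs = [("B", "P"), ("B", "B"), ("N", "M"), ("B", "P"), ("M", "M")]

    pair_counts = Counter(aligned_pairs)
    heard_counts = Counter(h for h, _ in aligned_pairs)
    actual_counts = Counter(a for _, a in aligned_pairs)
    total = len(aligned_pairs)

    def log_odds(heard, actual):
        """Log of observed co-occurrence over co-occurrence by chance."""
        observed = pair_counts[(heard, actual)] / total
        expected = (heard_counts[heard] / total) * (actual_counts[actual] / total)
        return math.log2(observed / expected)

    print(log_odds("B", "P"))  # positive score: B is often misheard for P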
Evaluation: Mean Reciprocal Rank and Hit Rate by Rank. The approach compared well with previous techniques. Still, 17% of the lyrics were not identified – some are just bad queries, but short queries are a real source of errors. They also looked at phoneme confusion, in particular confusions caused by singing.
Future work: look at phoneme trigrams, and build a web site. A questioner suggested that they create a mondegreen generator.
Good presentation, interesting, fun problem area.
APPROXIMATE NOTE TRANSCRIPTION FOR THE IMPROVED IDENTIFICATION OF DIFFICULT CHORDS
Posted by Paul in events, ismir, music information retrieval, research on August 10, 2010
APPROXIMATE NOTE TRANSCRIPTION FOR THE IMPROVED IDENTIFICATION OF DIFFICULT CHORDS – Matthias Mauch and Simon Dixon
This is a new chroma extraction method using a non-negative least squares (NNLS) algorithm for prior approximate note transcription. Twelve different chroma methods were tested for chord transcription accuracy on popular music, using an existing high-level probabilistic model. The NNLS chroma features achieved top results of 80% accuracy, exceeding the state of the art by a large margin.
We have shown that the positive influence of the approximate transcription is particularly strong on chords whose harmonic structure causes ambiguities, and whose identification is therefore difficult in approaches without prior approximate transcription. The identification of these difficult chord types was substantially increased by up to twelve percentage points in the methods using NNLS transcription.
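The core NNLS step is easy to sketch (my illustration, assuming scipy; not Mauch & Dixon’s actual code or note dictionary): solve for non-negative note activations against a dictionary of expected note spectra, then fold the activations into 12 pitch classes.

    # Sketch of NNLS-based chroma (illustration only).
    import numpy as np
    from scipy.optimize import nnls

    n_bins, n_notes = 256, 84          # spectrum bins, MIDI notes 21..104
    note_dict = np.random.rand(n_bins, n_notes)  # stand-in note spectral profiles
    spectrum = np.random.rand(n_bins)            # stand-in spectral frame

    activations, _ = nnls(note_dict, spectrum)   # non-negative note activations
    chroma = np.zeros(12)
    for note in range(n_notes):
        chroma[(note + 21) % 12] += activations[note]  # fold notes to pitch classes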
Matthias is an enthusiastic presenter who did not hesitate to jump onto the piano to demonstrate ‘difficult chords’. Very nice presentation.
Locating Tune Changes and Providing a Semantic Labelling of Sets of Irish Traditional Tunes
Posted by Paul in events, ismir, music information retrieval on August 10, 2010
Locating Tune Changes and Providing a Semantic Labelling of Sets of Irish Traditional Tunes by Cillian Kelly (pdf)
Abstract – An approach is presented which provides the tune change locations within a set of Irish Traditional tunes. Also provided are semantic labels for each part of each tune within the set. A set in Irish Traditional music is a number of individual tunes played segue. Each of the tunes in the set is made up of structural segments called parts. Musical variation is a prominent characteristic of this genre. However, a certain set of notes known as ‘set accented tones’ are considered impervious to musical variation. Chroma information is extracted at ‘set accented tone’ locations within the music. The resulting chroma vectors are grouped to represent the parts of the music. The parts are then compared with one another to form a part similarity matrix. Unit kernels which represent the possible structures of an Irish Traditional tune are matched with the part similarity matrix to determine the tune change locations and semantic part labels.
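A minimal sketch of the final kernel-matching step as I read it (toy numbers, and the kernel shape is my assumption, not the paper’s exact formulation):

    # Sketch (mine) of matching a structural kernel against a part
    # similarity matrix.
    import numpy as np

    # toy 4x4 part similarity matrix: parts 1&2 similar, parts 3&4 similar
    S = np.array([[1.0, 0.9, 0.1, 0.2],
                  [0.9, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.0, 0.8],
                  [0.2, 0.1, 0.8, 1.0]])

    # unit kernel for an AABB structure (one repeated part, then another)
    K = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

    score = (S * K).sum() / K.sum()  # how well the kernel fits the matrix
    print(f"AABB match score: {score:.2f}")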
Identifying Repeated Patterns in Music …
Posted by Paul in events, ismir, music information retrieval on August 10, 2010
I am at ISMIR this week, blogging sessions and papers that I find interesting.
Identifying Repeated Patterns in Music using Sparse Convolutive Non-Negative Matrix Factorization – Ron Weiss, Juan Bello (pdf)
Problem: looking at repetition in music – verse, chorus, repeated motifs. Can one identify high-level and short-term structure simultaneously from audio? Lots of math in this.
Ron describes an unsupervised, data-driven method for automatically identifying repeated patterns in music by analyzing a feature matrix using a variant of sparse convolutive non-negative matrix factorization. They utilize sparsity constraints to automatically identify the number of patterns and their lengths, parameters that would normally need to be fixed in advance. The proposed analysis is applied to beat-synchronous chromagrams in order to concurrently extract repeated harmonic motifs and their locations within a song. They show how this analysis can be used for long-term structure segmentation, resulting in an algorithm that is competitive with other state-of-the-art segmentation algorithms based on hidden Markov models and self similarity matrices.
One particular application is riff identification for music thumbnailing. Another application is structure segmentation (verse, chorus, bridge, etc.).
The code is open-sourced here: http://ronw.github.com/siplca-segmentation/
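For a flavor of the model, here is a toy reconstruction sketch of convolutive NMF (mine, not the released SI-PLCA code): each pattern is a short chroma template, and the activations say where each pattern occurs in the song.

    # Sketch (mine) of the convolutive NMF model: V ~ sum over offsets t
    # of W[t] applied to H shifted right by t beats. The paper additionally
    # imposes sparsity on H so the number and length of patterns fall out
    # automatically.
    import numpy as np

    n_chroma, n_beats, n_patterns, pattern_len = 12, 200, 3, 8
    W = np.random.rand(pattern_len, n_chroma, n_patterns)  # chroma patterns
    H = np.random.rand(n_patterns, n_beats)                # pattern activations

    def reconstruct(W, H):
        """Approximate the beat-synchronous chromagram from patterns."""
        V_hat = np.zeros((n_chroma, n_beats))
        for t in range(pattern_len):
            # shift activations right by t beats and accumulate
            H_shift = np.pad(H, ((0, 0), (t, 0)))[:, :n_beats]
            V_hat += W[t] @ H_shift
        return V_hat

    V_hat = reconstruct(W, H)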
This was a really interesting presentation, with great examples. Excellent work. This one should be a candidate for best paper IMHO.
What’s Hot? Estimating Country Specific Artist Popularity
I am at ISMIR this week, blogging sessions and papers that I find interesting.
What’s Hot? Estimating Country Specific Artist Popularity
Markus Schedl, Tim Pohle, Noam Koenigstein, Peter Knees
Traditional charts are not perfect: they are not available in all countries, have biases (sales vs. plays), don’t incorporate non-sales channels like P2P, and are inhomogeneous between countries.
Approach: Look at different channels: Google, Twitter, shared folders in Gnutella, Last.fm
- Google: query “led zeppelin” + “france”, with a popularity filter applied to reduce the effect of overall popularity
- Twitter – geolocated major cities of the world using Freebase. Used the Twitter API with the #nowplaying hashtag along with the geolocation API to search for plays in a particular country
- P2P shared folders – Gnutella network: gathered a million Gnutella IP addresses, collected the metadata for the shared folders at each address, and used IP2Location to resolve each to a geographic location
- Last.fm – retrieve the top 400 listeners in each country. For these top 400 listeners, retrieve the top-played artists.
Evaluation: retrieve Last.fm’s most popular artists as a reference, and use top-n rank overlap for scoring. They compared the four different sources; each approach was prone to certain distortions and biases. In the future they hope to combine these sources into a hybrid system that combines the best attributes of all approaches.
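A minimal sketch of a top-n rank overlap score (my illustration; the paper’s exact measure may weight ranks differently):

    # Sketch of a simple top-n overlap score between two artist rankings.
    def top_n_overlap(ranking_a, ranking_b, n=10):
        """Fraction of the top-n artists shared by both rankings."""
        return len(set(ranking_a[:n]) & set(ranking_b[:n])) / n

    charts = ["Artist A", "Artist B", "Artist C", "Artist D"]
    estimated = ["Artist B", "Artist A", "Artist E", "Artist C"]
    print(top_n_overlap(charts, estimated, n=4))  # 0.75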
ISMIR Day zero in Utrecht
Posted by Paul in events, music information retrieval on August 10, 2010
We’ve just finished Day 0 of ISMIR (the yearly conference of the International Society of Music Information Retrieval) being held in Utrecht. It is a lovely city, I’ve been enjoying walks along the many canals in the comfortably cool weather.
The zeroth day of ISMIR is the tutorial day. Ben Fields and I presented our playlisting tutorial. It was well attended, with lots of good questions at the end. The 3-hour presentation seemed to fly by. Here’s Ben making last-minute edits just before the presentation.
Finding a path through the Jukebox: The Playlist Tutorial
Posted by Paul in events, music information retrieval, playlist, research, The Echo Nest on August 6, 2010
Ben Fields and I have just put the finishing touches on our playlisting tutorial for ISMIR. Everything you could want to know about playlists. As one of the founders of a well known music intelligence company once said: Take the fun out of music and read Paul’s slides …
Do you use Smart Playlists?
iTunes Smart Playlists allow for very flexible creation of dynamic playlists based on a whole boat-load of parameters. But I wonder how often people use this feature. Is it too complicated? Let’s find out. I’ve created a poll that will take you about 20 seconds to complete. Go to iTunes, count up how many smart playlists you have. You can tell which playlists are smart playlists because they have the little gear icon:
Don’t count the pre-fab smart playlists that come with iTunes (like 90’s music, Recently Added, My Top Rated, etc.). Once you’ve counted up your playlists, take the poll: