Archive for August, 2010

ON THE APPLICABILITY OF PEER-TO-PEER DATA IN MUSIC INFORMATION RETRIEVAL RESEARCH

ON THE APPLICABILITY OF PEER-TO-PEER DATA IN MUSIC INFORMATION RETRIEVAL RESEARCH (pdf)
Noam Koenigstein, Yuval Shavitt, Ela Weinsberg, and Udi Weinsberg

abstract:Peer-to-Peer (p2p) networks are being increasingly adopted as an invaluable resource for various music information re- trieval (MIR) tasks, including music similarity, recommen- dation and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts re- garding the ability to actually extract sufficiently accurate information.

This paper evaluates the applicability of using data orig- inating from p2p networks for MIR research, focusing on partial crawling, inherent noise and localization of songs and search queries. These aspects are quantified using songs collected from the Gnutella p2p network. We show that the power-law nature of the network makes it relatively easy to capture an accurate view of the main-streams using relatively little effort. However, some applications, like trend prediction, mandate collection of the data from the “long tail”, hence a much more exhaustive crawl is needed. Furthermore, we present techniques for overcoming noise originating from user generated content and for filtering non informative data, while minimizing information loss

Observation – CF systems tend to outperform content-based systems until you get in the long tail – so to improved CF systems, you need more long tail data.  This work explores how to get more long tail data by mining p2p networks.

P2P systems have some problems – privacy concerns, data collection is hard. High user churn, very noisy data, some users delete content from shared folders right away, sparsity

P2P mining Shared folders are useful for similarity, search queries are useful for trends.

Lots of p2p challenges and steps – getting IP addresses for p2p nodes, filtering out non-musical content, geo-identification, anonymization.

Dealing with sparsity:  1.2 million users, but average of 1 artist/song data point for each artist/song relation.  These graphs show song popularity in shared folders. They use this data to help filter out non-typical users.

Identifying songs: Use the hash file – but of course many songs have many different digital copies – so they also look at the (noisy) metadata.

Songs Discovery Rate

Once you reach about 1/3 of the network you’ve found most of the tracks if you use metadata for resolving.  If you use the hashes, you need to crawl 70% of the network.

Using shared folders for similarity

There’s a preferential attachment model for popular  songs

Conclusion: P2P data is good source of long tail data, but dealing with the noisy data is hard.  The p2p data is especially good for building similarity models localized to countries. A good talk with from someone with lots of experience with p2p stuff.

Leave a Comment

LOOKING THROUGH THE “GLASS CEILING”: A CONCEPTUAL FRAMEWORK FOR THE PROBLEMS OF SPECTRAL SIMILARITY

LOOKING THROUGH THE “GLASS CEILING”: A CONCEPTUAL FRAMEWORK FOR THE PROBLEMS OF SPECTRAL SIMILARITY

Alexandros Nanopoulos

Ioannis Karydis, Milosˇ Radovanovic, Mirjana Ivanovic

Abstract: Spectral similarity measures have been shown to exhibit good performance in several Music Information Retrieval (MIR) applications. They are also known, however, to pos- sess several undesirable properties, namely allowing the existence of hub songs (songs which frequently appear in nearest neighbor lists of other songs), “orphans” (songs which practically never appear), and difficulties in distin- guishing the farthest from the nearest neighbor due to the concentration effect caused by high dimensionality of data space. In this paper we develop a conceptual framework that allows connecting all three undesired properties. We show that hubs and “orphans” are expected to appear in high-dimensional data spaces, and relate the cause of their appearance with the concentration property of distance / similarity measures. We verify our conclusions on real mu- sic data, examining groups of frames generated by Gaus- sian Mixture Models (GMMs), considering two similar- ity measures: Earth Mover’s Distance (EMD) in combi- nation with Kullback-Leibler (KL) divergence, and Monte Carlo (MC) sampling. The proposed framework can be useful to MIR researchers to address problems of spectral similarity, understand their fundamental origins, and thus be able to develop more robust methods for their remedy.

Problem is mainly due to the high-dimensional vector space – so problems like hubs, orphans are expected. So, lets look at how to deal with this problem of high-dimensionality.

One problem, in Euclidean space, as we get into higher dimensions it harder to distinguish between the farthest and the nearest neighbor in high dimensions.

This is a natural result of high dimensionality and  leads to the problem of hubs and orphans.

Another way of looking at this is to show the ratio between the standard deviation andof the neighbor distances as a function of dimensionality:

Conclusion – high dimensionality is responsible for problems of hubs, orphans and the concentration effect.

This was an interesting talk and has lots of potential impact on spectral similarity.

Leave a Comment

MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEW

MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEWYoungmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon G. Morton Patrick Richardson, Jeffrey Scott, Jacquelin A. Speck, and Douglas Turnbull (pdf)

Youngmoo presents on mood

From the paper: Recognizing musical mood remains a challenging problem primarily due to the inherent ambiguities of human emotions. Though research on this topic is not as mature as some other Music-IR tasks, it is clear that rapid progress is being made. In the past 5 years, the performance of automated systems for music emotion recognition using a wide range of annotated and content-based features (and multi-modal feature combinations) have advanced significantly. As with many Music-IR tasks open problems remain at all levels, from emotional representations and annotation methods to feature selection and machine learning.

While significant advances have been made, the most accurate systems thus far achieve predictions through large-scale machine learning algorithms operating on vast feature sets, sometimes spanning multiple domains, applied to relatively short musical selections. Oftentimes, this approach reveals little in terms of the underlying forces driving the perception of musical emotion (e.g., varying contributions of features) and, in particular, how emotions in music change over time. In the future, we anticipate further collaborations between Music-IR researchers, psychologists, and neuroscientists, which may lead to a greater understanding of not only mood within music, but human emotions in general. Furthermore, it is clear that individu- als perceive emotions within music differently. Given the multiple existing approaches for modeling the ambiguities of musical mood, a truly personalized system would likely need to incorporate some level of individual profiling to adjust its predictions.

This paper has provided a broad survey of the state of the art, highlighting many promising directions for further research. As attention to this problem increases, it is our hope that the progress of this research will continue to accelerate in the near future.

My notes:

Mirex performance on mood classification has held steady for the last few years.  Most mood classification systems in Mirex are just adapted genre classifiers.

Categorical vs. dimensional

Categorical: Mirex classifies mood into 5 clusters:

Dimensional:  - the ever popular Valence-Arousal space – sometimes called the Thayer Mood model:


Typical emotion classification system

Ground Truth

A big challenge is to come up with groundtruth for training a recognition system. Last.fm tags,  GWAP,  AMG labels, web documents are common sources.

Lyrics – using lyrics alone has not been too successful for  mood classification.

Content-based methods – typical features for mood:

Youngmoo’s latest work (with Eric Schmidt) is showing the distribution and change of emotion over time.

Hybrid systems

  • Audio + Lyrics – some to high improvement
  • Audio + Tags = good improvement
  • Audio + Images = using album art to derive associations to mood

Conclusions – Mood recognition hasn’t improved much in recent years – probably because most systems are not really designed specifically for mood.

This was a great overview of the state-of-the-art. I’d be interested in hearing a much longer version of this talk.  The paper and the references will be a great resource for anyone who’s interested in pursuing mood classification.

Leave a Comment

Solving Misheard Lyric Search Queries ….

Solving Misheard Lyric Search Queries  using a Probabilistic Model of Speech Sounds

Hussein Hirjee and Daniel G. Brown

People often use lyrics to find songs – and they get them wrong. Some examples  ’Nirvana’ = “Don’t walk on guns, burn your friends”.  Approach: look at using phonetic similarity.  They adapt ‘blast’ from DNA sequence matching to the problem.  Lyrics are represented as a sequence of matrix.

For training, they get data from ‘misheard lyrics’ sites like KissThisGuy.com.  They align the misheard with the real lyrics – to build a model of frequently  misheard phonemes.  They tested with KissThisGuy.com misheard lyrics.  Scored with 5 different models.

Evaluation: Mean Reciprocal Rank and Hit Rank by Rank.  The approach compared well with previous techniques.  Still, 17% of lyrics are still not identified – some are just bad queries, but dealing with short queries is a source of errors.  They also looked at phoneme confusion, in particular confusions caused by singing.

Future work: look at phoneme trigrams, and build a web site. Quesitioner suggests that they create a mondegreen generator

Good presentation, interesting, fun  problem area.

Leave a Comment

APPROXIMATE NOTE TRANSCRIPTION FOR THE IMPROVED IDENTIFICATION OF DIFFICULT CHORDS

APPROXIMATE NOTE TRANSCRIPTION FOR THE IMPROVED IDENTIFICATION OF DIFFICULT CHORDS  - Matthias Mauch and Simon Dixon

This is a new chroma extraction method using a non-negative least squares (NNLS) algorithm for prior approximate note transcription. Twelve different chroma methods were tested for chord transcription accuracy on popular music, using an existing high- level probabilistic model. The NNLS chroma features achieved top results of 80% accuracy that significantly exceed the state of the art by a large margin.

We have shown that the positive influence of the approximate transcription is particularly strong on chords whose harmonic structure causes ambiguities, and whose identification is therefore difficult in approaches without prior approximate transcription. The identification of these difficult chord types was substantially increased by up to twelve percentage points in the methods using NNLS transcription.

Matthias is an enthusiastic presenter who did not hesitate to jump onto the piano to demonstrate ‘difficult chords’.  Very nice presentation.

Leave a Comment

Locating Tune Changes and Providing a Semantic Labelling of Sets of Irish Traditional Tunes

Locating Tune Changes and Providing a Semantic Labelling of Sets of Irish Traditional Tunes by Cillian Kelly (pdf)

Abstract – An approach is presented which provides the tune change loca- tions within a set of Irish Traditional tunes. Also provided are semantic labels for each part of each tune within the set. A set in Irish Traditional music is a number of individual tunes played segue. Each of the tunes in the set are made up of structural segments called parts. Musical variation is a prominent characteristic of this genre. However, a certain set of notes known as ‘set accented tones’ are considered impervious to musical variation. Chroma information is extracted at ‘set accented tone’ locations within the music. The resulting chroma vectors are grouped to represent the parts of the music. The parts are then compared with one another to form a part similarity matrix. Unit kernels which represent the possible structures of an Irish Traditional tune are matched with the part similarity matrix to determine the tune change locations and semantic part labels.

This looks to be a very hard problem to solve.

Leave a Comment

Identifying Repeated Patterns in Music …

I am at ISMIR this week, blogging sessions and papers that I find interesting.

Identifying Repeated Patterns in Music using Sparse Convolutive Non-Negative Matrix Factorization - Ron Weiss, Juan Bello  (pdf)

Problem: Looking at repetition in music – verse, chorus, repeated motifs.  Can one identify high level and short term structiure simulataneous from audio? Lots of math in this.

Ron describes an unsupervised, data-driven, method for automatically identifying repeated patterns in music by analyzing a feature matrix using a variant of sparse convolutive non-negative matrix factorization. They utilize sparsity constraints to automatically identify the number of patterns and their lengths, parameters that would normally need to be fixed in advance. The proposed analysis is applied to beat- synchronous chromagrams in order to concurrently extract repeated harmonic motifs and their locations within a song.  They show how this analysis can be used for long- term structure segmentation, resulting in an algorithm that is competitive with other state-of-the-art segmentation algorithms based on hidden Markov models and self similarity matrices.

One particular application is riff identification for music thumbnailing. Another application is structure segmentation – verse chorus, bridge etc.)

The code is open-sourced here:  http://ronw.github.com/siplca-segmentation/

This was a really interesting presentation, with great examples. Excellent work.  This one should be a candidate for best paper IMHO.

Leave a Comment

What’s Hot? Estimating Country Specific Artist Popularity

I am at ISMIR this week, blogging sessions and papers that I find interesting.

What’s Hot? Estimating Countrhy Specific Artist Popularity
Markus Schedl, Tim Pohle, Noam Koenigstein, Peter Knees

Traditional charts are not perfect, not available in on countries, have biases (sales vs. plays), don’t incorporate non-sales channels like p2p. inhomogenity between countries .

Approach: Look at different channels: Google, Twitter, shared folders in Gnutella, Last.fm

  • Google:  ”led zeppelin”  + “france” but applied a popularity filter to reduce affect of overall popularity
  • twiiter – geolocated major citiies of the world using freebase. Used twitter APIs with #nowplaying hashtag along with the geolocation api to search for plays in a particular country
  • P2p shared folders – gnutella network – gathered a million gnutella IP addresses, gathered the metadata for the shared folders at each address, used IP2location to resolve to a geographic location
  • Last.fm – retreive top 400 listeners in each country. For these top 400 listeners, retrieve the top-played artists.

Evaluation:   Retrieve Last.fm most popular. Use top-n rank overlap for scoring. Compared the 4 different sources.  Each approach was prone to certain distortions and bias.   For future they hope to combine these sources to build a hybrid system that combines best attributes of all approaches.

1 Comment

ISMIR Day zero in Utrecht

We’ve just finished Day 0 of ISMIR (the yearly conference of the International Society of Music Information Retrieval) being held in Utrecht.  It is a lovely city, I’ve been enjoying walks along the many canals in the comfortably cool weather.

The zeroth day of ISMIR is the tutorial day.  Ben Fields and I presented our playlisting tutorial.  It was well attended, with lots of good questions at the end.  The 3 hour long presentation seemed to fly by.   Here’s Ben making last minute edits just before the presentation.

Leave a Comment

Finding a path through the Jukebox: The Playlist Tutorial

Ben Fields and I have just put the finishing touches on our playlisting tutorial for ISMIR.  Everything you could want to know about playlists.  As one of the founders of a well known music intelligence company once said: Take the fun out of music and read Paul’s slides

1 Comment

Follow

Get every new post delivered to your Inbox.

Join 90 other followers