Posts Tagged ismir
The first session at ISMIR today is on the Web. 4 really interesting sets of papers:
Songle – an active music listening experience
Masataka Goto presented Songle at ISMIR this morning. Songle is a web site for active music listening and content-based music browsing. Songle takes many of the MIR techniques that researchers have been working on for years and makes them available to non-MIR experts to help them understand music better. You can also use Songle to modify the music. You can interactively change the beat and melody, and copy and paste sections. Your edits can be shared with others. Masataka hopes that Songle can serve as a showcase of MIR and music-understanding technologies and will serve as a platform for other researchers as well. There’s a lot of really powerful music technology behind Songle. I look forward to trying it out. Paper.
Improving Perceptual Tempo estimation with Crowd-Source Annotations
Mark Levy from Last.fm describes the Last.fm experiment to crowd-source the gathering of tempo information (fast, slow and BPM) that can be used to help eliminate tempo ambiguity in machine-estimated tempo (typically known as the octave error). They ran their test over 4K songs from a number of genres. So far they’ve had 27K listeners apply 200K labels and BPM estimates (whoa!). Last.fm is releasing this dataset. Very interesting work. Paper
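To make the octave-error idea concrete, here is a minimal sketch of how a coarse crowd label could pick between the half, original and double tempo of a machine estimate. This is not Last.fm's method: the function name, the "medium" label and the 90/120 BPM thresholds are all assumptions chosen for illustration.

```python
def resolve_tempo_octave(estimated_bpm, crowd_label, slow_max=90.0, fast_min=120.0):
    """Pick the half/same/double tempo candidate that agrees with a crowd label.

    crowd_label is "slow", "fast" or anything else (treated as in-between).
    slow_max and fast_min are hypothetical thresholds, not values from the paper.
    """
    candidates = [estimated_bpm / 2.0, estimated_bpm, estimated_bpm * 2.0]
    if crowd_label == "slow":
        matching = [c for c in candidates if c <= slow_max]
    elif crowd_label == "fast":
        matching = [c for c in candidates if c >= fast_min]
    else:
        matching = [c for c in candidates if slow_max < c < fast_min]
    if not matching:
        # No candidate agrees with the label, so keep the machine estimate.
        return estimated_bpm
    # Among agreeing candidates, stay as close to the machine estimate as possible.
    return min(matching, key=lambda c: abs(c - estimated_bpm))


# A 70 BPM estimate for a track listeners call "fast" is probably half the felt tempo.
doubled = resolve_tempo_octave(70.0, "fast")
# A 160 BPM estimate for a track listeners call "slow" is probably double it.
halved = resolve_tempo_octave(160.0, "slow")
```

In practice the label would be an aggregate over many listeners rather than a single vote, which is where the volume of annotations matters.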
Investigating the similarity space of music artists on the micro-blogosphere
Markus Schedl analyzed 6 million tweets by searching tweets for artist names and conducted a number of experiments to see if artist similarity could be determined based upon these tweets. (They used the Comirva framework to conduct the experiments.) Findings: document-based techniques work best (cosine similarity, while not always yielding the best result, yielded the most stable results). Unsurprisingly, adding the term ‘music’ to the Twitter search helps a lot (reducing the CAKE, Spoon and KISS problems). The surprising result is that using tweets for deriving similarity works better than using larger documents derived from web search. Markus suggests that this may be due to the higher information content in the much shorter tweets. Datasets are available. Paper
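As a rough illustration of the document-based approach, the sketch below pools each artist's tweets into one bag-of-words vector and compares artists with cosine similarity. The tweets are made up, the whitespace tokenizer is an assumption, and none of this uses the Comirva framework itself.

```python
import math
from collections import Counter


def term_vector(tweets):
    """Pool an artist's tweets into a single term-frequency vector."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up tweets, for illustration only.
artist_a = term_vector(["loud guitar riffs tonight", "guitar solo was epic"])
artist_b = term_vector(["epic guitar show", "loud crowd tonight"])
artist_c = term_vector(["smooth piano ballad", "quiet evening jazz"])

sim_ab = cosine_similarity(artist_a, artist_b)
sim_ac = cosine_similarity(artist_a, artist_c)
```

Here `sim_ab` comes out higher than `sim_ac` because the first two artists share vocabulary and the third shares none.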
Music Influence Network Analysis and Rank of Sample-based Music
Nick Bryan from Stanford – trying to understand how songs/artists and genres interact in sample-based music (remixes etc). Using data from Whosampled.com – (42K user-generated sample info sets). From this data they created a directed graph and did some network analysis on the graph (centrality / influence) – Hypothesized that there’s a power law distribution of connectivity (typical small-worlds, scale-free distribution with a rich-gets-richer effect). They confirmed this hypothesis. Use Katz Influence to help understand sample-chains. From the song-sample graph, artist sample graphs (who sampled whom) and genre sample graphs (which genres sample from other genres) were derived. With all these graphs, Nick was then able to understand which songs and artists are the most influential (James Brown is king of sampling). Surprisingly, the AMEN break is only the second most influential sample. Interesting and fun work. Paper
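For readers unfamiliar with Katz-style influence, here is a small sketch on a toy sample graph. It scores each song by the number of sample chains that end at it, with a chain of length k discounted by alpha to the power k. The edge direction (sampler to source), the value of alpha and the song names are assumptions for illustration; the paper's exact formulation may differ.

```python
def katz_influence(edges, alpha=0.5, iterations=100):
    """Katz-style influence on a directed sample graph.

    edges is a list of (sampler, source) pairs. A source's score is the sum,
    over every sample chain ending at it, of alpha ** chain_length.
    For graphs with cycles alpha must be small enough for the sum to converge.
    """
    nodes = {node for edge in edges for node in edge}
    score = {node: 0.0 for node in nodes}
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        for sampler, source in edges:
            # The direct sample counts once; chains through the sampler are passed on.
            updated[source] += alpha * (1.0 + score[sampler])
        score = updated
    return score


# Toy graph: B and D sample A directly, and C samples B (so C reaches A in two steps).
toy_edges = [("B", "A"), ("C", "B"), ("D", "A")]
influence = katz_influence(toy_edges)
```

On the toy graph, A scores 2 × 0.5 for its two direct samples plus 0.25 for the two-step chain from C, so longer chains still count but contribute less.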
iTunes Smart Playlists allow for very flexible creation of dynamic playlists based on a whole boat-load of parameters. But I wonder how often people use this feature. Is it too complicated? Let’s find out. I’ve created a poll that will take you about 20 seconds to complete. Go to iTunes, count up how many smart playlists you have. You can tell which playlists are smart playlists because they have the little gear icon:
Don’t count the pre-fab smart playlists that come with iTunes (like 90’s music, Recently Added, My Top Rated, etc.). Once you’ve counted up your playlists, take the poll:
I’m conducting a somewhat informal survey on playlisting to see how playlists created by an expert radio DJ compare to those generated by a playlisting algorithm and a random number generator. So far, nearly 200 people have taken the survey (Thanks!). Already I’m seeing some very interesting results. Here are a few tidbits (look for a more thorough analysis once the survey is complete).
People expect human DJs to make better playlists:
The survey asks people to try to identify the origin of a playlist (human expert, algorithm or random) and also rate each playlist. We can look at the ratings people give to playlists based on what they think the playlist origin is to get an idea of people’s attitudes toward human vs. algorithm creation.
Predicted Origin    Rating
----------------    ------
Human expert        3.4
Algorithm           2.7
Random              2.1
We see that people expect humans to create better playlists than algorithms and that algorithms should give better playlists than random numbers. Not a surprising result.
Human DJs don’t necessarily make better playlists:
Now let’s look at how people rated playlists based on the actual origin of the playlists:
Actual Origin       Rating
-------------       ------
Human expert        2.5
Algorithm           2.7
Random              2.6
These results are rather surprising. Algorithmic playlists are rated highest, while human-expert-created playlists are rated lowest, even lower than those created by the random number generator. There are lots of caveats here: I haven’t done any significance tests yet to see if the differences really matter, the survey size is still rather small, and the survey doesn’t reproduce real-world playlist listening conditions. Nevertheless, the results are intriguing.
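One way the missing significance check could be done is a permutation test on the ratings of two playlist origins. The sketch below first averages ratings by predicted or actual origin, then estimates how often a difference at least as large as the observed one arises when the origin labels are shuffled. The response records and field names are made up for illustration and are not the survey data.

```python
import random


def mean_rating_by(responses, key):
    """Average rating grouped by the given field ("actual" or "predicted")."""
    groups = {}
    for response in responses:
        groups.setdefault(response[key], []).append(response["rating"])
    return {origin: sum(r) / len(r) for origin, r in groups.items()}


def permutation_p_value(a, b, trials=10000, seed=0):
    """Two-sided permutation test on the difference in mean rating."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        # Small tolerance so float rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    return hits / trials


# Hypothetical responses, not real survey results.
toy_responses = [
    {"actual": "human", "predicted": "algorithm", "rating": 1},
    {"actual": "human", "predicted": "human", "rating": 3},
    {"actual": "algorithm", "predicted": "human", "rating": 4},
    {"actual": "algorithm", "predicted": "random", "rating": 2},
    {"actual": "random", "predicted": "algorithm", "rating": 3},
    {"actual": "random", "predicted": "random", "rating": 1},
]
by_actual = mean_rating_by(toy_responses, "actual")
by_predicted = mean_rating_by(toy_responses, "predicted")
```

A large p-value from `permutation_p_value` would mean a gap like 2.5 versus 2.7 could easily be noise, which is exactly the caveat raised above.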
I’d like to collect more survey data to flesh out these results. So if you haven’t already, please take the survey:
Playlists have long been a big part of the music experience. But making a good playlist is not always easy. We can spend lots of time crafting the perfect mix, but more often than not, in this iPod age, we are likely to toss on a pre-made playlist (such as an album), have the computer generate a playlist (with something like iTunes Genius) or (more likely) we’ll just hit the shuffle button and listen to songs at random. I pine for the old days when Radio DJs would play well-crafted sets – mixes of old favorites and the newest, undiscovered tracks – connected in interesting ways. These professionally created playlists magnified the listening experience. The whole was indeed greater than the sum of its parts.
The tradition of the old-style Radio DJ continues on Internet Radio sites like Radio Paradise. RP founder/DJ Bill Goldsmith says of Radio Paradise: “Our specialty is taking a diverse assortment of songs and making them flow together in a way that makes sense harmonically, rhythmically, and lyrically — an art that, to us, is the very essence of radio.” Anyone who has listened to Radio Paradise will come to appreciate the immense value that a professionally curated playlist brings to the listening experience.
I wish I could put Bill Goldsmith in my iPod and have him craft personalized playlists for me – playlists that make sense harmonically, rhythmically and lyrically, and customized to my music taste, mood and context. That, of course, will never happen. Instead I’m going to rely on computer algorithms to generate my playlists. But how good are computer generated playlists? Can a computer really generate playlists as good as Bill Goldsmith, with his decades of knowledge about good music and his understanding of how to fit songs together?
To help answer this question, I’ve created a Playlist Survey – that will collect information about the quality of playlists generated by a human expert, a computer algorithm and a random number generator. The survey presents a set of playlists and the subject rates each playlist in terms of its quality and also tries to guess whether the playlist was created by a human expert, a computer algorithm or was generated at random.
Bill Goldsmith and Radio Paradise have graciously contributed 18 months of historical playlist data from Radio Paradise to serve as the expert playlist data. That’s nearly 50,000 playlists and a quarter million song plays spread over nearly 7,000 different tracks.
The Playlist Survey also serves as a Radio DJ Turing test. Can a computer algorithm (or a random number generator for that matter) create playlists that people will think are created by a living and breathing music expert? What will it mean, for instance, if we learn that people really can’t tell the difference between expert playlists and shuffle play?
Ben Fields and I will offer the results of this Playlist Survey when we present Finding a path through the Jukebox – The Playlist Tutorial – at ISMIR 2010 in Utrecht in August. I’ll also follow up with detailed posts about the results here on this blog after the conference. I invite all of my readers to spend 10 to 15 minutes to take The Playlist Survey. Your efforts will help researchers better understand what makes a good playlist.
ISMIR 2009 is over – but it will not be soon forgotten. It was a wonderful event, with seemingly flawless execution. Some of my favorite things about the conference this year:
- The proceedings – distributed on a USB stick hidden in a pen that has a laser! And the battery for the laser recharges when you plug the USB stick into your computer. How awesome is that!? (The printed version is very nice too, but it doesn’t have a laser).
- The hotel – very luxurious while at the same time, very affordable. I had a wonderful view of Kobe, two very comfortable beds and a toilet with more controls than the dashboard on my first car.
- The presentation room – very comfortable with tables for those sitting towards the front, great audio and video and plenty of power and wireless for all.
- The banquet – held in the most beautiful room in the world with very exciting Taiko drumming as entertainment.
- The details – it seems like the organizing team paid attention to every little detail and request, from the taped numbers on the floor so that the 30 folks giving their 30-second pitches during poster madness would know just where to stand, to the signs on the coffeepots telling you that the coffee was being made, to the signs on the train to the conference center welcoming us to ISMIR 2009. No detail was left to chance.
- The food – our stomachs were kept quite happy – with sweet breads and pastries every morning, bento boxes for lunch, and coffee, juices, waters, and the mysterious beverage ‘black’ that I didn’t dare to try. My absolute favorite meal was the box lunch during the tutorial day – it was a box with a string – when you are ready to eat you give the string a sharp tug – wait a few minutes for the magic to do its job and then you open the box and eat a piping hot bowl of noodles and vegetables. Almost as cool as the laser-augmented proceedings.
- The city – Kobe is a really interesting city – I spent a few days walking around and was fascinated by it all. I really felt like I was walking around in the future. It was extremely clean, and the people were very polite, friendly and always willing to help. Going into some parts of town was sensory overload: the colors, sounds, smells and sights were overwhelming – it was really fun.
- The Keynote – music-making robots – what more is there to say?
- The Program – the quality of papers was very high – there were some outstanding posters and oral presentations. Many thanks to George and Keiji for organizing the reviews to create a great program. (More on my favorite posters and papers in an upcoming post)
- f(mir) – The student-organized workshop looked at what MIR research would look like in 10, 20 or even 50 years (basically after I’m dead and gone). The presentations in this workshop were quite provocative – well done students!
I write this post as I sit in the airport in Osaka waiting for my flight home. I’m tired, but very energized to explore the many new ideas that I encountered at the conference. It was a great week. I want to extend my personal thanks to Professor Fujinaga and Professor Goto and the rest of the conference committee for putting together a wonderful week.
Session Title: Sociology & Ethnomusicology
Sally Jo Cunningham and David M. Nichols
Abstract: This paper builds an understanding of how music is currently listened to by small (fewer than 10 individuals) to medium-sized (10 to 40 individuals) gatherings of people—how songs are chosen for playing, how the music fits in with other activities of group members, who supplies the music, the hardware/software that supports song selection and presentation. This fine-grained context emerges from a qualitative analysis of a rich set of participant observations and interviews focusing on the selection of songs to play at social gatherings. We suggest features for software to support music playing at parties.
- What happens at parties, especially informal small and medium sized parties
- Observations and interviews – 43 party observations
- Analyzing the data: key events that drive the activity, patterns of behavior, social roles
- Music selection cannot require fine motor movements (because people are drinking and holding their drinks) (drinking dyslexia)
- Need for large displays
- Party collection from different donors, sources, media
- Pre-party: host collection
- As party progresses: additional contributions (ipods, thumbdrives, etc)
- Challenge: bring together into a single browseable searchable collection
- Roles: Host, guest, guest of honor. Host provides initial collection, party playlist. High stress ‘guilty pleasures’
- Guests: may contribute, could insult the host, may modify the party playlist if invited to by the host. Voting jukeboxes may help
- Guest of Honor had ultimate control
- insertion into playlist, looking for specific song, type of song.
- Delete songs from playlist without disrupting the party
- Setting and maintaining atmosphere
- softer at the start, moving to faster and louder, ending with chilling out
- What next: other situations, long car ride
- Questions: Spotify turned into the best party
Great study, great presentation.
Emilia Gómez, Martín Haro and Perfecto Herrera
Abstract: This paper analyses how audio features related to different musical facets can be useful for the comparative analysis and classification of music from diverse parts of the world. The music collection under study gathers around 6,000 pieces, including traditional music from different geographical zones and countries, as well as a varied set of Western musical styles. We achieve promising results when trying to automatically distinguish music from Western and non-Western traditions. A 86.68% of accuracy is obtained using only 23 audio features, which are representative of distinct musical facets (timbre, tonality, rhythm), indicating their complementarity for music description. We also analyze the relative performance of the different facets and the capability of various descriptors to identify certain types of music. We finally present some results on the relationship between geographical location and musical features in terms of extracted descriptors. All the reported outcomes demonstrate that automatic description of audio signals together with data mining techniques provide means to characterize huge music collections from different traditions, complementing ethnomusicological manual analysis and providing a link between music and geography.
Abstract: The wide range of vocal styles, musical textures and recording techniques found in ethnomusicological field recordings leads us to consider the problem of automatically labeling the content to know whether a recording is a song or instrumental work. Furthermore, if it is a song, we are interested in labeling aspects of the vocal texture: e.g. solo, choral, acapella or singing with instruments. We present evidence to suggest that automatic annotation is feasible for recorded collections exhibiting a wide range of recording techniques and representing musical cultures from around the world. Our experiments used the Alan Lomax Cantometrics training tapes data set, to encourage future comparative evaluations. Experiments were conducted with a labeled subset consisting of several hundred tracks, annotated at the track and frame levels, as acapella singing, singing plus instruments or instruments only. We trained frame-by-frame SVM classifiers using MFCC features on positive and negative exemplars for two tasks: per-frame labeling of singing and acapella singing. In a further experiment, the frame-by-frame classifier outputs were integrated to estimate the predominant content of whole tracks. Our results show that frame-by-frame classifiers achieved 71% frame accuracy and whole track classifier integration achieved 88% accuracy. We conclude with an analysis of classifier errors suggesting avenues for developing more robust features and classifier strategies for large ethnographically diverse collections.
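The gap between 71% frame accuracy and 88% track accuracy makes sense once per-frame decisions are pooled over a whole track. The abstract does not say how the outputs were integrated; the sketch below assumes a simple majority vote, and the label names are illustrative.

```python
from collections import Counter


def track_label(frame_labels):
    """Return the most common per-frame label as the label for the whole track."""
    if not frame_labels:
        raise ValueError("a track needs at least one frame label")
    return Counter(frame_labels).most_common(1)[0][0]


# Hypothetical classifier output: 3 of 10 frames are wrong,
# yet the track-level decision is still correct.
frames = ["singing"] * 7 + ["instrumental"] * 3
label = track_label(frames)
```

As long as the frame errors are scattered rather than concentrated in a few tracks, pooling like this corrects many of them.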
Ruben Hillewaere, Bernard Manderick and Darrell Conklin
Abstract: Music classification has been widely investigated in the past few years using a variety of machine learning approaches. In this study, a corpus of 3367 folk songs, divided into six geographic regions, has been created and is used to evaluate two popular yet contrasting methods for symbolic melody classification. For the task of folk song classification, a global feature approach, which summarizes a melody as a feature vector, is outperformed by an event model of abstract event features. The best accuracy obtained on the folk song corpus was achieved with an ensemble of event models. These results indicate that the event model should be the default model of choice for folk song classification.
Meinard Mueller, Peter Grosche and Frans Wiering
Abstract: Even though folk songs have been passed down mainly by oral tradition, most musicologists study the relation between folk songs on the basis of score-based transcriptions. Due to the complexity of audio recordings, once having the transcriptions, the original recorded tunes are often no longer studied in the actual folk song research though they still may contain valuable information. In this paper, we introduce an automated approach for segmenting folk song recordings into its constituent stanzas, which can then be made accessible to folk song researchers by means of suitable visualization, searching, and navigation interfaces. Performed by elderly non-professional singers, the main challenge with the recordings is that most singers have serious problems with the intonation, fluctuating with their voices even over several semitones throughout a song. Using a combination of robust audio features along with various cleaning and audio matching strategies, our approach yields accurate segmentations even in the presence of strong deviations.
Notes: Interesting talk (as always) by Meinard about dealing with real world problems when dealing with folk song audio recordings.
Korinna Bade, Andreas Nurnberger, Sebastian Stober, Jörg Garbers and Frans Wiering
Abstract: In folk song research, appropriate similarity measures can be of great help, e.g. for classification of new tunes. Several measures have been developed so far. However, a particular musicological way of classifying songs is usually not directly reflected by just a single one of these measures. We show how a weighted linear combination of different basic similarity measures can be automatically adapted to a specific retrieval task by learning this metric based on a special type of constraints. Further, we describe how these constraints are derived from information provided by experts. In experiments on a folk song database, we show that the proposed approach outperforms the underlying basic similarity measures and study the effect of different levels of adaptation on the performance of the retrieval system.
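A weighted linear combination of similarity measures can be sketched in a few lines. The abstract does not describe the constraint type or the optimiser, so the example below assumes relative constraints of the form "the query is closer to X than to Y" and uses a perceptron-style update as a stand-in. The two toy measures and the melodies (tuples of MIDI pitches) are hypothetical.

```python
def same_first_pitch(a, b):
    """Toy measure: 1.0 if two melodies start on the same pitch."""
    return 1.0 if a[0] == b[0] else 0.0


def same_length(a, b):
    """Toy measure: 1.0 if two melodies have the same number of notes."""
    return 1.0 if len(a) == len(b) else 0.0


def combined_similarity(measures, weights, a, b):
    """Weighted linear combination of the basic similarity measures."""
    return sum(w * m(a, b) for m, w in zip(measures, weights))


def learn_weights(measures, constraints, rate=0.1, epochs=50):
    """Adapt weights so each (query, closer, farther) constraint is respected."""
    weights = [1.0 / len(measures)] * len(measures)
    for _ in range(epochs):
        for query, closer, farther in constraints:
            s_close = [m(query, closer) for m in measures]
            s_far = [m(query, farther) for m in measures]
            score_close = sum(w * s for w, s in zip(weights, s_close))
            score_far = sum(w * s for w, s in zip(weights, s_far))
            if score_close <= score_far:
                # Constraint violated: shift weight toward measures that agree with it.
                weights = [max(0.0, w + rate * (c - f))
                           for w, c, f in zip(weights, s_close, s_far)]
                total = sum(weights) or 1.0
                weights = [w / total for w in weights]
    return weights


measures = [same_first_pitch, same_length]
query, closer, farther = (60, 62, 64), (60, 65, 67, 69), (50, 52, 54)
weights = learn_weights(measures, [(query, closer, farther)])
```

In this toy case the expert constraint says the opening pitch matters more than the length, so the learned weights favour the first measure.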