Archive for category research

LastFM-ArtistTags2007

Posted by Paul in code, data, research, tags on November 10, 2010

A few years back I created a data set of social tags from Last.fm. RJ at Last.fm graciously gave permission for me to distribute the dataset for research use. I hosted the dataset on the media server at Sun Labs. However, with the Oracle acquisition, the media server is no longer serving up the data, so I thought I would post the data elsewhere.

The dataset is now available for download here: Lastfm-ArtistTags2007

Here are the details as told in the README file:

The LastFM-ArtistTags2007 Data set
Version 1.0
June 2008

What is this?

    This is a set of artist tag data collected from Last.fm using
    the Audioscrobbler webservice during the spring of 2007.

    The data consists of the raw tag counts for the 100 most
    frequently occurring tags that Last.fm listeners have applied
    to over 20,000 artists.

    An undocumented (and deprecated) option of the audioscrobbler
    web service was used to bypass the Last.fm normalization of tag
    counts.  This data set provides raw tag counts.

Data Format:

  The data is formatted one entry per line as follows:

  musicbrainz-artist-id<sep>artist-name<sep>tag-name<sep>raw-tag-count

Example:

    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>american<sep>14
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>animals<sep>5
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art rock<sep>18
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>atmospheric<sep>4
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>avantgarde<sep>3

Data Statistics:

    Total Lines:      952810
    Unique Artists:    20907
    Unique Tags:      100784
    Total Tags:      7178442

Filtering:

    Some minor filtering has been applied to the tag data.  Last.fm will
    report tags with counts of zero or less on occasion. These tags have
    been removed.

    Artists with no tags have not been included in this data set.
    Of the nearly quarter of a million artists that were inspected, 20,907
    artists had 1 or more tags.

Files:

    ArtistTags.dat  - the tag data
    README.txt      - this file
    artists.txt     - artists ordered by tag count
    tags.txt        - tags ordered by tag count

License:

    The data in LastFM-ArtistTags2007 is distributed with permission of
    Last.fm.  The data is made available for non-commercial use only under
    the Creative Commons Attribution-NonCommercial-ShareAlike UK License.
    Those interested in using the data or web services in a commercial
    context should contact partners at last dot fm. For more information
    see http://www.audioscrobbler.net/data/

Acknowledgements:

    Thanks to Last.fm for providing access to this tag data via their
    web services.

Contact:

    This data was collected and filtered by Paul Lamere of The Echo Nest. Send
    questions or comments to Paul.Lamere@gmail.com
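The data format described in the README is simple enough to read with a few lines of Python. Here is a minimal sketch, not part of the original distribution. It assumes the `<sep>` separator appears literally in the file, as it does in the README example, and it assumes that 'Total Tags' in the statistics means the sum of the raw counts. The function names `parse_line`, `summarize` and `top_tags` are made up for illustration.

```python
SEP = "<sep>"  # assumption: the separator is the literal text shown in the README

# Sample records copied from the README example.
SAMPLE_LINES = [
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>american<sep>14",
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>animals<sep>5",
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21",
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art rock<sep>18",
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>atmospheric<sep>4",
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>avantgarde<sep>3",
]


def parse_line(line):
    """Split one record into (musicbrainz_id, artist, tag, raw_count)."""
    mbid, artist, tag, count = line.rstrip("\r\n").split(SEP)
    return mbid, artist, tag, int(count)


def summarize(lines):
    """Compute line, artist and tag totals in the spirit of 'Data Statistics'."""
    artists, tags = set(), set()
    n_lines = total_count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        mbid, artist, tag, count = parse_line(line)
        artists.add((mbid, artist))
        tags.add(tag)
        n_lines += 1
        total_count += count
    return {
        "lines": n_lines,
        "artists": len(artists),
        "tags": len(tags),
        "total_count": total_count,
    }


def top_tags(lines, artist_name, n=3):
    """Return the n most frequently applied tags for one artist."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue
        _, artist, tag, count = parse_line(line)
        if artist == artist_name:
            counts[tag] = counts.get(tag, 0) + count
    # Sort by count (highest first), then by tag name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

To run this over the real data, pass an open file instead of the sample list, for example `summarize(open("ArtistTags.dat", encoding="utf-8"))`. The UTF-8 encoding is a guess on my part, so adjust it if the artist names come out garbled.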

Is that a million songs in your pocket, or are you just glad to see me?

Posted by Paul in Music, playlist, research, The Echo Nest, web services on September 2, 2010

Yesterday, Steve Jobs reminded us that it was less than 10 years ago when Apple announced the first iPod, which could put a thousand songs in your pocket. With the emergence of cloud-based music services like Spotify and Rhapsody, we can now have a virtually endless supply of music in our pocket. The ‘bottomless iPod’ will have as big an effect on how we listen to music as the original iPod had back in 2001. But with millions of songs to choose from, we will need help finding music that we want to hear. Shuffle play won’t work when we have a million songs to choose from. We will need new tools that help us manage our listening experience. I’m convinced that one of these tools will be intelligent automatic playlisting.

This weekend at the Music Hack Day London, The Echo Nest is releasing the first version of our new Playlisting API. The Playlisting API lets developers construct playlists based on a flexible set of artist/song selection and sorting rules. The Echo Nest has deep data about millions of artists and songs. We know how popular Lady Gaga is, we know the tempo of every one of her songs, we know other artists that sound similar to her, we know where she’s from, we know what words people use to describe her music (‘dance pop’, ‘club’, ‘party music’, ‘female’, ‘diva’). With the Playlisting API we can use this data to select music and arrange it in all sorts of flexible ways – from very simple Pandora-style radio playlists of similar sounding songs to elaborate playlists drawing on a wide range of parameters. Here are some examples of the types of playlists you can construct with the API:

Similar artist radio – generate a playlist of songs by similar artists
Jogging playlist – generate a playlist of 80s power pop with a tempo between 120 and 130 BPM, but never ever play Bon Jovi
London Music Hack Day Playlist – generate a playlist of electronic and techno music by unknown artists near London, order the tracks by tempo from slow to fast
Tomorrow’s top 40 – play the hottest songs by pop artists with low familiarity that are starting to get hottt
Heavy Metal Radio – A DMCA-Compliant radio stream of nothing but heavy metal
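To make the selection and sorting rules above a bit more concrete, here is a small local sketch of the 'jogging playlist' idea in Python. This is not the Echo Nest Playlisting API and makes no calls to it. The `Song` class, the `build_playlist` function and the catalog entries (track titles, tempos, 'Artist A' and so on) are all invented for illustration. It picks songs of one style inside a tempo range, drops banned artists, and orders the result from slow to fast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    tempo: float    # beats per minute
    styles: tuple   # descriptive terms such as "80s power pop"


# Made-up catalog: titles, tempos and most artist names are placeholders.
CATALOG = [
    Song("Track 1", "Artist A", 126.0, ("80s power pop",)),
    Song("Track 2", "Bon Jovi", 123.0, ("80s power pop",)),
    Song("Track 3", "Artist B", 121.0, ("80s power pop",)),
    Song("Track 4", "Artist C", 140.0, ("80s power pop",)),
    Song("Track 5", "Artist D", 125.0, ("techno",)),
]


def build_playlist(catalog, style, min_tempo, max_tempo, banned_artists=()):
    """Select songs by style and tempo range, skip banned artists, sort by tempo."""
    banned = {name.lower() for name in banned_artists}
    picks = [
        song
        for song in catalog
        if style in song.styles
        and min_tempo <= song.tempo <= max_tempo
        and song.artist.lower() not in banned
    ]
    # Order the tracks by tempo from slow to fast.
    return sorted(picks, key=lambda song: song.tempo)


if __name__ == "__main__":
    playlist = build_playlist(
        CATALOG, "80s power pop", 120, 130, banned_artists=["Bon Jovi"]
    )
    for song in playlist:
        print(f"{song.tempo:6.1f} BPM  {song.artist} / {song.title}")
```

A real service would apply rules like these against data for millions of songs rather than an in-memory list, but the shape of the rules (select, exclude, sort) is the same.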

We also provide a dynamic playlisting API that allows for the creation of playlists that adapt to the skipping and rating behavior of the listener.

I’m about to jump on a plane for the Music Hack Day London where we will be demonstrating this new API and some cool apps that have already been built upon it. I’m hoping to see a few apps emerge from this Music Hack Day that use the new API. More info about the APIs and how you can use them to do all sorts of fun things will be forthcoming. For the motivated, dive into the APIs right now.

Upbeat and Quirky, With a Bit of a Build: Interpretive Repertoires in Creative Music Search

Posted by Paul in events, Music, music information retrieval, research on August 13, 2010

Upbeat and Quirky, With a Bit of a Build: Interpretive Repertoires in Creative Music Search
Charlie Inskip, Andy MacFarlane and Pauline Rafferty

ABSTRACT – Pre-existing commercial music is widely used to accompany moving images in films, TV commercials and computer games. This process is known as music synchronisation. Professionals are employed by rights holders and film makers to perform creative music searches on large catalogues to find appropriate pieces of music for synchronisation. This paper discusses a Discourse Analysis of thirty interview texts related to the process. Coded examples are presented and discussed. Four interpretive repertoires are identified: the Musical Repertoire, the Soundtrack Repertoire, the Business Repertoire and the Cultural Repertoire. These ways of talking about music are adopted by all of the community regardless of their interest as Music Owner or Music User.

Music is shown to have multi-variate and sometimes conflicting meanings within this community which are dynamic and negotiated. This is related to a theoretical feedback model of communication and meaning making which proposes that Owners and Users employ their own and shared ways of talking and thinking about music and its context to determine musical meaning. The value to the music information retrieval community is to inform system design from a user information needs perspective.

What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas

Posted by Paul in events, ismir, music information retrieval, research on August 13, 2010

What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas
Peter Grosche, Meinard Müller and Craig Stuart Sapp

ABSTRACT – The automated extraction of tempo and beat information from music recordings is a challenging task. Especially in the case of expressive performances, current beat tracking approaches still have significant problems to accurately capture local tempo deviations and beat positions. In this paper, we introduce a novel evaluation framework for detecting critical passages in a piece of music that are prone to tracking errors. Our idea is to look for consistencies in the beat tracking results over multiple performances of the same underlying piece. As another contribution, we further classify the critical passages by specifying musical properties of certain beats that frequently evoke tracking errors. Finally, considering three conceptually different beat tracking procedures, we conduct a case study on the basis of a challenging test set that consists of a variety of piano performances of Chopin Mazurkas. Our experimental results not only make the limitations of state-of-the-art beat trackers explicit but also deepen the understanding of the underlying music material.

An Audio Processing Library for MIR Application Development in Flash

Posted by Paul in events, ismir, music information retrieval, research on August 13, 2010

An Audio Processing Library for MIR Application Development in Flash
Jeffrey Scott, Raymond Migneco, Brandon Morton, Christian M. Hahn, Paul Diefenbach and Youngmoo E. Kim

The Audio Processing Library for Flash affords music-IR researchers the opportunity to generate rich, interactive, real-time music-IR driven applications. The various levels of complexity and control as well as the capability to execute analysis and synthesis simultaneously provide a means to generate unique programs that integrate content based retrieval of audio features. We have demonstrated the versatility and usefulness of ALF through the variety of applications described in this paper. As interest in music driven applications intensifies, it is our goal to enable the community of developers and researchers in music-IR and related fields to generate interactive web-based media.

Music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data

Posted by Paul in events, ismir, Music, music information retrieval, research on August 13, 2010

Music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data
Michael Scott Cuthbert and Christopher Ariza

ABSTRACT – Music21 is an object-oriented toolkit for analyzing, searching, and transforming music in symbolic (score-based) forms. The modular approach of the project allows musicians and researchers to write simple scripts rapidly and reuse them in other projects. The toolkit aims to provide powerful software tools integrated with sophisticated musical knowledge to both musicians with little programming experience (especially musicologists) and to programmers with only modest music theory skills.

Music21 looks to be a pretty neat toolkit for analyzing and manipulating symbolic music. It’s like Echo Nest Remix for MIDI. The blog has lots more info: music21 blog. You can get the toolkit here: music21

State of the Art Report: Audio-Based Music Structure Analysis

Posted by Paul in events, ismir, music information retrieval, research on August 13, 2010

State of the Art Report: Audio-Based Music Structure Analysis
Jouni Paulus, Meinard Müller and Anssi Klapuri

ABSTRACT – Humans tend to organize perceived information into hierarchies and structures, a principle that also applies to music. Even musically untrained listeners unconsciously analyze and segment music with regard to various musical aspects, for example, identifying recurrent themes or detecting temporal boundaries between contrasting musical parts. This paper gives an overview of state-of-the-art methods for computational music structure analysis, where the general goal is to divide an audio recording into temporal segments corresponding to musical parts and to group these segments into musically meaningful categories. There are many different criteria for segmenting and structuring music audio. In particular, one can identify three conceptually different approaches, which we refer to as repetition-based, novelty-based, and homogeneity-based approaches. Furthermore, one has to account for different musical dimensions such as melody, harmony, rhythm, and timbre. In our state-of-the-art report, we address these different issues in the context of music structure analysis, while discussing and categorizing the most relevant and recent articles in this field.

This presentation is an overview of the music structure analysis problem and the methods proposed for solving it. The methods have been divided into three categories: novelty-based approaches, homogeneity-based approaches, and repetition-based approaches. Comparing the different methods has been problematic because of their differing goals, but current evaluations suggest that none of the approaches is clearly superior at this time, and that there is still room for considerable improvement.

The ISMIR business meeting

Posted by Paul in events, ismir, music information retrieval, research on August 12, 2010

Notes from the ISMIR business meeting – this is a meeting with the board of ISMIR.

Officers

President: J. Stephen Downie, University of Illinois at Urbana-Champaign, USA
Treasurer: George Tzanetakis, University of Victoria, Canada
Secretary: Jin Ha Lee, University of Illinois at Urbana-Champaign, USA
President-elect: Tim Crawford, Goldsmiths College, University of London, UK
Member-at-large: Doug Eck, University of Montreal, Canada
Member-at-large: Masataka Goto, National Institute of Advanced Industrial Science and Technology, Japan
Member-at-large: Meinard Mueller, Max-Planck-Institut für Informatik, Germany

Stephen reviewed the roles of the various officers and duties of the various committees. He reminded us that one does not need to be on the board to serve on a subcommittee.

Publication Issues

website redesign
Other communities hardly know about ISMIR, and we want to help them become aware of our research. One way is to make more links to other communities, for example by joining committees in those communities.

Hosting Issue – will formalize documentation, location planning, site selection.

Name change? There was a nifty debate around the meaning of ISMIR. There was a proposal to change it to ‘International Society for Music Informatics Research’. I recommend, given Doug’s comments about Youtube from this morning, that we change the name to ‘International Society for Movie Informatics Research’.

Review Process: Good discussion about the review process – we want paper bidding and double-blind reviews, which help avoid gender bias.

Doug snuck in the secret word ‘youtube’ too, just for those hanging out on IRC.

f(MIR) industrial panel

Posted by Paul in ismir, Music, music information retrieval, research on August 12, 2010

Douglas Eck (Google)
Greg Mead (Musicmetric)
Martin Roth (RjDj)
Ricardo Tarrasch (MeeMix)
moderator: Rebecca Fiebrink (Princeton)

RjDj – music-making apps on devices like iPhones
Musicmetric – tracks 3 areas: social networks, network analysis (influential fans), text via focused crawlers, P2P networks
MeeMix – music recommendation, artist radio, artist similarity, playlists. Pandora-like human analysis on 150K songs – then they learn these tags with machine learning. They look at which features best predict the tags. An important question is ‘what is important for the listeners’. Their aim is to find the best parameters for taste prediction.
Google – goal is to organize the world’s information. Doug would like to see an open API for companies to collaborate

Rebecca is the moderator.

What do you think is the next big thing? How is tech going to change things in the near future?

Doug (Google) thinks that ‘music recommendation is solved’ – he’s excited about the cellphone. Also excited about programs like ChucK that make it easier for people to create music (nice pandering to the moderator, Doug!)
Ricardo (MeeMix) – the laid back position is the future – reach the specific taste of a user. Personalized advertisements.
Greg (Musicmetric) – Cloud-based services will help us understand what people want, which will lead to playlisting, recommendation, novel players.
Martin (RjDj) – Thinks that the phone is really exciting – having all this power in the phone lets you do neat things. He’s excited about how people will be able to create music – using sensory inputs, ambient audio.

How will tech revolutionize music?

Doug – being able to collaborate with Arcade Fire online
Martin – musically illiterate should be able to make music
Ricardo – we can help new artists reach the right fans
Greg – services for helping artists, merchandising, ticket sales etc.

What are the most interesting problems or technical questions?

Greg – interested in understanding the behavior of the fans, especially those on P2P networks. Huge amount of geographic-specific listener data
Ricardo – more research around taste and recommendation
Doug – a rant – he had a paper rejected because the paper had something to do with music generation.
Rebecca – has an MIR-for-music Google group: MIR4Music
Martin – engineering: increase performance in portable devices – research: how to extract music features from music cheaply
Ricardo – drumming style is hard to extract – but actually not that important for taste prediction

How would you characterize the relationship between biz and academia?

Greg – there is lots of ‘advanced research’ in academia, while industry looks at much more applied problems
Doug – suggests that the leader of an academic lab is key to bridging the gap between biz and academia. Grad students should be active in looking for internships in industry to get a better understanding of what is needed in industry. It is all about getting grad students jobs in industry.

Audience Q/A

What tools can we create to help producers of music? – Answer: Youtube. Martin talks about understanding how people use music creation tools. Doug: “Don’t build things that people don’t want.” – to do this you need to try things on real data.

Hmmm … only one audience q/a. sigh …

Good panel, lots of interesting ideas. Here is the future of music:

MIR at Google: Strategies for Scaling to Large Music Datasets Using Ranking and Auditory Sparse-Code Representations

Posted by Paul in events, ismir, music information retrieval, research on August 12, 2010

MIR at Google: Strategies for Scaling to Large Music Datasets Using Ranking and Auditory Sparse-Code Representations
Douglas Eck (Google) (Invited speaker) – There’s no paper associated with this talk.

Machine Listening / Audio analysis – Dick Lyon and Samy Bengio

Main strength:

Scalable algorithms
- When they do work, they use large sets (like all audio on Youtube, or all audio on the web)
Sparse High dimensional Representations
- 15 numbers to describe a track
Auditory / Cochlear Modeling
Autotagging at Youtube –
Retrieval, annotation, ranking, recommendation

Collaboration Opportunities

Faculty research awards
Google visiting faculty program
Student internships
Google summer of code
Research Infrastructure

The Future of MIR is already here

Next generation of listeners are using Youtube – because of the on-demand nature
Youtube – 2 billion views a day
Content ID scans over 100 years of video every day

The Bar is already set very high ..

Current online recommendation is pretty good
Doug wants to close the loop between music making and music listening

What would you like Google to give back to MIR?

Music Machinery
