Audio Fingerprinting

James Parker

Audio fingerprinting is a form of music information retrieval (MIR) that prioritizes ‘searchable representation’ over ‘representational completeness’ (Downie 2003, 309–310). An audio fingerprint is a ‘unique and compact’ hash or signature ‘derived from perceptually relevant aspects of a recording’ (Cano et al 2005, 233). This hash can be produced in many ways, ranging from ‘constellation mapping’ of spectrogram peaks (Wang 2003; 2006) to methods involving hidden Markov models and machine learning (Cano et al 2005). But the basic idea is that, once a corpus of recordings has been gathered and fingerprinted, an unknown audio sample can be compared against it for the purposes of identification (Ellis et al 2011).
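To make the ‘constellation mapping’ idea concrete, here is a minimal sketch in its spirit. It is not Shazam's implementation. It assumes the spectrogram has already been computed, as a list of frames of magnitudes, and it uses random matrices as toy data. The function names (`find_peaks`, `make_hashes`, `build_index`, `identify`), the 3x3 peak neighbourhood and the `fan_out`/`max_dt` parameters are all hypothetical choices for illustration. Peaks are paired into hashes of (anchor frequency, target frequency, time gap); an unknown sample is then identified by voting on which track, at which time offset, its hashes agree with.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical types: spec[t][f] is the magnitude at time frame t, frequency bin f.
Spectrogram = List[List[float]]
Hash = Tuple[int, int, int]  # (anchor freq bin, target freq bin, frame gap)


def find_peaks(spec: Spectrogram, threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return (t, f) points above threshold that are strict maxima of their 3x3 neighbourhood."""
    peaks = []
    for t, frame in enumerate(spec):
        for f, v in enumerate(frame):
            if v <= threshold:
                continue
            neighbours = (
                spec[t + dt][f + df]
                for dt in (-1, 0, 1)
                for df in (-1, 0, 1)
                if (dt or df)
                and 0 <= t + dt < len(spec)
                and 0 <= f + df < len(spec[t + dt])
            )
            if all(n < v for n in neighbours):
                peaks.append((t, f))
    return peaks


def make_hashes(peaks, fan_out: int = 3, max_dt: int = 10) -> List[Tuple[Hash, int]]:
    """Pair each anchor peak with up to fan_out later peaks: ((f1, f2, dt), t1)."""
    peaks = sorted(peaks)  # sorted by time, then frequency
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # skip peaks in the same frame as the anchor
            if dt > max_dt or paired == fan_out:
                break
            out.append(((f1, f2, dt), t1))
            paired += 1
    return out


def build_index(corpus: Dict[str, Spectrogram]) -> Dict[Hash, List[Tuple[str, int]]]:
    """Fingerprint every track: map each hash to (track_id, anchor time) entries."""
    index = defaultdict(list)
    for track_id, spec in corpus.items():
        for h, t in make_hashes(find_peaks(spec)):
            index[h].append((track_id, t))
    return index


def identify(index, sample: Spectrogram) -> Optional[Tuple[str, int, int]]:
    """Vote on (track, time offset); a true match piles its votes on one offset."""
    votes = Counter()
    for h, t_sample in make_hashes(find_peaks(sample)):
        for track_id, t_track in index.get(h, []):
            votes[(track_id, t_track - t_sample)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


if __name__ == "__main__":
    # Toy data: random matrices stand in for real spectrograms.
    rng = random.Random(0)
    corpus = {
        name: [[rng.random() for _ in range(32)] for _ in range(60)]
        for name in ("track_a", "track_b")
    }
    index = build_index(corpus)
    clip = corpus["track_b"][20:40]  # a 20-frame excerpt of track_b
    print(identify(index, clip))  # (track_id, offset in frames, votes)
```

Because each hash records the gap between two peaks rather than their absolute times, an excerpt yields the same hashes as the matching stretch of the full recording. The vote on `t_track - t_sample` then checks that those matches also line up in time, which separates a genuine match from scattered hash collisions.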

So far, there have been two main commercial applications of audio fingerprinting. The first is consumer music identification, in apps like Shazam, which were initially used to sell ringtones, MP3s and concert tickets but are now basically in the data business. The second is digital rights management and royalty collection. The most famous such application is Content ID, the byzantine and utterly opaque automated content identification system used by Google ‘to easily identify and manage copyright-protected content on YouTube.’

Resources cited above

J. Stephen Downie (2003) ‘Music Information Retrieval’, Annual Review of Information Science and Technology 37, 295–340

Cano et al (2005) ‘Audio Fingerprinting: Concepts and Applications’, in Computational Intelligence for Modelling and Prediction, Springer

Cano et al (2005) ‘A Review of Audio Fingerprinting’, Journal of VLSI Signal Processing 41, 271–284, DOI: 10.1007/s11265-005-4151-3

Avery Wang (2006) ‘The Shazam music recognition service’, Communications of the ACM 49(8), 44–48

Ellis et al (2011) ‘Echoprint: An Open Music Identification Service’, Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)