Audio fingerprinting’s best-known application is probably Shazam’s. The company was founded in 2000 to enable listeners to identify a piece of music directly from their environment – at a gig or restaurant, from the TV or radio – despite high levels of background noise, compression, and distortion. The service sampled fifteen to thirty seconds of audio; if a match was found in the company’s million-song database, the caller received the identification results via SMS, for a small fee on their phone bill (Wang 2006).
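The robustness to noise and distortion that Wang (2006) describes comes from fingerprinting the geometry of spectrogram peaks rather than the raw audio itself. The sketch below illustrates the general idea in Python. It is a toy reconstruction in the spirit of that paper, not Shazam’s implementation: the parameter values, the fan-out heuristic, and all function names here are assumptions chosen for readability.

```python
# A minimal sketch of landmark-style audio fingerprinting, loosely after the
# constellation-and-hash scheme described in Wang (2006). Illustrative only:
# parameters and function names are assumptions, not Shazam's production code.
from collections import Counter

import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import spectrogram


def peak_constellation(samples, rate, neighborhood=20):
    """Return (time_bin, freq_bin) pairs for local spectrogram maxima."""
    _, _, sxx = spectrogram(samples, fs=rate, nperseg=1024, noverlap=512)
    # A bin counts as a peak if it equals the maximum of its neighborhood
    # and rises above the mean magnitude; loud peaks survive noise best.
    is_peak = (maximum_filter(sxx, size=neighborhood) == sxx) & (sxx > sxx.mean())
    return sorted((t, f) for f, t in np.argwhere(is_peak))  # sxx is (freq, time)


def fingerprints(peaks, fan_out=5):
    """Pair each anchor peak with a few later peaks and hash the geometry.

    A hash of (anchor_freq, target_freq, time_delta) survives background
    noise, compression, and distortion far better than raw audio does.
    """
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1  # (hash, anchor offset within track)


def build_index(tracks):
    """Index a catalogue: tracks is an iterable of (track_id, samples, rate)."""
    index = {}
    for track_id, samples, rate in tracks:
        for h, offset in fingerprints(peak_constellation(samples, rate)):
            index.setdefault(h, []).append((track_id, offset))
    return index


def best_match(index, query_prints):
    """Vote for (track_id, offset_delta) pairs; a genuine match piles its
    votes onto a single time offset, which is what makes lookup robust."""
    votes = Counter()
    for h, q_offset in query_prints:
        for track_id, t_offset in index.get(h, ()):
            votes[(track_id, t_offset - q_offset)] += 1
    return votes.most_common(1)
```

The design insight is that each hash encodes relations between peaks rather than absolute values: the triple of anchor frequency, target frequency, and time delta is unchanged by when the recording begins, so a genuine match announces itself as a pile of votes at a single offset delta even when most individual hashes are corrupted by noise.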
By 2005, Shazam also offered users the ability to purchase ringtones, MP3s, CDs, and concert tickets, in partnership with companies like AT&T, Amazon, and Ticketmaster (Productmint 2021). But it was only with the mass take-up of smartphones, following the launch of Apple’s iPhone in 2007, that Shazam exploded. The company launched its first apps for iOS and Android in 2008. By 2011, it was the fourth most downloaded app ever on the App Store. By 2014, it had been downloaded 500 million times across all operating systems and was being used to identify around 20 million songs a day (Nguyen 2014). Over this period, the company’s business model also changed considerably. The user-pays model was finally dropped in 2011, as Shazam came to rely more and more on advertising and other revenue streams, including referral fees from the major download and streaming services, integration into other products like Siri and TikTok, and the growing market for music metadata. In 2013, for instance, Shazam released ‘an interactive map overlaid with its search data,’ allowing record companies, radio stations, and agents to zoom in on cities around the world and look up ‘the most Shazam’d songs in São Paulo, Mumbai, or New York.’ ‘Sometimes we can see when a song is going to break out months before most people have even heard of it,’ the company’s former chief technologist explained. ‘We know where a song’s popularity starts, and we can watch it spread’ (Thompson 2014). A company that started out selling song identification to listeners was now selling listening habits to the music industry.
In 2018, Shazam was acquired by Apple for US$400 million, a long way shy of the US$1 billion at which it had been valued a few years earlier (Productmint 2021). But its significance far outstrips its market value. All the tech behemoths now do music recognition, as do many smaller companies and organisations. Since 2017, Google’s Now Playing has been set to ‘on’ by default, so that many phones now ‘Shazam’ 24/7. And years before voice assistants and smart speakers took speech recognition truly mainstream, Shazam introduced millions to the possibility that computational systems could identify sounds other than speech. In doing so, it gave many people a precedent and a language for approaching machine listening more generally. When researchers and journalists describe some new technology as ‘Shazam for bats’ (Gallacher et al. 2021), ‘Shazam for birds’ (Douglas 2017), or ‘Shazam for earthquakes’ (Than 2015) – whether or not it is computationally analogous – that is testament to Shazam’s metaphorical power. Indeed, in a notable reversal of the ‘rule’ that machine listening follows vision, references to ‘Shazam for artworks’ (Smartify), ‘Shazam for fashion’ (Lykdat), or ‘Shazam for plants’ (Plantsnap) are just as common. Shazam has become a dominant metaphor for computational identification as such.