Machine listening

James Parker
March 1, 2024


“Machine listening” is one common term for a fast-growing interdisciplinary field of science and engineering which uses audio signal processing and machine learning to “make sense” of sound and speech. Machine listening is what enables you to be “understood” by Siri and Alexa, to Shazam a song, to identify a bird call, and to interact with many audio-assistive technologies if you are blind or vision impaired. As early as the 1990s, the term was already being used in computer music to describe the analytic dimension of “interactive music systems” whose behavior changes in response to live musical input, though there are precedents even before that. Machine listening was also, of course, a cornerstone of the mass surveillance programs revealed by Edward Snowden in 2013: SPIRITFIRE’s “speech-to-text keyword search and paired dialogue transcription”; EViTAP’s “automated news monitoring”; VoiceRT’s “ingestion”, according to one NSA slide, of Iraqi voice data into voiceprints. Domestically, machine listening technologies underpin the vast databases of vocal biometrics now held by many prison providers and, for instance, the Australian Tax Office. And they are quickly being integrated into infrastructures of development, security and policing.

As with all forms of machine learning, questions of efficacy, access, privacy, bias, fairness and transparency arise with every use case. But machine listening also demands to be treated as an epistemic and political system in its own right: one that increasingly enables, shapes and constrains basic human possibilities, that is making our auditory worlds knowable in new ways, to new institutions, according to new logics, and that is remaking (sonic) life in the process.

Machine listening is much more than just a new scientific discipline or vein of technical innovation, then. It is also an emergent field of knowledge-power and cultural production, of data extraction and colonialism, of capital accumulation, automation and control. We must make it a field of political contestation and struggle. If there is to be a world of listening machines, we must make it emancipatory.


Build a curriculum for mutual study with a community of scholars and activists all working on machine listening and related topics.

Resources (in chronological order)

Robert Rowe (1993) Interactive Music Systems: Machine Listening and Composing (MIT Press)

Dan Ellis (2010) A History and Overview of Machine Listening

Richard Lyon (2017) Human and Machine Hearing: Extracting Meaning from Sound

Xiaochang Li (2017) Divination Engines: A Media History of Text Prediction

George Lewis (2018) Rainbow Family: (Machine) Listening as Improvisation, Technosphere #15

Sean Dockray (2018) ‘Learning from YouTube’, Rivers of Emotion, Bodies of Ore. Oslo: Uten Tittel (Not Yet Titled Press)

Mara Mills and Xiaochang Li (2019) ‘Vocal Features: From Voice Identification to Speech Recognition by Machine’, Technology and Culture, Volume 60(2) DOI: 10.1353/tech.2019.0066

Various artists (2020) (Against) the coming world of listening machines, Liquid Architecture x Unsound

Shannon Mattern (2020) Urban Auscultation, or perceiving matter from the heart, Places

Jonathan Sterne (2022) ‘Is Machine Listening Listening?’ communication +1: Vol. 9: Iss. 1, Article 10

James E K Parker & Sean Dockray (2023) ‘All possible sounds’: speech, music, and the emergence of machine listening, Sound Studies, 9:2, 253-281 DOI: 10.1080/20551940.2023.2195057

Wikipedia entry on Computer Audition (ongoing)