Metaphors of machine listening

Contributor: James Parker
Folgezettel: 3d

All language is metaphorical. The question is, what do these metaphors do? When and how do they matter? When it comes to machine listening, we could notice that these two terms - ‘machine’ and ‘listening’ - are already doing a lot of work in framing diverse technical, commercial, scientific, political, cultural, and legal practices. To talk about the application of hidden Markov models or some new generation of neural network to speech corpora, for instance, as involving a kind of ‘listening’, and to say that this listening has been undertaken by a ‘machine’, is already doing something. This is precisely why these metaphors are contested, both by the scientists developing these practices and the critics who write about them. There is a difference between machine ‘listening’ and machine ‘hearing’, and again between ‘machine listening’ and ‘computer audition’.

But these are also not the only metaphors in play. Amazon uses the term ‘wake word’ to describe a particular application of what technicians call ‘word spotting’. What does it mean to think of a smart speaker or other listening device as sleeping? Especially when you might equally think of it as being in a constant state of alertness? Amazon deploys a whole army of metaphors, in fact, to describe the various ways of interacting with Alexa: so many that it has produced an ‘Alexa Skills Kit glossary’ for users and programmers. Not every metaphor is included there, of course. Amazon also talks about ‘native’ and ‘non-native’ skills, a language that seems to imagine voice interaction as a frontier in the process of colonisation.
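To make the sleep metaphor easier to interrogate, here is a minimal sketch, in Python, of the kind of always-on loop that ‘word spotting’ implies. Nothing in it is Amazon’s actual implementation: the scoring function is a placeholder for the proprietary keyword model a real smart speaker would run, and every parameter is invented for illustration. What it shows is simply that ‘waking’ is a score crossing a threshold inside a process that never stops analysing audio.

```python
import numpy as np

# All values below are invented for illustration, not taken from any real device.
SAMPLE_RATE = 16_000      # samples per second of incoming audio
WINDOW_SECONDS = 1.0      # length of audio the spotter scores at once
HOP_SECONDS = 0.25        # how often a new score is computed
WAKE_THRESHOLD = 0.8      # score above which the device 'wakes'


def wake_word_score(window: np.ndarray) -> float:
    """Placeholder for a trained keyword-spotting model.

    A real device would run a small classifier over acoustic features here;
    this stand-in just maps the window's energy to a 0-1 score so the loop
    below is runnable end to end.
    """
    energy = float(np.mean(window ** 2))
    return energy / (1.0 + energy)


def always_listening(audio: np.ndarray) -> list:
    """Slide a window over an audio stream and score every position.

    The point of the sketch: even while 'asleep', the device is continuously
    buffering and scoring audio; 'waking' is just a score crossing a threshold.
    Returns the times (in seconds) at which the threshold was crossed.
    """
    window = int(WINDOW_SECONDS * SAMPLE_RATE)
    hop = int(HOP_SECONDS * SAMPLE_RATE)
    wake_times = []
    for start in range(0, len(audio) - window + 1, hop):
        if wake_word_score(audio[start:start + window]) >= WAKE_THRESHOLD:
            wake_times.append(start / SAMPLE_RATE)
    return wake_times


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = 0.1 * rng.normal(size=10 * SAMPLE_RATE)                   # quiet background
    audio[4 * SAMPLE_RATE:5 * SAMPLE_RATE] += 3.0 * rng.normal(size=SAMPLE_RATE)  # one loud second
    print(always_listening(audio))   # only windows covering the loud stretch 'wake' it
```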

Elsewhere, we could wonder about the differences between speech ‘recognition’ and ‘understanding’, the words ‘neural’ and ‘networks’, or the use of the phrase ‘audio fingerprinting’ to describe the various hashing techniques used by Google’s Content ID and Shazam to identify songs. We could wonder likewise about the term ‘voiceprinting’ to describe practices of automatic identification based on a person’s voice. Or about the language of music ‘recommendation’ systems, as opposed to what Nick Seaver prefers to call them: traps.
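The ‘fingerprinting’ metaphor can also be made concrete. Below is a toy Python sketch of one family of hashing techniques the term covers: picking a spectral peak per frame, hashing pairs of nearby peaks into (bin, bin, time-gap) triples, and matching recordings by counting shared hashes. It is a deliberately simplified illustration rather than the actual Shazam or Content ID pipelines, and all the parameters are arbitrary.

```python
import numpy as np

# Toy parameters, chosen only to keep the example small.
FRAME = 1024    # samples per analysis frame
HOP = 512       # hop between frames
FAN_OUT = 5     # how many subsequent peaks each peak is paired with


def spectral_peaks(signal: np.ndarray) -> list:
    """Return (frame_index, loudest_frequency_bin) for each analysis frame."""
    peaks = []
    for i, start in enumerate(range(0, len(signal) - FRAME + 1, HOP)):
        frame = signal[start:start + FRAME] * np.hanning(FRAME)
        spectrum = np.abs(np.fft.rfft(frame))
        peaks.append((i, int(np.argmax(spectrum))))
    return peaks


def fingerprint(signal: np.ndarray) -> set:
    """Hash pairs of nearby peaks into (bin1, bin2, time_delta) triples."""
    peaks = spectral_peaks(signal)
    hashes = set()
    for j, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[j + 1:j + 1 + FAN_OUT]:
            hashes.add((f1, f2, t2 - t1))
    return hashes


def match(query: np.ndarray, reference: np.ndarray) -> float:
    """Fraction of the query's hashes that also occur in the reference."""
    q = fingerprint(query)
    return len(q & fingerprint(reference)) / max(len(q), 1)


if __name__ == "__main__":
    sr = 8000
    notes = [262, 330, 392, 523, 392, 330]   # a short toy 'melody'
    song = np.concatenate([np.sin(2 * np.pi * f * np.arange(0, 0.5, 1 / sr)) for f in notes])
    rng = np.random.default_rng(1)
    clip = song[sr // 2:2 * sr] + 0.01 * rng.normal(size=2 * sr - sr // 2)   # noisy excerpt
    noise = rng.normal(size=2 * sr - sr // 2)                                # unrelated audio
    # The excerpt shares most of its hashes with the song; the noise shares almost none.
    print(round(match(clip, song), 2), round(match(noise, song), 2))
```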

Experiment

Find some area of machine listening that interests you. Find a text about it. It could be a journal article, a piece in a newspaper, a company website, or an industry report. Now read this text for the metaphors it deploys and the work they seem to be doing. Why do these metaphors matter? What alternatives might there be? How would an alternative metaphor help you understand this technology differently?

Resources

Xiaochang Li (2017), Divination Engines: A Media History of Text Prediction, especially Chapter 2 on the historical and institutional differences between speech ‘understanding’ and ‘recognition’

Nick Seaver (2019), ‘Captivating algorithms: Recommender systems as traps’, Journal of Material Culture 24(4), 421-436


Mara Mills and Xiaochang Li (2019), ‘Vocal Features: From Voice Identification to Speech Recognition by Machine’, Technology and Culture 60(2). DOI: 10.1353/tech.2019.0066

Douglas Kellner (2021), ‘Metaphors of Cyberspace and Digital Technologies’ in Technology and Democracy: Toward A Critical Theory of Digital Technologies, Technopolitics, and Technocapitalism, Springer


Jonathan Sterne (2022), ‘Is Machine Listening Listening?’, communication +1 9(1), Article 10