Datasets

Tags: big topic
Contributor: James Parker
Date: March 4, 2024
Folgezettel: 8

Datasets have been objects of interest and concern in critical AI studies: they exhibit colonial genealogies of knowledge accumulation, raise questions about the global inequality of data labour, and make possible other datalogical processes, such as classification and statistization, that reduce social complexity to normative averages. Audio datasets, whether for speech processing, music information retrieval, or computational auditory scene analysis, raise similar questions about data ownership, labour, surveillance, and the environment. Engaging critically with datasets also presents an opportunity to reflect on Indigenous data sovereignty movements, with a focus on questions of data ownership.

Datasets are essential to building machine listening systems that incorporate machine learning. The next section of the curriculum asks:

  • what is a dataset?
  • what is the difference between a dataset, a database, and an archive?
  • what are some particular features of machine listening datasets?
  • what are some examples of machine listening datasets?
  • Some audio datasets

Where to go from here

To go into more depth, it can be helpful to examine specific datasets to understand their histories, their development, and their implementation. For example:

AudioSet

Mehak Sawhney

Resources

Kate Crawford. “Chapter 3: Data,” Atlas of AI. New Haven: Yale University Press, 2021.

Lilly Irani. “The Cultural Work of Microwork,” New Media & Society, vol. 17, no. 5 (2015): 720-739.

Papa Reo project on data sovereignty and speech recognition in the Māori language

Xiaochang Li and Mara Mills. “Vocal Features: From Voice Identification to Speech Recognition by Machine,” Technology and Culture, vol. 60, no. 2 (2019): S129-S160.