8
Datasets have been objects of interest and concern in critical AI studies because they exhibit colonial genealogies of knowledge accumulation, raise questions about the global inequality of data labour, and make possible other datalogical processes, such as classification and statistization, that reduce social complexity to normative averages. Audio datasets, whether built for speech processing, music information retrieval, or computational auditory scene analysis, raise similar questions about data ownership, labour, surveillance, and the environment. Engaging with datasets critically also presents an opportunity to reflect on Indigenous data sovereignty movements, with a focus on questions of data ownership.
Datasets are an essential part of how machine listening systems that incorporate machine learning are built. The next section of the curriculum asks:
- what is a dataset?
- what is the difference between a dataset, a database, and an archive?
- what are some particular features of machine listening datasets?
- what are some examples of machine listening datasets?
- Some audio datasets
Where to go from here
To go into more depth, it can be helpful to examine specific datasets to understand their histories, their development, and their implementation. For example:
- AudioSet
Mehak Sawhney
Resources
Knowing Machines, Critical Dataset Studies Reading List.
Kate Crawford. “Chapter 3: Data,” Atlas of AI. New Haven: Yale University Press, 2021.
Lilly Irani. “The Cultural Work of Microwork,” New Media & Society vol. 17, no. 5 (2015): 720-739.
Papa Reo project on data sovereignty and speech recognition in the Māori language.
Xiaochang Li and Mara Mills. “Vocal Features: From Voice Identification to Speech Recognition by Machine,” Technology and Culture vol. 60, no. 2 (2019): S129-S160.