2025
Here is a Dataset, 2025
Radio artwork, stereo, made using audio datasets, custom-built software, Buchla synthesiser, 35 min.
Researched, composed and produced: Machine Listening (Sean Dockray, James Parker, Joel Stern).
Mastered: Joe Talia.
Commissioned: Cashmere Radio/Deutschlandfunk Kultur, 2025.
Radio piece with AI training data - Here is a Dataset
What does a sad voice sound like? Or a fearful one? A radio piece made from training data for artificial intelligence.
www.hoerspielundfeature.de
How does a sad voice sound? How is a dataset of sad voices different to a song, or a field recording, or a poem? This piece, which is derived from five different audio datasets all intended to be used in training AI systems, asks: are datasets worth listening to? What are the poetics of our performances for machines? And what can we learn about these machines by listening?
It is a widespread myth that data is mined, just as gold, coal, oil or diamonds are mined. In this way of thinking, data is another natural resource of our world. But this is not correct: data is not simply mined somewhere. It is created. For example, actresses are asked to read strange texts to train AI to classify emotions, and scientists record themselves performing office sounds in order to train sound event detection systems. Such datasets aren’t meant for human ears. This piece insists they are worth listening to anyway.
Compositionally, some of the material in Here is a Dataset has been arranged ‘by ear’: for its aesthetic qualities, or to point to and draw out some poetic or affective feature that interested us and seemed to escape the logic of mechanisation.
Other passages have been assembled with Konvolute, a custom-built instrument for navigating datasets according to more machinic parameters. MFCCs (Mel Frequency Cepstral Coefficients), for instance, are widely used in speech recognition and music information retrieval. They are computed on the mel scale, which approximates human pitch perception. Konvolute maps high-dimensional MFCC space into 2D projections and plots courses through them, letting us hear computers trying to listen like humans.
Source Datasets:
Most of the sounds are sampled directly from these five datasets:
- Sensing the Forest: solar-powered recordings of a forest in Surrey.
- Perceptual Voice Qualities Database: vowel sounds and diagnostic sentences for voice disorder research.
- OrchideaSOL: single musical notes for computer-assisted orchestration (IRCAM, Paris).
- DCASE Synthetic Audio Dataset: synthetic office sounds for training sound event detection systems.
- Toronto Emotional Speech Set: emotional speech recordings (anger, fear, happiness, sadness, etc.) for emotion classification.
In addition, cloned voices (often derived from the training data) perform normally inaudible parts of datasets: spreadsheets, feature measurements, Python preprocessing code.
All of this is paired with a Buchla synthesiser, set to respond dynamically to the voltages of other audio material.
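Konvolute's internals are not documented here, so what follows is only a minimal Python sketch of the general idea behind it: projecting MFCC-like feature vectors into two dimensions and plotting a course through the projection. Everything in it is an assumption. The function names are hypothetical, principal component analysis (computed by power iteration) stands in for whatever projection Konvolute actually uses, and the "course" is a greedy nearest-neighbour walk. The input frames are made-up vectors, not MFCCs computed from any of the datasets above.

```python
import math
import random


def mean_center(frames):
    """Subtract each coefficient's mean so the data is centred on the origin."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[f[j] - means[j] for j in range(d)] for f in frames]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred frames."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=500, seed=0):
    """Power iteration: approximate the unit eigenvector with the largest eigenvalue."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the matrix maps v to zero; keep the current unit vector
            break
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the matching eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


def project_2d(frames):
    """Project frames onto their first two principal components (a PCA stand-in)."""
    centered = mean_center(frames)
    cov = covariance(centered)
    d = len(cov)
    pc1, lam1 = top_component(cov)
    # Deflation: remove pc1's contribution so power iteration finds the next component.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)] for i in range(d)]
    pc2, _ = top_component(deflated, seed=1)
    points = [(sum(r[j] * pc1[j] for j in range(d)),
               sum(r[j] * pc2[j] for j in range(d))) for r in centered]
    return points, (pc1, pc2)


def nearest_neighbour_course(points, start=0):
    """Greedy walk: from the current point, always step to the closest unvisited one."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    course = [start]
    while unvisited:
        here = points[course[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, points[i]))
        unvisited.remove(nxt)
        course.append(nxt)
    return course


if __name__ == "__main__":
    # Made-up 4-coefficient "frames"; real MFCC frames would come from audio analysis.
    frames = [[float(t), float(t), 0.1 * (t % 2), 0.0] for t in range(10)]
    points, _ = project_2d(frames)
    print(nearest_neighbour_course(points))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A nearest-neighbour walk means each step moves to whichever frame lies closest in the projected feature space. This is one plausible way of letting a machine's measure of similarity decide the order in which sounds are heard.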
Notes on Datasets:
A dataset is never just a collection of files. It is:
- primary data (recordings, measurements)
- metadata describing how/when it was gathered
- the code that processes it
- the papers that cite it
- the spreadsheets that organise it
- the communities who interpret and repurpose it
Working with datasets is a kind of archaeology: slicing through layers of context, methodology, and interpretation. Like musique concrète and sampling practices, our work fragments, recombines, and recontextualises. Here, though, the source is not mass media or commercial culture; it is the scientific and technological research archive.
Here is a Dataset moves between systematic and intuitive, machinic and affective. Sometimes datasets are presented in sequence, sometimes layered to produce strange collisions, sometimes traversed as if by a machine-imitating-a-human listener.
Audio Files:
Here is a Dataset is available in two versions. Both are multilingual, but one includes narrated moments in English, the other in German.
Here is a Dataset (Website in German)
Here is a Dataset (broadcast master, -23 LUFS) - German
Here is a Dataset (streaming master) - English and German
Presentations:
- Cashmere Radio/Deutschlandfunk Kultur 2025