Here is a Dataset

Date: 2025

Here is a Dataset, 2025

Radio artwork, stereo, made using audio datasets, custom-built software (Konvolute, by Sean Dockray), Buchla synthesiser, 35 min.

Researched, composed and produced: Machine Listening (Sean Dockray, James Parker, Joel Stern).

Mastered: Joe Talia.

Commissioned: Cashmere Radio/Deutschlandfunk Kultur, 2025.

Datasets are not made for human ears — yet Here is a Dataset (2025) insists they are worth listening to.

Compositionally, some of the material in Here is a Dataset has been arranged ‘by ear’: for its aesthetic qualities, or to point to and draw out some poetic or affective feature that interested us and seemed to escape the logic of mechanisation.

Other passages have been assembled with Konvolute, a custom-built instrument for navigating datasets according to more machinic parameters. MFCCs, or Mel Frequency Cepstral Coefficients, for instance, are widely used in speech recognition and music information retrieval. Konvolute maps higher-dimensional MFCC space into 2D projections, plotting courses through them, letting us hear computers trying to listen like humans.

Source Datasets:

Most of the sounds are sampled directly from these five datasets:

  1. Sensing the Forest: solar-powered recordings of a forest in Surrey.
  2. Perceptual Voice Qualities Database: vowel sounds and diagnostic sentences for voice disorder research.
  3. OrchideaSOL: single musical notes for computer-assisted orchestration (IRCAM, Paris).
  4. DCASE Synthetic Audio Dataset: synthetic office sounds, for sound event detection system training.
  5. Toronto Emotional Speech Set: emotional speech recordings (anger, fear, happiness, sadness, etc.) for emotion classification.

In addition, cloned voices (often derived from the training data) perform normally inaudible parts of datasets: spreadsheets, feature measurements, Python preprocessing code.

All of this is paired with a Buchla synthesiser, set to respond dynamically to the voltages of other audio material.

MFCC dataset map, produced in Konvolute.
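
For readers curious how a map like this might be built, here is a minimal sketch. It does not reproduce Konvolute, whose internals are not described here. It assumes each sound clip has already been reduced to a 13-value MFCC vector (random stand-in data is used below), projects those vectors to 2D with PCA, and plots one possible course through them by always stepping to the nearest unvisited point. The function names, the choice of PCA and the greedy path are all assumptions made for illustration.

```python
import numpy as np


def project_to_2d(features):
    """Project the rows of `features` onto their first two principal components (PCA via SVD)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def greedy_path(points, start=0):
    """Visit every point exactly once, always moving to the nearest unvisited point."""
    remaining = set(range(len(points)))
    remaining.remove(start)
    path = [start]
    while remaining:
        current = points[path[-1]]
        nearest = min(remaining, key=lambda i: np.linalg.norm(points[i] - current))
        path.append(nearest)
        remaining.remove(nearest)
    return path


# Stand-in data (assumption): 50 hypothetical clips, 13 MFCC values each.
rng = np.random.default_rng(0)
mfccs = rng.normal(size=(50, 13))

coords = project_to_2d(mfccs)   # one (x, y) position per clip
order = greedy_path(coords)     # a playback order that drifts between neighbouring sounds
```

Playing the clips in `order` would give a sequence in which each sound is followed by one that sits close to it on the map, which is one simple way to hear a traversal of MFCC space.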

Notes on Datasets:

A dataset is never just a collection of files. It is:

  • primary data (recordings, measurements)
  • metadata describing how/when it was gathered
  • the code that processes it
  • the papers that cite it
  • the spreadsheets that organise it
  • the communities who interpret and repurpose it

Working with datasets is a kind of archaeology: slicing through layers of context, methodology, and interpretation. Like musique concrète and sampling practices, our work fragments, recombines, and recontextualises. Here, though, the source is not mass media or commercial culture but the scientific/technological research archive.

Here is a Dataset moves between the systematic and the intuitive, the machinic and the affective. Sometimes datasets are presented in sequence, sometimes layered to produce strange collisions, sometimes traversed as if by a machine-imitating-a-human listener.

Audio Files:

Here is a Dataset is available in two versions. Both are multilingual, but one includes narrated moments in English, the other in German.

Here is a Dataset (Website in German)

Here is a Dataset (broadcast master -23 LUFS) - German

Here is a Dataset (streaming master) - English and German

Presentations:

  • Cashmere Radio/Deutschlandfunk Kultur 2025