AudioSet

Tags

objects, reading

Contributor

James Parker

Date

March 4, 2024

Folgezettel

AudioSet is a large dataset created by Google. It comprises around 2 million 10-second sound clips, extracted from user-uploaded YouTube videos and labelled by human annotators, for use in training machine learning models such as neural networks to recognize and categorize different types of sounds and audio events. The dataset was designed to support automatic audio event recognition systems that can label hundreds or thousands of different sound events in real-world recordings with a time resolution better than one second, much as human listeners recognize and relate the sounds they hear.

Readings:

Announcing AudioSet: A Dataset for Audio Event Research. (n.d.). Google AI Blog. Retrieved December 8, 2020, from http://ai.googleblog.com/2017/03/announcing-audioset-dataset-for-audio.html
Ellis, D. (2018, October 4). Recognizing Sound Events. https://jh.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4a7e392c-5163-41a6-8229-aadc01099e63
Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., & Ritter, M. (2017). Audio Set: An ontology and human-labeled dataset for audio events. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 776–780.

Experiment

Dataset Audit: a score