Meet ToyADMOS, A Dataset Of Miniature Machine Operating Sounds

Over the years, researchers have been shedding light on anomaly detection in sound. Anomalous sounds can indicate faults or malicious activity in a system, and detecting them early can prevent problems from escalating. Anomalous sound detection can be used for purposes such as audio surveillance, product inspection, and predictive maintenance.

Anomalous sound data is difficult to collect, and until now no large-scale dataset was available for anomaly detection in machine operating sounds (ADMOS). Recently, researchers from NTT Media Intelligence Laboratories and Ritsumeikan University in Japan introduced a new dataset known as ToyADMOS, designed mainly for training and testing ADMOS systems.

About The Dataset

The ToyADMOS dataset contains approximately 540 hours of normal machine operating sounds and over 12,000 samples of anomalous sounds, collected with four microphones at a 48 kHz sampling rate.

The dataset consists of three sub-datasets, one for each of three ADMOS tasks: product inspection, fault diagnosis of a fixed machine, and fault diagnosis of a moving machine. An overview of the three sub-datasets is given below:

  • Toy Car: Designed for the product-inspection task. A toy car runs on an inspection device, and sound data are collected with four microphones arranged close to the device.
  • Toy Conveyor: Designed for fault diagnosis of a fixed machine. A toy conveyor is fixed on a desk, and sound data are collected with four microphones: one attached to the body of the conveyor and the other three placed on the desk.
  • Toy Train: Designed for fault diagnosis of a moving machine. A toy train runs on a railway track, and sound data are collected with four microphones surrounding the track.

Each sub-dataset consists of three types of sound data: normal, anomalous, and environmental. All sounds were collected with four omnidirectional microphones (SHURE SM11-CN).
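Because environmental sound is kept as its own category, a noisy recording can be simulated by adding it to a machine sound at a chosen signal-to-noise ratio (SNR). The snippet below is a minimal sketch of that idea and not the procedure used by the dataset's authors. The function name `mix_at_snr`, the synthetic sine and Gaussian signals, and the 6 dB target are all assumptions made for illustration; real use would load clips from the dataset instead.

```python
import numpy as np


def mix_at_snr(machine, noise, snr_db):
    """Add `noise` to `machine` after scaling it to reach the target SNR in dB.

    Hypothetical helper for illustration; both inputs are arrays of shape
    (num_samples, num_channels).
    """
    machine = np.asarray(machine, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if machine.shape != noise.shape:
        raise ValueError("machine and noise must have the same shape")

    machine_power = np.mean(machine ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")

    # Noise power needed so that 10*log10(machine_power / noise_power) == snr_db.
    target_noise_power = machine_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return machine + gain * noise


# Synthetic stand-ins for one second of four-channel audio at 48 kHz.
sample_rate = 48_000
rng = np.random.default_rng(0)
t = np.arange(sample_rate) / sample_rate
machine = np.stack([np.sin(2 * np.pi * 440 * t)] * 4, axis=1)  # (48000, 4)
noise = rng.normal(size=machine.shape)

mixed = mix_at_snr(machine, noise, snr_db=6.0)
```

Lowering `snr_db` makes the environment louder relative to the machine, which is one way to probe how a detector degrades as noise increases.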

The ToyADMOS dataset has the following characteristics:

  • It is designed for three ADMOS tasks: product inspection (toy car), fault diagnosis for a fixed machine (toy conveyor), and fault diagnosis for a moving machine (toy train).
  • Machine-operating sounds and environmental noise are recorded separately so that various noise levels can be simulated.
  • All sounds are recorded with four microphones so that noise-reduction and/or data-augmentation techniques such as mix-up can be tested.
  • In each task, multiple machines are used; each belongs to the same class of toys but differs in its detailed structure. Because the collected operating sounds vary with these individual differences, the dataset can be used for testing domain-adaptation techniques that absorb individual differences and/or changes in noise level.
  • Each anomalous sound was recorded several times, so the dataset can be used to test few-shot learning-based ADMOS, which learns the characteristics of anomalous sounds from only a few samples.
  • The released dataset consists of over 180 hours of normal machine-operating sounds and over 4,000 samples of anomalous sounds collected with four microphones at a 48-kHz sampling rate for each task.
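For readers unfamiliar with mix-up, the augmentation mentioned in the list above, here is a minimal sketch of the general technique: two clips and their labels are blended with a weight drawn from a Beta distribution. The article does not say how mix-up would be applied to ToyADMOS, so the `mixup` function, the 0/1 labels for normal and anomalous, the `alpha=0.2` setting, and the random stand-in clips are assumptions for illustration only.

```python
import numpy as np


def mixup(clip_a, label_a, clip_b, label_b, alpha=0.2, rng=None):
    """Blend two clips and their labels with a weight lam ~ Beta(alpha, alpha).

    Hypothetical helper for illustration; clips are arrays of equal shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    clip = lam * clip_a + (1.0 - lam) * clip_b
    label = lam * label_a + (1.0 - lam) * label_b
    return clip, label, lam


# Random stand-ins for one second of four-channel audio at 48 kHz.
rng = np.random.default_rng(0)
normal_clip = rng.normal(size=(48_000, 4))
anomalous_clip = rng.normal(size=(48_000, 4))

# Assumed labelling: 0.0 = normal, 1.0 = anomalous.
mixed_clip, mixed_label, lam = mixup(
    normal_clip, 0.0, anomalous_clip, 1.0, alpha=0.2, rng=rng
)
```

A small `alpha` keeps most blends close to one of the two original clips, while a larger value produces more evenly mixed examples.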

Advantages & Limitations

The main advantage of the ToyADMOS dataset over other datasets is that it was built under controlled conditions. Unlike other datasets, all of its normal sounds were collected under the same conditions, which makes it easier to analyse system performance and the causes of misdetection. The dataset should help advance research into anomaly detection in sounds.

The main limitation of the dataset is that toy sounds and real machine sounds do not match exactly. The detailed spectral shape of a toy's sound often differs from that of a real machine, even though the time-frequency structure is similar.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.