9. IPSJ SIG-SLP Corpora and Environments for Noisy Speech Recognition

9-d. In-car Connected Digit Data and Environment for Noisy Speech Recognition (CENSREC-2)

Data DOI

https://doi.org/10.32130/src.CENSREC-2

Producer, Project

Noisy Speech Recognition Evaluation Working Group,

Special Interest Group on Spoken Language Information Processing,

Information Processing Society of Japan (IPSJ)

Contents

A common platform for independently evaluating speech recognition accuracy and speech interval detection in noisy environments.

A database for evaluating continuous digit recognition in real car driving environments. The digit sequence of each utterance and the pronunciation of the Japanese digits are the same as in the CENSREC-1 (AURORA-2J) database.

Speaker

Training data:
73 speakers (33 males, 40 females); 14,687 utterances in total (7,492 with the close-talk microphone and 7,195 with the remote microphone).
Test data:
31 speakers (19 males, 12 females); 2,964 utterances in total, with the remote microphone only.

Speech file format

RAW format (16 kHz, 16-bit, mono, little-endian)
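
Because RAW files carry no header, the sampling parameters listed above have to be supplied by the reader. The following is a minimal sketch of decoding such a file in Python, assuming the samples are signed integers (the usual convention for 16-bit PCM, but not stated in this entry). The function names and the file name in the usage note are hypothetical and not part of the corpus distribution.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # from the format description: 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples, one channel (mono)


def decode_raw_pcm(data):
    """Decode headerless little-endian 16-bit PCM bytes into a list of ints.

    Assumption: samples are signed; the catalog entry does not say so.
    """
    if len(data) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of the sample size")
    count = len(data) // BYTES_PER_SAMPLE
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return list(struct.unpack(f"<{count}h", data))


def duration_seconds(samples):
    """Return the length of a mono utterance in seconds."""
    return len(samples) / SAMPLE_RATE_HZ


def read_raw_file(path):
    """Read a whole RAW file from disk and decode it."""
    with open(path, "rb") as handle:
        return decode_raw_pcm(handle.read())
```

For example, `read_raw_file("utterance.raw")` (a placeholder file name) would return the samples of one utterance, and `duration_seconds` applied to the result would give its length.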

Distribution media

1 DVD

Licensing

For research and development purposes only

Price

No fee

Speech sample for test listening

Digit strings identical to those in CENSREC-1, recorded in real car driving environments

Condition            Close-talk microphone   Remote microphone
Low-speed driving    /saN/                   /saN/
High-speed driving   /saN/                   /saN/
Idling               /saN/                   /saN/
