9. IPSJ SIG-SLP Corpora and Environments for Noisy Speech Recognition

9-d. In-car Connected Digit Data and Environment for Noisy Speech Recognition (CENSREC-2)

Data DOI


Producer, Project

Noisy Speech Recognition Evaluation Working Group,

Special Interest Group on Spoken Language Information Processing,

Information Processing Society of Japan (IPSJ)


A common platform for independently evaluating speech recognition accuracy and speech interval detection in noisy environments.

A database for the evaluation of continuous digit recognition in real car driving environments. The digit sequence of each utterance and the pronunciation of the Japanese digits are the same as in the CENSREC-1 (AURORA-2J) database.


Training data:
73 speakers (33 males, 40 females), 14,687 utterances in total (7,492 with the close-talk microphone and 7,195 with the remote microphone).
Test data:
31 speakers (19 males, 12 females), 2,964 utterances in total, recorded with the remote microphone only.

Speech file format

RAW format (16 kHz, 16-bit, mono, little-endian)
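
Because RAW files carry no header, the sampling parameters listed above have to be supplied by whoever reads them. The following is a minimal sketch in Python (standard library only) of how such a file might be decoded and wrapped in a WAV header for listening. It assumes the 16-bit samples are signed integers, which the entry does not state explicitly, and the function names and the file name in the usage note are hypothetical, not part of the corpus distribution.

```python
import struct
import wave

SAMPLE_RATE_HZ = 16000   # 16 kHz, from the format description
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
NUM_CHANNELS = 1         # mono


def read_raw_pcm(path):
    """Decode a headerless 16-bit little-endian file into a list of ints.

    Assumption: samples are signed (the usual convention for 16-bit PCM).
    """
    with open(path, "rb") as f:
        data = f.read()
    # Drop a trailing odd byte, if any, so only whole samples are decoded.
    num_samples = len(data) // SAMPLE_WIDTH_BYTES
    usable = data[:num_samples * SAMPLE_WIDTH_BYTES]
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return list(struct.unpack("<%dh" % num_samples, usable))


def raw_to_wav(raw_path, wav_path):
    """Copy the raw samples into a WAV container without altering them."""
    with open(raw_path, "rb") as f:
        data = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(NUM_CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        # 16-bit WAV data is also signed little-endian, so no conversion is needed.
        w.writeframes(data)


def duration_seconds(samples):
    """Length of an utterance in seconds at the 16 kHz sampling rate."""
    return len(samples) / SAMPLE_RATE_HZ
```

For example, `raw_to_wav("utterance.raw", "utterance.wav")` would produce a file that ordinary audio players can open, where `utterance.raw` stands in for any speech file from the corpus.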

Distribution media



For research and development purposes only


No fee

Speech sample for test listening

The same digit strings as in CENSREC-1, recorded in real car driving environments

                     Close-talk microphone   Remote microphone
Low-speed driving    /saN/                   /saN/
High-speed driving   /saN/                   /saN/
Idling               /saN/                   /saN/
