Corpora list
Click here for information about how to obtain corpora.
Click here for a bibliography of works citing SRC corpora (hosted on the NII-IDR site).
No-fee
- Priority Area Project on "Spoken Language" - Grant-in-Aid for Developmental Scientific Research on "Speech Database" Continuous Speech Corpus (PASL-DSR)
- University of Tsukuba Multilingual Speech Corpus (UT-ML)
- Tohoku University - Matsushita Isolated Word Database (TMW)
- GSR(A) "Regional Difference in Spoken Japanese Dialects" Spoken Japanese Dialect Corpus (GSR-JD)
- Real World Computing Project (RWCP) Speech Corpora
- RWCP Real Environment Speech and Acoustic Database (RWCP-SSD)
- Priority Area "Spoken Dialogue" Spoken Dialogue Corpus (PASD)
- CIAIR Children Voice Speech Corpus (CIAIR-VCV)
- IPSJ SIG-SLP Corpora and Environments for Noisy Speech Recognition (CENSREC)
  - Noisy Speech Recognition Evaluation Environment (CENSREC-1 〈AURORA-2J〉)
  - Noisy Speech Detection Evaluation Environment (CENSREC-1-C)
  - Audio-Visual Speech Recognition Evaluation Environment (CENSREC-1-AV)
  - In-car Connected Digit Data and Environment for Noisy Speech Recognition (CENSREC-2)
  - In-car Isolated Word Data and Environment for Noisy Speech Recognition (CENSREC-3)
  - Reverberant Speech Recognition Evaluation Environment (CENSREC-4)
- Priority Areas "Advanced Utilization of Multimedia to Promote Higher Education Reform" Speech Database (UME)
- RIKEN Spoken Dialogue Corpus (Word processing task, Japanese) (RIKEN-DLG)
- Japanese Map Task Dialogue Corpus
- Utsunomiya University Spoken Dialogue Database for Paralinguistic Information Studies (UUDB)
- Japanese Phonetically-balanced Word Speech Database (ETL-WD)
- Speech Database of the 1991-1992 Tsuruoka Survey (Tsuruoka91-92)
- X-ray Film database for speech research (X-Ray)
- Priority Areas "Prosody and Speech Processing" Japanese MULTEXT Prosodic Corpus (MULTEXT-J)
- Chinese MULTEXT Corpus (MULTEXT-C)
- Keio University Japanese Emotional Speech Database (Keio-ESD)
- Vowel Database: Five Japanese Vowels of Males, Females, and Children Along with Relevant Physical Data (JVPD)
- Tokyo Institute of Technology Multilingual Speech Corpus (TITML)
- AWA Long-Term Recording Speech Corpus (AWA-LTR)
- Speech database of Aragusuku Dialect (Aragusuku)
- Speech database of Oogami Dialect (Oogami)
- Online Gaming Voice Chat Corpus with Emotional Label (OGVC)
- Chiba Three-party Conversation Corpus (Chiba3Party)
- Kindai University Japanese Isolated Word Database Read by Children (JWC)
- Japanese Kamishibai and Audiobook Corpus (J-KAC)
- Japanese Multi-speaker Audiobook Corpus (J-MAC)
- Japanese Empathetic Dialogue Speech Corpus (STUDIES)
- Real-time MRI Articulatory Movement Database - Version 1 (rtMRIDB)
- Kobe University Japanese-Chinese Comparative MRI Movies corpus (KUJC-MRI)
- Transcription Corpus of First-encounter Conversations by Elderly Women (TDU-Kao)
- Corpus of Connecting Nihongo Utterance and Text (Coco-Nut)
- Elderly Adults Read Speech Corpus (EARS)
- Hiroshima City University Japanese Emotional Speech Corpus (HCUDB)
Fee-based
- ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS)
- Japanese Newspaper Article Sentences Read Speech Corpus of the Aged (S-JNAS)
- ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)
- NTT - Tohoku University Familiarity-controlled Word Lists (FW03)
- NTT - Tohoku University Familiarity-controlled Word Lists 2007 (FW07)
- NTT Infant Speech Database (INFANT)
Agency
- JEIDA Japanese Common Speech Data Corpus (JEIDA-JCSD) (description available only in Japanese)
- JEIDA Noise Database (JEIDA-NOISE) (description available only in Japanese)