Corpora list

Click here for information about how to obtain corpora.

Click here for a bibliography of work citing SRC corpora (hosted on the NII-IDR site).

No-fee

  1. Priority Area Project on "Spoken Language" - Grant-in-Aid for Developmental Scientific Research on "Speech Database" Continuous Speech Corpus (PASL-DSR)
  2. University of Tsukuba Multilingual Speech Corpus (UT-ML)
  3. Tohoku University - Matsushita Isolated Word Database (TMW)
  4. GSR(A) "Regional Difference in Spoken Japanese Dialects" Spoken Japanese Dialect Corpus (GSR-JD)
  5. Real World Computing Project (RWCP) Speech Corpora
    1. RWCP Spoken Dialogue Corpus - 1996 edition (RWCP-SP96)
    2. RWCP Spoken Dialogue Corpus - 1997 edition (RWCP-SP97)
    3. RWCP News Speech Corpus (RWCP-SP99)
    4. RWCP Meeting Speech Corpus (RWCP-SP01)
  6. RWCP Real Environment Speech and Acoustic Database (RWCP-SSD)
  7. Priority Area "Spoken Dialogue" Spoken Dialogue Corpus (PASD)
  8. CIAIR Children Voice Speech Corpus (CIAIR-VCV)
  9. IPSJ SIG-SLP Corpora and Environments for Noisy Speech Recognition (CENSREC)
    1. Noisy Speech Recognition Evaluation Environment (CENSREC-1 ⟨AURORA-2J⟩)
    2. Noisy Speech Detection Evaluation Environment (CENSREC-1-C)
    3. Audio-Visual Speech Recognition Evaluation Environment (CENSREC-1-AV)
    4. In-car Connected Digit Data and Environment for Noisy Speech Recognition (CENSREC-2)
    5. In-car Isolated Word Data and Environment for Noisy Speech Recognition (CENSREC-3)
    6. Reverberant Speech Recognition Evaluation Environment (CENSREC-4)
  10. Priority Areas "Advanced Utilization of Multimedia to Promote Higher Education Reform" Speech Database (UME)
    1. English Speech Database Read by Japanese Students (UME-ERJ)
    2. Japanese Speech Database Read by Foreign Students (UME-JRF)
  11. RIKEN Spoken Dialogue Corpus (Word processing task, Japanese) (RIKEN-DLG)
  12. Japanese Map Task Dialogue Corpus
    1. Chiba University Japanese Map Task Dialogue Corpus (MapTask)
    2. Mie University Japanese Map Task Dialogue Corpus (MapTask-Mie)
  13. Utsunomiya University Spoken Dialogue Database for Paralinguistic Information Studies (UUDB)
  14. Japanese Phonetically-balanced Word Speech Database (ETL-WD)
  15. Speech Database of the 1991-1992 Tsuruoka Survey (Tsuruoka91-92)
  16. X-ray Film database for speech research (X-Ray)
  17. Priority Areas "Prosody and Speech Processing" Japanese MULTEXT Prosodic Corpus (MULTEXT-J)
  18. Chinese MULTEXT Corpus (MULTEXT-C)
  19. Keio University Japanese Emotional Speech Database (Keio-ESD)
  20. Vowel Database: Five Japanese Vowels of Males, Females, and Children Along with Relevant Physical Data (JVPD)
  21. Tokyo Institute of Technology Multilingual Speech Corpus (TITML)
    1. Indonesian (TITML-IDN)
    2. Icelandic (TITML-ISL)
  22. AWA Long-Term Recording Speech Corpus (AWA-LTR)
  23. Speech database of Aragusuku Dialect (Aragusuku)
  24. Speech database of Oogami Dialect (Oogami)
  25. Online Gaming Voice Chat Corpus with Emotional Label (OGVC)
  26. Chiba Three-party Conversation Corpus (Chiba3Party)
  27. Kindai University Japanese Isolated Word Database Read by Children (JWC)
  28. Japanese Kamishibai and Audiobook Corpus (J-KAC)
  29. Japanese Multi-speaker Audiobook Corpus (J-MAC)
  30. Japanese Empathetic Dialogue Speech Corpus (STUDIES)
  31. Real-time MRI Articulatory Movement Database - Version 1 (rtMRIDB)
  32. Kobe University Japanese-Chinese Comparative MRI Movies corpus (KUJC-MRI)
  33. Transcription Corpus of First-encounter Conversations by Elderly Women (TDU-Kao)
  34. Corpus of Connecting Nihongo Utterance and Text (Coco-Nut)
  35. Elderly Adults Read Speech Corpus (EARS)
  36. Hiroshima City University Japanese Emotional Speech Corpus (HCUDB)

Fee-based

  1. ASJ Japanese Newspaper Article Sentences Read Speech Corpus (JNAS)
  2. Japanese Newspaper Article Sentences Read Speech Corpus of the Aged (S-JNAS)
  3. ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)
  4. NTT - Tohoku University Familiarity-controlled Word Lists (FW03)
  5. NTT - Tohoku University Familiarity-controlled Word Lists 2007 (FW07)
  6. NTT Infant Speech Database (INFANT)

Agency

  1. JEIDA Japanese Common Speech Data Corpus (JEIDA-JCSD) (available in Japanese only)
  2. JEIDA Noise Database (JEIDA-NOISE) (available in Japanese only)