ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)
- Speech Database Committee, Acoustical Society of Japan
- Intelligent Speech Processing Research Committee, Japan Information Processing Development Center
- AI Fuzzy Promotion Center, Japan Information Processing Development Center
Vols. 1-3: Read speech of phonetically balanced sentences
Vols. 4-6: Read speech of transcribed text of played dialogues (16 sets)
- The texts were obtained by eliminating interjections and erroneous expressions from the original transcriptions of played dialogues between two speakers. Then, each dialogue text was read by one speaker. Two versions are available:
- Deletions and particle ellipses were not filled in, and inverted expressions were left uncorrected: 4 dialogues.
- Deletions and particle ellipses were filled in, and inverted expressions were corrected to give polite expressions: 12 dialogues, including the above 4 dialogues.
Vol. 7: Played dialogues (37 dialogues)
- Played dialogues on various kinds of guidance, such as geographic or tourist guidance, including 7 dialogues from which the data in Vols. 4-6 were derived.
Vols. 1-3: 64 speakers (30 males and 34 females).
- One speaker reads 3 sets. All speakers read set A. 12 speakers (6 males and 6 females) read sets B-J.
Vols. 4-6: 36 speakers (18 males and 18 females).
- One speaker reads 5 to 9 sets. One set is read by 10–18 speakers.
- 8 speakers (4 males and 4 females) among the 36 are the same as speakers in Vols. 1-3.
Vol. 7: 37 speakers (29 males and 8 females).
- One speaker participates in 1-5 dialogues.
Speech file format
RAW format (16 kHz, 16-bit (partly 12-bit), mono, big-endian)
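Because RAW files carry no header, the sampling rate, sample width, and byte order listed above have to be supplied by whatever tool reads them. The following is a minimal Python sketch of one way to load such a file and save it as WAV. It assumes signed 16-bit samples, which the description does not state explicitly; the function names and the file name `sample.raw` are hypothetical, and how the 12-bit recordings are stored is not specified here, so they may need additional scaling.

```python
import array
import sys
import wave

SAMPLE_RATE_HZ = 16000   # 16 kHz, from the format description
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples


def read_raw_big_endian(path):
    """Read headerless big-endian 16-bit mono PCM into an array of ints.

    Assumption: samples are signed (not stated in the corpus description).
    """
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % SAMPLE_WIDTH_BYTES != 0:
        raise ValueError("file length is not a multiple of 2 bytes")
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(data)         # interpreted in native byte order
    if sys.byteorder == "little":
        samples.byteswap()          # convert big-endian data to native order
    return samples


def write_wav(samples, path):
    """Write native-order samples as a 16 kHz mono 16-bit WAV file."""
    out = array.array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()              # WAV stores PCM as little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(out.tobytes())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python raw2wav.py sample.raw sample.wav
    write_wav(read_raw_big_endian(sys.argv[1]), sys.argv[2])
```

If the converted audio sounds like loud noise, the byte order or the signedness assumption is the first thing to check.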
1 CD-ROM for each volume
For research purposes only
540 yen per volume, plus a service charge of 1080 yen (including postage).
4860 yen for a set of 7 volumes including consumption tax.
This corpus was distributed by NTT Advanced Technology Corporation (NTT-AT) before April 1, 2006.
Speech sample for test listening
- Vols. 1-3: Read speech of phonetically balanced sentences
- Vols. 4-6: Read speech of transcribed text of played dialogues
- k001. investigator:
- k002. respondent:
- k003. investigator:
- k004. respondent:
- Vol. 7: Played dialogues
- はい。［えーと］［えーと］野田、 (ど)どちら｛えー｝様でしょうか。