ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)
- Speech Database Committee, Acoustical Society of Japan
- Intelligent Speech Processing Research Committee, Japan Information Processing Development Center
- AI Fuzzy Promotion Center, Japan Information Processing Development Center
Vols. 1-3: Read speech of ATR's 503 phonetically balanced sentences
Vols. 4-6: Read speech of transcribed text of played dialogues (16 sets)
- The texts were obtained by eliminating interjections and erroneous expressions from the original transcriptions of played dialogues between two speakers. Then, each dialogue text was read by one speaker. Two versions are available:
- Deletions and particle ellipses were not complemented and inverted expressions were not corrected: 4 dialogues.
- Deletions and particle ellipses were complemented and inverted expressions were corrected to provide polite expressions: 12 dialogues including the above 4 dialogues.
Vol. 7: Played dialogues (37 dialogues)
- Played dialogues on various guidance tasks, such as geographic or tourist guidance, including the 7 dialogues from which the data in Vols. 4-6 were derived.
Vols. 1-3: 64 speakers (30 males and 34 females).
- Each speaker reads 3 sets. All speakers read set A. 12 speakers (6 males and 6 females) read sets B-J.
Vols. 4-6: 36 speakers (18 males and 18 females).
- Each speaker reads 5 to 9 sets. Each set is read by 10-18 speakers.
- 8 of the 36 speakers (4 males and 4 females) also appear in Vols. 1-3.
Vol. 7: 37 speakers (29 males and 8 females).
- Each speaker participates in 1 to 5 dialogues.
Speech file format
RAW format (16 kHz, 16 bit (partly 12 bit), Mono, BigEndian)
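Because the files are headerless, a reader has to supply the format parameters itself. The following is a minimal Python sketch of decoding such data, assuming the samples are signed integers (the description above does not state signedness). The function names are hypothetical, and the 12-bit portions are not handled specially because their storage layout is not described here.

```python
import struct

SAMPLE_RATE_HZ = 16000   # 16 kHz, from the format description
BYTES_PER_SAMPLE = 2     # 16-bit samples


def decode_raw_pcm(data):
    """Decode headerless 16-bit big-endian mono PCM bytes.

    Assumption: samples are signed (not stated in the corpus description).
    """
    if len(data) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of the sample size")
    count = len(data) // BYTES_PER_SAMPLE
    # ">" selects big-endian byte order, "h" is a signed 16-bit integer.
    return list(struct.unpack(f">{count}h", data))


def duration_seconds(samples):
    """Length of a mono recording in seconds at the stated sample rate."""
    return len(samples) / SAMPLE_RATE_HZ
```

A file would be decoded by reading it in binary mode, for example `decode_raw_pcm(open(path, "rb").read())`, where `path` points to one of the speech files on the CD-ROM.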
1 CD-ROM for each volume
For research purposes only
500 yen per volume, or 3,500 yen for a set of 7 volumes, plus a service charge of 1,000 yen including postage (plus consumption tax for domestic orders)
This corpus was distributed by the Japan Information Processing Development Center (JIPDEC) until February 2007.
Speech sample for test listening
- Vols. 1-3: Read speech of phonetically balanced sentences
- Vols. 4-6: Read speech of transcribed text of played dialogues
- k001. investigator:
- k002. respondent:
- k003. investigator:
- k004. respondent:
- Vol. 7: Played dialogues
- はい。［えーと］［えーと］野田、 (ど)どちら｛えー｝様でしょうか。 (English gloss: "Yes. [um] [um] Noda, (wh-) who {uh} might this be?")