ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)
Data DOI
https://doi.org/10.32130/src.ASJ-JIPDEC
Producer, Project
- Editing:
- Speech Database Committee, Acoustical Society of Japan
- Intelligent Speech Processing Research Committee, Japan Information Processing Development Center
- Publication:
- AI Fuzzy Promotion Center, Japan Information Processing Development Center
Contents
Vols. 1-3: Read speech of ATR's 503 phonetically balanced sentences
Vols. 4-6: Read speech of transcribed text of played dialogues (16 sets)
- The texts were obtained by eliminating interjections and erroneous expressions from the original transcriptions of played dialogues between two speakers. Then, each dialogue text was read by one speaker. Two versions are available:
- Omissions and particle ellipses were not filled in, and inverted expressions were not corrected: 4 dialogues.
- Omissions and particle ellipses were filled in, and inverted expressions were corrected to yield polite expressions: 12 dialogues, including the above 4 dialogues.
Vol. 7: Played dialogues (37 dialogues)
- Played dialogues on various guidance tasks, such as geographic or tourist guidance, including the 7 dialogues from which the data in Vols. 4-6 were derived.
Speaker
Vols. 1-3: 64 speakers (30 males and 34 females).
- Each speaker reads 3 sets. All speakers read set A. 12 speakers (6 males and 6 females) read sets B-J.
Vols. 4-6: 36 speakers (18 males and 18 females).
- Each speaker reads 5 to 9 sets. Each set is read by 10 to 18 speakers.
- 8 of the 36 speakers (4 males and 4 females) are the same as speakers in Vols. 1-3.
Vol. 7: 37 speakers (29 males and 8 females).
- Each speaker participates in 1 to 5 dialogues.
Speech file format
RAW format (16 kHz, 16-bit (partly 12-bit), mono, big-endian)
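Because RAW files carry no header, the reader has to supply the sampling parameters. The following is a minimal Python sketch of decoding such a file, not a tool supplied with the corpus. It assumes the samples are signed integers, which this listing does not state, and the function names are hypothetical. How the 12-bit portions are stored is not specified here, so the sketch does not treat them specially.

```python
import array
import sys
import wave

SAMPLE_RATE = 16000   # 16 kHz, from the format description
SAMPLE_WIDTH = 2      # 16-bit samples = 2 bytes each


def decode_raw_pcm(data: bytes) -> array.array:
    """Decode headerless big-endian 16-bit mono PCM into native-order samples.

    Assumption: samples are signed integers.
    """
    if len(data) % SAMPLE_WIDTH:
        raise ValueError("byte count is not a multiple of the sample width")
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(data)
    if sys.byteorder == "little":
        samples.byteswap()          # data is big-endian; convert to native order
    return samples


def duration_seconds(samples: array.array) -> float:
    """Length of a mono recording in seconds at the corpus sampling rate."""
    return len(samples) / SAMPLE_RATE


def write_wav(samples: array.array, path: str) -> None:
    """Save samples as a WAV file (WAV stores 16-bit PCM as little-endian)."""
    out = array.array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(out.tobytes())
```

Typical use would be to read a file's bytes with `open(path, "rb").read()`, pass them to `decode_raw_pcm`, and optionally call `write_wav` to obtain a file that ordinary audio players can open.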
Distribution media
1 CD-ROM for each volume
Licensing
For research purposes only
Price
500 yen per volume, or 3500 yen for the set of 7 volumes,
plus a service charge of 1000 yen, including postage
(plus consumption tax for domestic orders)
Note
This corpus was distributed by the Japan Information Processing Development Center (JIPDEC) until February 2007.
Speech sample for test listening
- Vols. 1-3: Read speech of phonetically balanced sentences
- Vols. 4-6: Read speech of transcribed text of played dialogues
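- k001. investigator:
- 今度の音声研究会を聞きに行きたいんですけれども、どう行ったらいいんでしょうか。
- English: I would like to attend the upcoming speech research meeting. How should I get there?
- k002. respondent:
- 会場はどちらですか。
- English: Where is the venue?
- k003. investigator:
- 機械振興会館というらしいんですが、場所を全然知らないんです。
- English: I hear it is called Kikai Shinko Kaikan, but I have no idea where it is.
- k004. respondent:
- 機械振興会館なら、東京タワーの前です。
- English: Kikai Shinko Kaikan is in front of Tokyo Tower.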
- Vol. 7: Played dialogues
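- respondent:
- いらっしゃいませ。
- English: Welcome.
- investigator:
- 先日書類をお送りした野田と申しますけれども。
- English: My name is Noda. I sent you some documents the other day.
- respondent:
- はい。[えーと][えーと]野田、 (ど)どちら{えー}様でしょうか。
- English: Yes. Um, um, Noda... which Noda would that be?
- investigator:
- 野田、秀樹と{はい}いいます。
- English: Noda, Hideki is my name.
- respondent:
- わかりました。
- English: Understood.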
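The Vol. 7 sample marks some spans with square brackets, braces, and parentheses. This listing does not define the markers. The Python sketch below assumes they are annotations (for example fillers, listener backchannels, and false starts) and shows one way to recover the plain utterance text. The function name is hypothetical and the pattern is not an official specification of the transcription format.

```python
import re

# Assumption: any span enclosed in [], {} or () is an annotation rather than
# part of the main utterance; the listing itself does not define the markers.
ANNOTATION = re.compile(r"\[[^\]]*\]|\{[^}]*\}|\([^)]*\)")


def strip_annotations(utterance: str) -> str:
    """Remove bracketed annotation spans, then drop leftover whitespace.

    Dropping all whitespace suits Japanese text, which is written without
    spaces between words; it would not suit space-delimited languages.
    """
    text = ANNOTATION.sub("", utterance)
    return re.sub(r"\s+", "", text)
```

Applied to the transcript lines of the sample, this leaves the utterance text without the marked spans, which can be convenient when comparing the played dialogues with the cleaned read-speech texts in Vols. 4-6.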