ASJ Continuous Speech Corpus for Research (ASJ-JIPDEC)

Editing: Speech Database Committee, Acoustical Society of Japan; Intelligent Speech Processing Research Committee, Japan Information Processing Development Center
Publication: AI Fuzzy Promotion Center, Japan Information Processing Development Center

Vols. 4-6: Read speech of transcribed text of played dialogues (16 sets)

The texts were obtained by removing interjections and erroneous expressions from the original transcriptions of played dialogues between two speakers. Each dialogue text was then read aloud by a single speaker. Two versions are available:
- Omissions and particle ellipses were not restored, and inverted expressions were not corrected: 4 dialogues.
- Omissions and particle ellipses were restored, and inverted expressions were corrected to yield polite expressions: 12 dialogues, including the above 4 dialogues.

Vol. 7: Played dialogues (37 dialogues)

Played dialogues on various kinds of guidance, such as geographic or tourist guidance, including 7 dialogues from which the data in Vols. 4-6 were derived.

Vols. 1-3: 64 speakers (30 males and 34 females).

Each speaker reads 3 sets. All speakers read set A. 12 speakers (6 males and 6 females) read sets B - J.

Vols. 4-6: 36 speakers (18 males and 18 females).

Each speaker reads 5 to 9 sets. Each set is read by 10–18 speakers.
8 of the 36 speakers (4 males and 4 females) are the same as speakers in Vols. 1-3.

Vol. 7: 37 speakers (29 males and 8 females).

RAW format (16 kHz, 16 bit (partly 12 bit), Mono, BigEndian)
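
As an illustration of the format line above, the following is a minimal sketch of decoding such audio in Python. It assumes the files are headerless and that samples are signed 16-bit integers; neither point is stated in this description. The function names and the file path argument are hypothetical, and the sketch does not address how the 12-bit portions are stored.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # 16 kHz, as stated in the format description


def decode_raw_pcm(data: bytes) -> list[int]:
    """Decode big-endian 16-bit mono PCM bytes into a list of integers.

    Assumption: samples are signed and the data has no header.
    """
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes for 16-bit samples")
    count = len(data) // 2
    # ">" selects big-endian byte order, "h" is a signed 16-bit integer.
    return list(struct.unpack(f">{count}h", data))


def duration_seconds(samples: list[int]) -> float:
    """Length of a mono recording in seconds at the stated sampling rate."""
    return len(samples) / SAMPLE_RATE_HZ


def read_raw_file(path: str) -> list[int]:
    """Read one RAW file from disk (the path is supplied by the caller)."""
    with open(path, "rb") as f:
        return decode_raw_pcm(f.read())
```

Specifying the byte order explicitly matters here because most current desktop machines are little-endian, so reading the data in native order would produce incorrect sample values.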

1 CD-ROM for each volume

For research purposes only

500 yen per volume, or 3500 yen for a set of 7 volumes, plus a service charge of 1000 yen including postage (plus consumption tax for domestic orders)

This corpus was distributed by the Japan Information Processing Development Center (JIPDEC) until February 2007.

Vols. 1-3: Read speech of phonetically balanced sentences
- やるべきことは　やっており　なんら　落ち度は　ない。
Vols. 4-6: Read speech of transcribed text of played dialogues

k001. investigator:
今度の音声研究会を聞きに行きたいんですけれども、どう行ったらいいんでしょうか。

k002. respondent:
会場はどちらですか。

k003. investigator:
機械振興会館というらしいんですが、場所を全然知らないんです。

k004. respondent:
機械振興会館なら、東京タワーの前です。
Vol. 7: Played dialogues

respondent:
いらっしゃいませ。

investigator:
先日書類をお送りした野田と申しますけれども。

respondent:
はい。［えーと］［えーと］野田、 (ど)どちら｛えー｝様でしょうか。

investigator:
野田、秀樹と｛はい｝いいます。

respondent:
わかりました。