11. RIKEN Word Processor Operation Dialogue Speech Corpus (RIKEN-DLG)

Producer, Project

Laboratory for Language-based Intelligent Systems, Brain Science Institute, RIKEN

Contents

a. Dialogues requesting document creation

b. Question-answer dialogues

Vols. 1-3: Speech data, transcribed text and database with morpheme tags

Vol. 4: Transcribed text and database with morpheme tags*1

Speaker

Vol. 1: Dialogues requesting document creation (9 dialogues and 9 monologues); no more than two hours per dialogue.

Vol. 2: Question-answer dialogues 2002-1 (18 dialogues); no more than one hour per dialogue.

Vol. 3: Question-answer dialogues 2002-2 (18 dialogues); no more than one hour per dialogue.

Vol. 4: Question-answer dialogues 2001 (15 dialogues); no more than two hours per dialogue.*1

A total of 129 speakers participated in the recording.

Speech file format

RAW format (16 kHz, 16 bit, Stereo, LittleEndian)*2
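Headerless RAW files carry no information about their sampling rate or channel count, so most audio tools cannot open them directly. The following is a minimal sketch, not part of the corpus distribution, that wraps such a file in a WAV header using Python's standard `wave` module. The function name `raw_to_wav` and the file names in the usage note are hypothetical; the default parameters simply mirror the format stated above.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000, channels=2, sample_width=2):
    """Wrap headerless little-endian PCM in a WAV container.

    Defaults follow the listed format (16 kHz, 16 bit, stereo).
    Returns the duration of the audio in seconds.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()

    # One frame holds one sample for every channel.
    frame_size = channels * sample_width
    # Drop any trailing bytes that do not form a complete frame.
    usable = len(pcm) - (len(pcm) % frame_size)

    # WAV stores 16-bit PCM as little-endian, so the bytes are copied as-is.
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm[:usable])

    return (usable // frame_size) / sample_rate
```

A call such as `raw_to_wav("dialogue.raw", "dialogue.wav")` would apply the defaults. Files recorded under other conditions need explicit arguments, for example `sample_rate=32000, channels=1` for 32 kHz mono data.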

Distribution media

Vols. 1-3: 1 DVD each

Vol. 4: 1 CD-ROM

Licensing

For research purposes only

Price

No fee

Comments

*1 Vol. 4 does not contain speech data.

*2 Some of the monologues in Vol. 1 were recorded at 32 kHz, 16 bit, mono.

Recording levels vary with the year of recording.

A table of sampling frequencies, numbers of channels, and recording conditions is included in Vol. 1.

Note

All documents are written in Japanese.

Speech sample for test listening

Dialogues requesting document creation

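0238 R: でここで画像をどうやって置けばいいのかわからなくて (And here I don't know how to place the image,)
0239 R: えっと (um)
0240 R: 操作方法がわからなくていろいろと悪戦苦闘 (I don't know how to operate it, so I've been struggling in all sorts of ways,)
0241 R: えしてるわけなんですけれども (uh, that's what I'm doing, but)
0242 R: えっとその前にサンタがぼつになったんですね (um, before that, the Santa was scrapped, you see.)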
