29. Japanese Multi-speaker Audiobook Corpus (J-MAC)

Data DOI

https://doi.org/10.32130/src.J-MAC

Producer, Project

Assist. Prof. Shinnosuke Takamichi, The University of Tokyo

Contents

This corpus consists of time-aligned texts for commercial audiobooks.

A total of 74 audiobooks (24 novels) were selected from a large number of commercial audiobooks for speech synthesis research.

*NOTE* Audio data is not included in this corpus; users must purchase the audiobooks separately.

Speaker

39 professional speakers

Speech file format

(Speech files are not included)

Distribution media

1 CD

Licensing

For research purposes only

Price

No fee

Further information

https://sites.google.com/site/shinnosuketakamichi/research-topics/j-mac_corpus

Sample data

「セロ弾きのゴーシュ」 ("Gauche the Cellist", by Kenji Miyazawa):

chapt000:
  parag000:
    style000:
    - sent: ゴーシュは町の活動写真館でセロを[弾|ひ]く[係|かか]りでした。
      time:
      - 3.18
      - 14.105
    - sent: けれどもあんまり[上手|じょうず]でないという評判でした。
      time:
      - 14.13
      - 18.05
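The sample nests sentences under chapter, paragraph and style keys, and each sentence carries a two-element time list. A minimal Python sketch of walking that structure is shown below. It assumes that `[surface|reading]` marks a written span together with its kana reading and that `time` holds the start and end of the sentence in seconds; neither is stated above. The function names are hypothetical. To stay self-contained, the sample is copied into a dict rather than loaded from a file (for a real file, PyYAML's `yaml.safe_load` would produce the same nesting).

```python
import re

# The sample above as a Python dict: chapter -> paragraph -> style -> list of sentences.
SAMPLE = {
    "chapt000": {
        "parag000": {
            "style000": [
                {"sent": "ゴーシュは町の活動写真館でセロを[弾|ひ]く[係|かか]りでした。",
                 "time": [3.18, 14.105]},
                {"sent": "けれどもあんまり[上手|じょうず]でないという評判でした。",
                 "time": [14.13, 18.05]},
            ]
        }
    }
}

# Assumed annotation syntax: [surface|reading], e.g. [弾|ひ]
RUBY = re.compile(r"\[([^\]|]+)\|([^\]]+)\]")


def surface(sent: str) -> str:
    """Keep the written form of each annotated span, dropping its reading."""
    return RUBY.sub(r"\1", sent)


def reading(sent: str) -> str:
    """Replace each annotated span with its reading."""
    return RUBY.sub(r"\2", sent)


def iter_segments(corpus):
    """Yield (chapter, paragraph, style, surface text, start, end) per sentence.

    Assumes time = [start, end] in seconds.
    """
    for chapt, parags in corpus.items():
        for parag, styles in parags.items():
            for style, sents in styles.items():
                for item in sents:
                    start, end = item["time"]
                    yield chapt, parag, style, surface(item["sent"]), start, end


if __name__ == "__main__":
    for chapt, parag, style, text, start, end in iter_segments(SAMPLE):
        print(f"{chapt}/{parag}/{style} {start:.3f}-{end:.3f} "
              f"({end - start:.3f} s) {text}")
```

Cutting the purchased audio at each start/end pair would then give sentence-level clips paired with text, under the same assumption about the time units.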
