29. Japanese Multi-speaker Audiobook Corpus (J-MAC)

Data DOI

https://doi.org/10.32130/src.J-MAC

Producer, Project

Assist. Prof. Shinnosuke Takamichi, The University of Tokyo

Contents

This corpus consists of texts from commercial audiobooks, with each sentence mapped to its start and end times in the audio.

A total of 74 audiobooks (24 novels) were selected from a large pool of commercial audiobooks for speech synthesis research.

*NOTE* Audio data is not included in this corpus; users must purchase the audiobooks themselves.
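Because the audio is not distributed, the time stamps are only useful once they are applied to a purchased copy. The following is a minimal Python sketch of cutting one sentence out of such a copy. It assumes that the audiobook has already been converted to an uncompressed PCM WAV file and that each time stamp is a [start, end] pair in seconds measured from the beginning of that file. Both of these are assumptions, not documented conventions of the corpus. `extract_segment` is a hypothetical helper, not part of the distribution.

```python
import wave


def extract_segment(src_path: str, dst_path: str, start: float, end: float) -> int:
    """Copy the [start, end) span of a PCM WAV file into a new WAV file.

    Assumption: start/end are seconds from the beginning of the file.
    Returns the number of frames written.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        n_total = src.getnframes()
        # Convert seconds to frame indices, clamped to the file length.
        first = min(int(round(start * rate)), n_total)
        last = min(int(round(end * rate)), n_total)
        src.setpos(first)
        frames = src.readframes(last - first)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # Keep the source's channel count, sample width, and sampling rate.
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
    return last - first
```

For example, with hypothetical file names, `extract_segment("gauche.wav", "utt.wav", 383.12, 391.795)` would cut the span given for the first narrative sentence of the sample data, provided the assumptions above hold for the purchased edition.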

Speaker

39 professional speakers

Speech file format

(Speech files are not included)

Distribution media

1 CD

Licensing

For research purposes only

Price

No fee

Further information

https://sites.google.com/site/shinnosuketakamichi/research-topics/j-mac_corpus

Sample data

「セロ弾きのゴーシュ」 (Gauche the Cellist, by Kenji Miyazawa):

chapt000:
  parag016:
    style000:
    - character: narrative
      sent: ゴーシュの畑からとった、半分熟したトマトを、さも重そうに持って来て、ゴーシュの前におろして云いました。
      time: [383.12, 391.795]
      to whom: narrative
    style001:
    - character: 猫
      sent: 「ああくたびれた。
      time: [392.25, 394.395]
      to whom: ゴーシュ
    - character: 猫
      sent: なかなか、[運搬|うんぱん]はひどいやな。」
      time: [394.42, 397.575]
      to whom: ゴーシュ
    style002:
    - character: ゴーシュ
      sent: 「[何|なん]だと」
      time: [397.6, 398.715]
      to whom: 猫
    style003:
    - character: narrative
      sent: ゴーシュがききました。
      time: [398.74, 400.575]
      to whom: narrative
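The sample suggests a hierarchy of chapter, paragraph, and speaking-style blocks. Each block holds a list of sentences with the speaker (`character`), the addressee (`to whom`), the text (`sent`), and a time span (`time`). Some words carry readings in the form `[運搬|うんぱん]`. The following is a minimal Python sketch that flattens this structure into one record per sentence and resolves those annotations into either the written form or the kana reading. Several parts of it are assumptions inferred only from the sample above. It assumes that the files are YAML, so that a loader such as PyYAML's `yaml.safe_load` would produce nested dicts and lists like the hand-built `sample` below. It also assumes that the bracket notation always means `[written form|reading]`. `Utterance` and `flatten` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed ruby notation, inferred from the sample: "[written form|reading]".
RUBY = re.compile(r"\[([^\[\]|]+)\|([^\[\]|]+)\]")


@dataclass
class Utterance:
    chapter: str
    paragraph: str
    style: str
    character: str
    to_whom: str
    text: str      # written form, ruby annotations removed
    reading: str   # annotated words replaced by their readings
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def flatten(book: dict) -> list:
    """Walk chapter -> paragraph -> style -> sentence list in file order."""
    rows = []
    for chap, paragraphs in book.items():
        for parag, styles in paragraphs.items():
            for style, sentences in styles.items():
                for s in sentences:
                    start, end = s["time"]
                    rows.append(Utterance(
                        chapter=chap, paragraph=parag, style=style,
                        character=s["character"], to_whom=s["to whom"],
                        text=RUBY.sub(r"\1", s["sent"]),
                        reading=RUBY.sub(r"\2", s["sent"]),
                        start=float(start), end=float(end),
                    ))
    return rows


# Two entries transcribed from the sample above.
sample = {"chapt000": {"parag016": {
    "style001": [{"character": "猫", "sent": "なかなか、[運搬|うんぱん]はひどいやな。」",
                  "time": [394.42, 397.575], "to whom": "ゴーシュ"}],
    "style002": [{"character": "ゴーシュ", "sent": "「[何|なん]だと」",
                  "time": [397.6, 398.715], "to whom": "猫"}],
}}}

for u in flatten(sample):
    print(u.style, u.character, "->", u.to_whom, u.reading, round(u.duration, 3))
```

Keeping both `text` and `reading` lets a synthesis front end choose between the written form and the pronunciation without reparsing the annotations.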
