21. Tokyo Institute of Technology Multilingual Speech Corpus (TITML)

21-a. Indonesian (TITML-IDN)

Data DOI

Producer, Project

Assoc. Prof. Koichi Shinoda and Prof. Sadaoki Furui, Tokyo Institute of Technology

The Indonesian Phonetically Balanced Speech Corpus was developed for training the acoustic models of an automatic speech recognition system.

This database contains Bahasa Indonesia speech data from 20 Indonesian speakers. Each speaker was asked to read 343 phonetically balanced sentences, most of which were selected from a text corpus.

Speaker

20 speakers (11 males and 9 females)

343 files per speaker

Speech file format

WAV format (16 kHz, 16 bit, Mono)
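As a minimal sketch of how a recipient might confirm that a file matches the stated format, the following uses Python's standard `wave` module to check the sampling rate, sample width, and channel count. The function name `matches_titml_format` and the file path in the usage note are hypothetical; the corpus's actual directory layout and file naming are not described here.

```python
import wave


def matches_titml_format(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono.

    The expected values come from the format line above; the function
    name itself is an illustrative assumption, not part of the corpus.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000      # 16 kHz sampling rate
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16 bit
            and wav.getnchannels() == 1      # mono
        )
```

For example, `matches_titml_format("speaker01/utt001.wav")` (a made-up path) would return `True` for a conforming file and `False` for, say, a 44.1 kHz stereo recording.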

Distribution media

1 DVD

Licensing

For research purposes only

Price

No fee

Speech sample for test listening

Examples of the Indonesian phonetically balanced sentences selected from a text corpus:

maaf saya terlambat datang ke kantor.

male / female

pemerintah menggunakan beberapa referensi diantaranya dampak ekonomi setelah serangan teroris di luxor mesir pada bulan november.