21. Tokyo Institute of Technology Multilingual Speech Corpus (TITML)

21-a. Indonesian (TITML-IDN)

Producer, Project

Assoc. Prof. Koichi Shinoda and Prof. Sadaoki Furui, Tokyo Institute of Technology


The Indonesian Phonetically Balanced Speech Corpus was developed for training the acoustic models of an automatic speech recognition system.

This database contains Bahasa Indonesia speech data from 20 Indonesian speakers. Each speaker was asked to read 343 phonetically balanced sentences, most of which were selected from a text corpus.


20 speakers (11 males and 9 females)

343 files per speaker

Speech file format

WAV format (16 kHz, 16 bit, Mono)
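As a minimal sketch, a downloaded file could be checked against the stated format (16 kHz, 16 bit, mono) with Python's standard wave module. The function name matches_stated_format and the constant names are hypothetical and chosen for illustration only; the corpus itself does not ship such a tool, and the example path in the usage note is an assumption about how files might be named.

```python
import wave

# Format stated in the corpus description: 16 kHz, 16 bit, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1            # mono


def matches_stated_format(path):
    """Return True if the WAV header at `path` matches the stated format.

    Hypothetical helper: only the header is inspected, not the audio data.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

A call such as matches_stated_format("speaker01_001.wav") (an assumed file name) would return False for a file that was resampled or converted to stereo after download.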

Distribution media



For research purposes only


No fee

Speech sample for test listening

Indonesian phonetically balanced sentences selected from a text corpus:

maaf saya terlambat datang ke kantor.

pemerintah menggunakan beberapa referensi diantaranya dampak ekonomi setelah serangan teroris di luxor mesir pada bulan november.
