37. Corpus including Audio-Visual, Instructed, Affective Recordings of Empathetic Speech (CAVIARES)
Data DOI
https://doi.org/10.32130/src.CAVIARES
Producer, Project
Lect. Yuki Saito, The University of Tokyo
Contents
This corpus includes both acted dialogues and expressive reading speech, spoken by a single professional female Japanese speaker whose facial expressions were also captured.
- Acted dialogues: 8.3 hours of speech data based on the text of the STUDIES corpus.
- Expressive reading speech: 1.2 hours of speech data based on the text of the VoiceActress100 and ITA corpora (phoneme-balanced sentences).
Each utterance is annotated with perceived emotion labels and temporally aligned with dense facial landmark sequences extracted using MediaPipe Face Mesh.
Speaker
One professional female speaker
Recording environment
Studio
File format
Speech: WAV format (48 kHz, 16 bit, Mono)
Video: NumPy (.npy) format (1920x1080, 60 fps)
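The format figures above can be checked and related to each other with a short script. The following is a minimal sketch using only the Python standard library, not a tool distributed with the corpus. The function names are hypothetical, the silent WAV is generated in memory for demonstration only, and the sample-to-frame mapping assumes that the audio and video streams start at the same instant, which the catalog entry does not state.

```python
import io
import wave

# Values stated in the catalog entry.
SAMPLE_RATE_HZ = 48_000
SAMPLE_WIDTH_BYTES = 2  # 16 bit
CHANNELS = 1            # mono
VIDEO_FPS = 60


def matches_catalog_format(wav_file) -> bool:
    """Return True if the WAV header reads 48 kHz, 16 bit, mono."""
    with wave.open(wav_file, "rb") as w:
        return (
            w.getframerate() == SAMPLE_RATE_HZ
            and w.getsampwidth() == SAMPLE_WIDTH_BYTES
            and w.getnchannels() == CHANNELS
        )


def sample_to_video_frame(sample_index: int) -> int:
    """Map an audio sample index to a 60 fps frame index.

    Assumption: both streams start at time zero. At 48 kHz and 60 fps,
    each video frame then spans 48000 / 60 = 800 audio samples.
    """
    return sample_index * VIDEO_FPS // SAMPLE_RATE_HZ


def make_silent_wav(num_samples: int) -> io.BytesIO:
    """Build an in-memory silent WAV in the stated format (demo data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(b"\x00\x00" * num_samples)
    buf.seek(0)
    return buf
```

For a real file, `matches_catalog_format` also accepts a path string in place of the in-memory buffer.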
Distribution media
4 DVDs
Licensing
For research purposes only
Price
No fee
Further information
https://sython.org/Corpus/CAVIARES/
Sample data for test listening
https://y-saito.sakura.ne.jp/sython/demo/demo_ASJ2025S/demo.html