Frequently Asked Questions
- Application for utilizing the corpora
- How to use the corpora?
- User information etc.
- Others
For more details, please refer to the "How to obtain corpora" page.
Yes, you can. However, please ask your supervising professor to sign the form.
In principle this is possible, but it depends on the corpus and the intended use. Please ask the SRC secretariat for more details.
We send free corpora about one week after receiving your Letter of Pledge, provided its content is acceptable. Charged corpora take two to three weeks. Delivery time also depends on the destination and on stock availability.
Please ask us if you are in a hurry.
Some corpora are stored in raw format (with extensions such as "raw") rather than in "wav" format, so the speech cannot be played back with Windows Media Player or similar applications. To play the data with Windows Media Player, you must first convert the files into "wav" format. You can do this with free speech analysis/playback software that runs on Windows. Set the conversion parameters according to the Speech File Format given in the Detailed Corpus List.
Free speech analysis software is available (please use it at your own risk).
Please refer to the manual of each program for details and installation instructions.
We do not accept inquiries on how to use the free software mentioned above.
You can use open-source software such as SoX (Sound eXchange) to convert many files from RAW format into WAV format at once. Search with keywords such as "sox raw wav" to find the necessary information. Please download and install the software at your own risk.
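As an alternative to SoX, the batch conversion can be sketched with Python's standard `wave` module. This is a minimal sketch, not an SRC-provided tool: the function names `raw_to_wav` and `convert_directory` are hypothetical, the files are assumed to be headerless PCM, and the default sampling rate, sample width, channel count, and byte order are placeholders that must be replaced with the values given in the Speech File Format for the corpus in question.

```python
import wave
from pathlib import Path


def raw_to_wav(raw_path, wav_path, sample_rate=16000, channels=1,
               sample_width=2, big_endian=False):
    """Wrap headerless PCM samples in a WAV container.

    All defaults are placeholders (assumptions); take the real values
    from the Speech File Format of the corpus.
    """
    data = Path(raw_path).read_bytes()

    # Drop any trailing partial frame so the byte count is consistent.
    frame_size = channels * sample_width
    data = data[: len(data) - len(data) % frame_size]

    # WAV stores multi-byte samples little-endian, so big-endian
    # input must have the bytes of every sample reversed.
    if big_endian and sample_width > 1:
        data = b"".join(
            data[i:i + sample_width][::-1]
            for i in range(0, len(data), sample_width)
        )

    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(data)


def convert_directory(src_dir, dst_dir, suffix=".raw", **params):
    """Convert every file ending in `suffix` in src_dir; return the new paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for raw_path in sorted(src_dir.glob(f"*{suffix}")):
        wav_path = dst_dir / (raw_path.stem + ".wav")
        raw_to_wav(raw_path, wav_path, **params)
        written.append(wav_path)
    return written


# Example call (paths and parameter values are illustrative only):
# convert_directory("corpus/raw", "corpus/wav",
#                   sample_rate=16000, big_endian=True)
```

If the converted speech sounds like noise, the byte order or sample width is probably wrong; if it plays too fast or too slow, check the sampling rate.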
The text files in corpora such as JNAS, the CENSREC series, and ASJ-JIPDEC are provided in several character encodings, indicated by extensions such as "(filename).SJIS" and "(filename).EUC". To read such a file with Windows Notepad, make a copy of "(filename).SJIS" and rename the copy to "(filename).txt".
Some corpora, such as the PASD corpus, contain files without any extension; others use their own extensions such as "(filename).JPN", "(filename).ROM", and "(filename).spk". These files are encoded in JIS or EUC, so Windows Notepad may fail to display them correctly.
In that case, please try the following way:
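One possible approach, given here as a minimal Python sketch rather than as the SRC's own procedure, is to save a UTF-8 copy of the file that a current text editor can open. The function names `decode_japanese` and `to_utf8_copy` are hypothetical. Guessing the encoding by trial decoding is only a heuristic, so pass the encoding explicitly whenever the corpus documentation or the file extension states it.

```python
from pathlib import Path

# Python codec names for JIS (ISO-2022-JP), EUC-JP and Shift_JIS.
# The order is a heuristic: the stricter codecs are tried first.
CANDIDATE_ENCODINGS = ("iso2022_jp", "euc_jp", "shift_jis")


def decode_japanese(data, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first codec that decodes cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


def to_utf8_copy(src, dst=None, encoding=None):
    """Write a UTF-8 copy of src next to it, named '<original name>.txt'."""
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".txt")
    data = src.read_bytes()
    if encoding is None:
        text, encoding = decode_japanese(data)  # heuristic guess
    else:
        text = data.decode(encoding)
    # "utf-8-sig" adds a byte order mark, which helps older versions
    # of Notepad recognise the file as UTF-8.
    dst.write_bytes(text.encode("utf-8-sig"))
    return dst, encoding


# Example call (file name is illustrative only):
# to_utf8_copy("sample.JPN", encoding="euc_jp")
```

The original file is left untouched, so a wrong guess can be corrected by running the conversion again with an explicit `encoding` argument.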
Please listen to the sample speech data which can be found in the Detailed Page of each corpus.
Depending on the corpus, the NII-SRC speech corpora can be used either for research purposes only or for both research and development purposes. Please ask the SRC secretariat for more details.
The corpus can be used by those who belong to the same research laboratory or research group as the applicant. If the corpus is to be used by several research laboratories or groups, each group should apply for the corpus separately.
Please inform the SRC secretariat by email if you want to discontinue your use of the corpus; we will explain the procedure.
Please inform the SRC secretariat by email when you want to change the registered user. Please also inform us by email when you want to change your registered supervisor.
Please inform the SRC secretariat by email when you have changed your affiliation and/or address.
It varies depending on the content and the purpose of use. The following items should be considered.
Please send the SRC secretariat the following information.
We will then send you a form for describing the corpus in detail. Please fill in the form and send it back to SRC. SRC will respond after reviewing your application.
Please ask the SRC secretariat if you have any questions.
It would be very helpful if you could send the SRC secretariat the following information about the bug.
We will report the bug to the provider of the corpus and work to resolve it.
The corpora distributed by NII-SRC are used for research in fields of science and technology such as speech processing, linguistics, phonetics, and medical science. Please refer to "Statistics about SRC corpus".
"Data room" → "Bibliography" lists the papers in which NII-SRC corpora have been used. It also shows which corpus was used in each study, which can help you decide which corpus suits your purpose.