NTCIR-8 Meeting Program

NTCIR (NII Test Collection for IR Systems) Project

NTCIR-8 Meeting
Language Resource Exhibition

1

title Advanced Language Information Forum (ALAGIN)

organization Advanced Language Information Forum (ALAGIN)

The Advanced Language Information Forum (ALAGIN, http://www.alagin.jp) was established in 2009 to promote research and development in the field of speech/language resources, and specifically to facilitate collaboration among industry, government, and academia in these research domains. Currently, ALAGIN involves about 70 companies and 100 persons from academia. In this exhibition, we show the catalog of the speech/language resources distributed by ALAGIN, future plans for speech/language resource development, and related activities, including seminars on speech/language technologies.

2

title Language Grid - Connecting World's Language Services to Support Intercultural Collaboration

organization Kyoto University - Department of Social Informatics - Ishida & Matsubara Laboratory

The Language Grid is an online multilingual service platform that enables easy registration and sharing of language services such as online dictionaries, bilingual corpora, and machine translation services. Unlike existing machine translation systems, the Language Grid allows users to register user-created dictionaries and bilingual corpora and combine them with existing machine translation services, realizing user-oriented translation programs with greater accuracy.
The Language Grid is developed by the National Institute of Information and Communications Technology (NICT), and its source code is now open and available at http://langrid.nict.go.jp/.

3

title Language Resource Association (GSK)

organization Language Resource Association (GSK)

The Gengo-Shigen-Kyokai (GSK) (literally “Language Resource Association”) was established to promote the dissemination and distribution of language resources, focusing primarily on activities within Japan. GSK collects, manages, and distributes language resources (text and speech data, lexica, terminology, and software tools for speech and language processing) useful for research, education, and industry. GSK contributes to the development of various research fields, including natural language processing and speech processing, by collecting and distributing language resources that would otherwise be difficult to develop individually.

4

title Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese

organization Department of Corpus Studies and Center for Corpus Development, National Institute for Japanese Language and Linguistics

Compilation of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) is underway at the National Institute for Japanese Language and Linguistics. The BCCWJ is the first balanced corpus of present-day Japanese. After presenting the design and implementation issues of the corpus, we present the results of preliminary analyses of the linguistic characteristics of the texts included in the BCCWJ, with special attention to the characteristics of blog texts.

5

title Informatics Research Data Repository, NII (NII-IDR)

organization Informatics Research Data Repository, NII (NII-IDR)

IDR gathers and presents information on the following data sets, which are provided by the respective sections and projects of NII. We introduce these data sets and show some details of the Yahoo! Data Set, which is provided via IDR.
・Yahoo! Data Set
・NTCIR Test Collection
・Speech Corpus
・Video Database

6

title Speech Resources Consortium, NII (NII-SRC)

organization Speech Resources Consortium, NII (NII-SRC)

We distribute speech corpora constructed by various organizations and projects. The corpora include not only speech data but also transcriptions and tools. Please use these corpora as language resources for information access research.

Last updated: June 01, 2010