NTCIR (NII Test Collection for IR Systems) Project

NTCIR Project

Overview

The NTCIR Workshop is a series of evaluation workshops designed to enhance research in Information Access (IA) technologies, including information retrieval, question answering, text summarization, extraction, etc. It was co-sponsored by the Japan Society for the Promotion of Science (JSPS), as part of the JSPS "Research for Future" Program, and the National Center for Science Information Systems (NACSIS) from 1997; by JSPS and the Research Center for Information Resources at the National Institute of Informatics (RCIR/NII) in FY2000; and by the MEXT Grant-in-Aid for Scientific Research on Priority Areas of "Informatics" (#13224087) and RCIR/NII in and after FY2001.

The aims are:

  1. to encourage research in Information Access technologies by providing large-scale test collections reusable for experiments and a common evaluation infrastructure allowing cross-system comparisons
  2. to provide a forum for research groups interested in cross-system comparison and exchanging research ideas in an informal atmosphere
  3. to investigate evaluation methods of Information Access techniques and methods for constructing a large-scale data set reusable for experiments.

An evaluation workshop usually provides test collections (data sets usable for experiments) and unified evaluation procedures for experiment results. Each participating group conducts research and experiments using the common data provided by the NTCIR organizer with various approaches. The importance of reusable large-scale standard test collections in IA research has been widely recognized and an evaluation workshop is now recognized as a new style of active research project that facilitates research by providing the data and a forum for research idea exchange and technology transfer.
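As an illustration of what a unified evaluation procedure over a shared test collection can look like, the following minimal sketch scores two system runs against one common set of relevance judgments. Mean average precision is used here only as a familiar example of a single-number measure; the text above does not say which measures NTCIR applies. The topic IDs, document IDs, and function names are all hypothetical.

```python
from statistics import mean


def average_precision(ranked_docs, relevant):
    """Average precision of one ranked list against a set of relevant doc IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at ranks holding a relevant doc.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of per-topic average precision over every judged topic.

    A topic missing from the run is scored as an empty ranking.
    """
    return mean(
        average_precision(run.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    )


# Hypothetical toy test collection: relevance judgments shared by all participants.
qrels = {"T1": {"d1", "d3"}, "T2": {"d2"}}

# Hypothetical ranked results submitted by two participating systems.
run_a = {"T1": ["d1", "d2", "d3"], "T2": ["d2", "d1"]}
run_b = {"T1": ["d2", "d1", "d3"], "T2": ["d1", "d2"]}

score_a = mean_average_precision(run_a, qrels)
score_b = mean_average_precision(run_b, qrels)
print(f"run_a: {score_a:.4f}  run_b: {score_b:.4f}")
```

Because both runs are scored against the same judgments with the same procedure, the resulting numbers can be compared directly across systems.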

For the First NTCIR Workshop, the process started in November 1998, and the workshop meeting was held on August 30 - September 1, 1999, at the KKR Hotel, Tokyo. Twenty-eight groups from six countries conducted the tasks and submitted results. For the Second Workshop, the process started in June 2000 and the meeting was held on March 7-9, 2001, at NII, Tokyo; forty-six groups from eight countries registered for the tasks, and 36 groups submitted results for one or more tasks. The Third NTCIR Workshop started in October 2001 and its meeting was held on October 8-10, 2002, at NII, Tokyo; sixty-five groups from nine countries submitted results.

From the beginning of the NTCIR Project, we have looked at both traditional laboratory-type IR system testing and the evaluation of more challenging technologies. For laboratory-type testing, we have placed emphasis on (1) information retrieval (IR) with Japanese or other Asian languages and (2) cross-lingual information retrieval. For the more challenging issues, we have focused on (3) the shift from document retrieval to "information" retrieval and technologies for utilizing the information in documents, and (4) the investigation of realistic evaluation, including evaluation methods for summarization, multigrade relevance judgments and single-number averageable measures for such judgments, and evaluation methods suited to the retrieval and processing of a particular document genre and to how that genre's user group uses it.

The test collections constructed, tasks, participants, and sponsorship for the previous NTCIR workshops are as follows:

Table 1. Tasks, Collections and Participants of the Previous Workshops

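| Workshop | Period | Task (main category) | Subtask | Test collection | # of participating groups | # of countries of participants | Sponsor |
|---|---|---|---|---|---|---|---|
| 1 | Nov 1998 - Sept 1999 | Ad Hoc IR | J-JE | NTCIR-1 | 18 | 3 | JSPS + NACSIS |
| | | Cross-Lingual IR | E-J | NTCIR-1 | 10 | 3 | |
| | | Term Recognition | Term Extraction; Role Analysis | NTCIR-1 | 9 | 3 | |
| | | Workshop total | | | 28 | 6 | |
| 2 | June 2000 - Mar 2001 | Chinese Text Retrieval | CHIR (C-C); ECIR (E-C) | CIRB010 | 11 | 5 | JSPS + RCIR/NII |
| | | Japanese and English IR | Monolingual IR (J-J, E-E); CLIR (J-E, E-J, J-JE, E-JE) | NTCIR-2 | 25 | 5 | |
| | | Text Summarization | Intrinsic-Extract; Intrinsic-Free; Extrinsic-IR task | NTCIR-2 SUMM | 9 | 1 | |
| | | Workshop total | | | 36 | 8 | |
| 3 | October 2001 - October 2002 | Cross-Lingual Information Retrieval | Single Language (C-C, E-E, J-J, K-K) | NTCIR-3 CLIR | 22 | 8 | MEXT + RCIR/NII |
| | | Cross-Lingual Information Retrieval | Bilingual CLIR (x-C, x-J, x-K) | NTCIR-3 CLIR | 14 | 4 | |
| | | Cross-Lingual Information Retrieval | Multilingual CLIR (x-CEJ) | NTCIR-3 CLIR | 7 | 4 | |
| | | Patent Retrieval | Cross-Genre Retrieval | NTCIR-3 PAT | 8 | 3 | |
| | | Patent Retrieval | Search Question Retrieval, CLIR | NTCIR-3 PAT | 6 | 3 | |
| | | Patent Retrieval | Optional Task | NTCIR-3 PAT | 2 | 1 | |
| | | Question Answering | 5 possible answers | NTCIR-3 QAC | 17 | 2 | |
| | | Question Answering | Only One Set of All the Answers | NTCIR-3 QAC | 13 | 1 | |
| | | Question Answering | Series of Questions | NTCIR-3 QAC | 6 | 1 | |
| | | Text Summarization | Single Document Summarization | NTCIR-3 TSC | 8 | 1 | |
| | | Text Summarization | Multiple Document Summarization | NTCIR-3 TSC | 9 | 1 | |
| | | WEB | Survey Retrieval: Topic Retrieval | NTCIR-3 WEB | 7 | 2 | |
| | | WEB | Survey Retrieval: Search by Document | NTCIR-3 WEB | 2 | 1 | |
| | | WEB | Target Retrieval | NTCIR-3 WEB | 7 | 1 | |
| | | WEB | Optional Task: Output Classification | NTCIR-3 WEB | 0 | 1 | |
| | | WEB | Optional Task: Speech Driven Retrieval | NTCIR-3 WEB | 1 | 1 | |
| | | Workshop total | | | 65 | 9 | |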

Table 2. Test collections constructed (and made available) or to be constructed through the NTCIR Workshops

| Collection | Task | Document type | Document language | Topic/Summary language | Research purpose use |
|---|---|---|---|---|---|
| NTCIR-1 | IR | Scientific | JA+EN | JA | Yes |
| CIRB010 | IR | Newspaper '98-99 | CH | CH+EN | (Participants only) |
| NTCIR-2 | IR | Scientific | JA+EN | JA+EN | Yes |
| NTCIR-2 SUMM | Summarization | Newspaper '94-95, '98 | JA | JA | Yes <*> |
| NTCIR-2TAO | Summarization | Newspaper | JA | JA | Yes <*> |
| NTCIR-3 CLIR | IR | Newspaper '98-99 | CHtr+JA+EN | CHtr+JA+EN+KO | Yes <*>, <*2> |
| NTCIR-3 CLIR | IR | Newspaper '94 | KO | CHtr+JA+EN+KO | (Participants only) |
| NTCIR-3 PATENT | IR | Patent '98-99 + Abstract '95-99 | JA (Fulltext); JA+EN (Abst) | JA+EN+CHtr+CHsm+KO | Yes |
| NTCIR-3 QA | QA | Newspaper '98-99 | JA | JA(+EN) | Yes <*> |
| NTCIR-3 SUMM | Summarization | Newspaper '98-99 | JA | JA | Yes <*> |
| NTCIR-3 WEB | IR | html | multiple languages <*3> | JA+(EN) | Yes |
| NTCIR-4 CLIR | IR | Newspaper '98-99 | CHtr+JA+KO+EN | | |
| NTCIR-4 PATENT | IR | Patent 1993-2002 + Abstract 1993-2002 | JA (Fulltext); EN (Abst) | | |
| NTCIR-4 QA | QA | Newspaper '98-99 (2 types) | JA | | |
| NTCIR-4 SUMM | Summarization | Newspaper '98-99 (2 types) | JA | | |
| NTCIR-4 WEB | IR | html | multiple languages <*3> | | |

JA: Japanese, EN: English, CH: Chinese (tr: traditional, sm: simplified), KO: Korean

*: Documents are available for research purpose use from Nichigai Associates, Co. (for Japanese users) or MAINICHI International, Inc. (for international users).
*2: The Chinese document collections CIRB010, CIRB011, and CIRB020 are available to participants only. The contents of CIRB010 and CIRB011 are the same, but the format is slightly different.
*3: Mostly Japanese and English, with some other languages.

The details of the NTCIR-4 Test Collections are available HERE