NTCIR Workshop 2
Evaluation of Chinese & Japanese Text Retrieval and Text Summarization
May 2000 - March 2001

Conducted by:
National Institute of Informatics (NII, formerly NACSIS), Japan
Co-sponsored by: Japan Society for the Promotion of Science (JSPS)
In cooperation with: Information Processing Society of Japan (IPSJ), National Taiwan University,
IPSJ-SIG/FI (Fundamental Infology)

Enquiries: ntc-admin

Last modified: October 30, 2000
An evaluation workshop on Chinese and Japanese text retrieval and text summarization will be held from May 2000 to March 2001. Participation is invited from anyone interested in Chinese and/or Japanese text retrieval, English-Chinese and English-Japanese cross-lingual information retrieval from large-scale collections, or summarization of Japanese texts.


o To encourage research in information retrieval, cross-lingual information retrieval and text summarization by providing reusable test collections.
o To provide a forum for research groups interested in comparing results and exchanging ideas or opinions in an informal atmosphere.
o To improve the quality of the test collections based on the feedback from participants.

o CHINESE IR TASKS (Chinese and English-Chinese IR)
o The training set and the testing set of Chinese Text Retrieval Tasks are selected from the Chinese Information Retrieval Benchmark 1 (CIRB-1).
o The CIRB-1 consists of three parts: 1) Document Set; 2) Topic Set; and 3) Relevance Judgment.
o Currently, the Document Set contains 132,173 news articles from five news agencies in Taiwan; the Topic Set contains 50 topics, each expressing a user's information need at levels of detail ranging from a brief statement to a full description; and the Relevance Judgment lists the documents relevant to each topic.
o Details of the Chinese IR tasks are given on the task web page. Related information is also available on the same web site:
o JAPANESE & ENGLISH IR TASKS (Japanese, English and English-Japanese IR)
o Training set: the NTCIR-1 CD, containing more than 330,000 author abstracts of conference papers, more than half of which are Japanese-English pairs. These document alignments are known and may be used for training.
o Test set: NTCIR-1 and NTCIR-2.
NTCIR-2 (preliminary version) consists of two document subfiles:
(1) ca. 300,000 extended summaries of Grant-in-Aid research reports; about 25% are Japanese-English pairs.
(2) ca. 100,000 author abstracts of conference papers; more than half are Japanese-English pairs. The alignments will not be announced before result submission.
o Segmented Japanese texts are available for both the Japanese documents and the topics of NTCIR-1 and NTCIR-2; use of this data is NOT mandatory.
o We use newspaper articles (Mainichi Shinbun, The Mainichi; a paid license is required to use them) as the source texts for these tasks. They are not limited to the business domain; articles from other domains, such as editorials and columns, will be included.
o Applications are now CLOSED.
o August 10, 2000: NTCIR-2 CD (new documents and fifty topics) will be distributed to the participants of Japanese & English IR tasks.
o August 31, 2000: The CIRB-1-CH CD (132,172 documents and 50 Chinese topics) will be distributed to participants in the Chinese IR Task, and the CIRB-1-EN CD (132,172 documents and 50 English topics) to participants in the English-Chinese IR Task (Chinese IR tasks)
o September 18, 2000: Search results submission (Japanese & English IR)
o October 8, 2000: Dry run (Text Summarization)
o October 20, 2000: Search results submission (Chinese IR)
o November 2000: Evaluation (Text Summarization)
o January 10, 2001: Results of the relevance assessments will be distributed to the participants (Chinese IR, Japanese IR)
o February 12, 2001: Submission of papers for the working-note proceedings (All Tasks)
o March 7-9, 2001: Workshop meeting at NII, Tokyo, Japan.
o Day 1: Open to public, Days 2-3: Active participants only
o March 16, 2001: Camera-ready copies for the proceedings

Below is a brief summary of the tasks planned for the Workshop. Each participant may conduct one or more of the tasks or subtasks below; participation in a single subtask (for example, Japanese monolingual IR, the J-J task) is also possible:
o Chinese Information Retrieval Task: The Chinese IR Task assesses the capability of participating systems to retrieve Chinese documents using Chinese queries. The English-Chinese IR Task assesses the capability of participating systems to retrieve Chinese documents using English queries. Because Chinese texts are composed of characters without explicit word boundaries, retrieval is more challenging than for English texts. Participating systems may employ any approach; both word-based and character-based systems are acceptable. The organizers will not provide any segmentation tools or Chinese dictionaries.
o Japanese & English Information Retrieval Task: Japanese and/or English monolingual IR, and cross-lingual IR over single-language documents and mixed English-Japanese documents using Japanese and/or English topics. The aim is to investigate the search effectiveness of systems that search a static set of documents.
o Text Summarization Task: automatic summarization of Japanese texts. The aims are (1) to collect qualified text data for summarization in Japanese: we will have newspaper articles summarized by hand and make them available for research use; and (2) to evaluate text summarization systems through an extrinsic, task-based evaluation. For details, please visit
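As an illustration of the character-based approach mentioned for the Chinese IR Task, the following Python sketch indexes texts by overlapping character bigrams, so no word segmentation tool or dictionary is needed. It is a minimal sketch under stated assumptions: the function names (char_bigrams, build_index, search), the toy documents, and the simple count-based ranking are hypothetical and are not part of the task definition or of any NTCIR-supplied tool.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; not an NTCIR tool.

def char_bigrams(text):
    """Split text into overlapping character bigrams (no segmenter needed).
    A one-character text is returned as a single unigram."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [a + b for a, b in zip(chars, chars[1:])]

def build_index(docs):
    """Build an inverted index: bigram -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in char_bigrams(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query bigrams they contain
    (a deliberately simple score; real systems would weight terms)."""
    scores = defaultdict(int)
    for term in set(char_bigrams(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for stable output
    return sorted(scores, key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    docs = {"a": "資訊檢索", "b": "檢索系統"}
    index = build_index(docs)
    print(search(index, "資訊檢索"))  # ['a', 'b']
```

Overlapping bigrams avoid segmentation errors at the cost of a larger index; a word-based system would instead run a segmenter over documents and queries before indexing.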

o A. FULL: Submit retrieval results and describe the system. The correspondence between the group name and the group ID will also be announced.
o B. ANONYMOUS: Submit retrieval results. The details of the system need not be reported. The correspondence between the group name and the group ID is not announced. This category is mainly for participants from companies who have difficulty reporting such details.

The list of participating groups will be made public, but evaluation results will be announced using group IDs only. Whichever type of participation is chosen, every participating group must submit (1) a paper for the workshop proceedings, (2) a system description form describing its system, and (3) bibliographic references and a copy of all its papers that use the NTCIR test collections.

Online application is available at:
If you use the text version of the application form, please complete it and return it by e-mail, fax, or postal mail to:
ATTN: Noriko Kando
NTCIR Project
National Institute of Informatics (NII)
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
email: ntc-admin
fax: +81-3-3556-1916 phone: +81-3-4212-2529
Financial support to attend the NTCIR Workshop meeting will be available for a limited number of active overseas participants who will present material at the workshop meeting in March 2001 and who are not receiving other funding to attend. Priority will be given to younger researchers. Details will be announced later.
o Please send email to Noriko Kando, project manager, at kando, or to NTCIR Project administrators (ntc-admin).
o About "Chinese IR Task", please send email to the Task Chairs, Hsin-Hsi Chen
(hh_chen) or Kuang-Hua Chen (khchen).
o About "Text Summarization Task", please send email to the Task Chairs, Manabu Okumura
(oku) or Takahiro Fukushima (fukusima).
o The first day of the Workshop meeting will be an open forum for researchers interested in the topics. The second and third days will be open only to active participating groups that have submitted results and to selected people from the organizing agencies.
o The proceedings will be published online as well as in printed form.
o Dissemination of research results obtained using the NTCIR collections outside the Workshop proceedings is welcome. However, the conditions of participation preclude specific advertising claims based on results obtained using the collections or the Workshop.
o International participants are welcome. Announcements will be made in English and Japanese; announcements for the Chinese IR Task will be in English.
o The official language for proceedings papers and presentations at the Workshop meeting in March 2001 is English.
o For the Japanese & English IR task, because of copyright issues, participants are appointed as collaborative researchers in NII's Collaborative Research Program. The program imposes no additional terms beyond results submission and a paper for the proceedings.
o Information about freely available Japanese morphological analysers will be provided.

o An evaluation of Korean text retrieval is organized separately by Prof. Sung Hyon Myaeng, Korea
(shmyaeng). The two projects maintain close contact with each other.

