Papers related to NTCIR (with abstract)


(Last-modified on Oct. 14, 1999)

Publications on NTCIR
Studies using NTCIR-1 (Test version)
Studies using NTCIR-1 (Preliminary version)

Publications on NTCIR

Kageura, K., Koyama, T., Yoshioka, M., Takasu, A., Nozue, T., Tsuji, K. NACSIS corpus project for IR and terminological research. In Natural Language Processing Pacific Rim Symposium 1997, pp. 493-496, Phuket, Thailand, Dec. 1997.

Abstract:
The paper introduces the corpus construction project currently being carried out at NACSIS, which aims to provide a common testbed for terminological and IR research. The motivation and background, the linguistic specifications, and some technical problems of the corpus are described in the paper.

Kando, N., Koyama, T., Oyama, K., Kageura, K., Yoshioka, M., Nozue, T., Matsumura, A., Kuriyama, K. NTCIR: NACSIS test collection project. In [Poster] the 20th Annual Colloquium of the British Computer Society Information Retrieval Specialist Group, Autrans, France, Mar. 1998.

Abstract:
In this poster we introduce the project to construct NTCIR (NACSIS Test Collection for evaluation of Information Retrieval systems), currently being carried out at the National Center for Science Information Systems (NACSIS), Japan. We first introduce the background and aims of the project, followed by the specification of the test collection. We then discuss some technical problems concerning the construction of the collection.

Koyama, T., Yoshioka, M., Kageura, K. The construction of a lexically motivated corpus --- the problem with defining lexical units. In First International Conference on Language Resources and Evaluation, pp. 1015-1019, Granada, Spain, May 1998.

Abstract:
We are currently constructing a special language corpus, with particular emphasis on such lexically oriented studies and applications as terminology and information retrieval. In the paper, we focus on the problem with defining the lexical units in running texts, which is essential for the construction of a lexically motivated corpus. Although we focus on Japanese in this paper, the problem discussed here may be relevant to any other language.

Kageura, K., Yoshioka, M., Koyama, T., Nozue, T., Tsuji, K. Towards a common testbed for corpus-based computational terminology. In COMPUTERM'98, pp. 81-85, Montreal, Canada, Aug. 1998.

Abstract:
We are currently constructing a textual and terminological corpus with special emphasis on its possible use in the corpus-based study of terminology. In this paper we discuss the basic background, motivation, and aim of the project, starting with an examination of the current state of the art in automatic term recognition. The basic desiderata of a common testbed are then clarified, followed by a brief introduction to the basic features of the NACSIS corpus, which is intended to offer a common testbed for terminological research.

Nozue, T., Kando, N., Kuriyama, K. Reconsideration of the concept of relevance for NACSIS test collection (in Japanese). In Proceedings of the 46th Annual Conference of Japan Society of Library and Information Science, pp. 67-70, Nov. 1998.

Kando, N. Invited Talk: On evaluation of information retrieval systems: Aspects of test collections and competitions (in Japanese). In Informatics Symposium'99, Jan. 1999.

Kando, N., Kuriyama, K., Nozue, T., Oyama, K. NTCIR-1 (NACSIS Test Collection for Information Retrieval system-1): Its policy and practice (in Japanese). In the Special Interest Group Notes of Information Processing Society of Japan, No. 99-FI-53, pp. 33-40, Mar. 1999.

Abstract:
This paper outlines the NTCIR (NACSIS Test Collection for Information Retrieval systems) project, its Test Collection 1 (NTCIR-1), and the competition-style workshop that uses NTCIR-1. Based on previous discussion of the data used for laboratory testing of information retrieval, we discuss the fundamental policies of the project. NTCIR-1 contains ca. 330,000 documents, more than half of which are Japanese-English pairs. Each search topic contains a detailed narrative including term definitions, relevance judgment criteria, the purpose of the search, and background knowledge, which are thought to be helpful for relevance assessment. The workshop, which started in November 1998, attracted thirty-one participating groups. The exhaustivity of the relevance assessment for the training topics is reported. Pooling by interactive searches was effective for particular topics and found 17.5% of unique relevant documents. The internal pooling worked well and covered 97% of all relevant documents.

Yoshioka, M., Okada, M., Kageura, K., Koyama, T. Construction of terminologically-motivated corpus (in Japanese). In the Special Interest Group Notes of Information Processing Society of Japan, No. 98-FI-53, pp. 41-48, Mar. 1999.

Nozue, T., Kando, N. Primary considerations in the concept of relevance: Relevance judgement of NTCIR (in Japanese). In the Special Interest Group Notes of Information Processing Society of Japan, No. 99-FI-53, pp. 49-56, Mar. 1999.

Kando, N. (organized). New perspective of IR: From test collection to search engine (in Japanese). Open Panel at the 58th Annual Meeting of the Information Processing Society of Japan, Mar. 1999. [slides(Japanese only)]

Kando, N., Kuriyama, K., Nozue, T. NACSIS test collection workshop (NTCIR-1). In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, Aug. 1999.

Kando, N., Kuriyama, K., Nozue, T., Eguchi, K., Kato, H., Hidaka, S. Overview of IR tasks at the first NTCIR workshop. In Proceedings of the First NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition, pp. 11-44, Tokyo, Japan, Aug. 1999.

Studies using NTCIR-1 (Test version)

* Note: The Test version of NTCIR-1 was compiled prior to the Preliminary version. The number of documents in NTCIR-1 (Test version) is almost the same as in NTCIR-1 (Preliminary version), but the document size and the collection size are different.

Kando, N. Cross linguistic scholarly information transfer and database services in Japan. In 1997 Annual Meeting of the American Society for Information Science, Panel on Multilingual Databases, Washington D.C., USA, Nov. 1997.

Kuriyama, K. Query Expansion using Thesauri (in Japanese). In the Special Interest Group Notes of Information Processing Society of Japan, No. 98-FI-52, pp. 1-8, Jan. 1998.

Kando, N., Kageura, K., Yoshioka, M., Oyama, K. Phrase processing methods for Japanese text retrieval. In ACM-SIGIR '98 Workshop on "Information Retrieval: Theory into Practice", Melbourne, Australia, Aug. 1998.

Abstract:
This paper examines the effectiveness of different phrase identification and weighting methods for Japanese text retrieval in an operational information retrieval (IR) system, called NACSIS-IR. Based on our previous experiments, we used character-based indexing with positional information and word- or phrase-based query processing, which allowed us to implement sophisticated linguistic analysis on large-scale databases while maintaining adequate efficiency. The results of retrieval experiments on a large-scale Japanese test collection showed that the combination of enhanced phrase identification using patterns defined over part-of-speech tags and our algorithm Weight5 made a significant positive contribution to retrieval effectiveness. The paper also discusses indexing and phrase processing of Japanese or East Asian languages.
Keywords:
Phrase processing, Japanese text retrieval, character-based indexing, word-based query segmentation

Aizawa, A., Kando, N., Kageura, K. A graph-based method for automatic generation of multilingual keyword clusters and its applications. In International Joint Digital Library Workshop '98, Bangkok, Thailand, Sep. 1998.

Kando, N., Aizawa, A. Cross-lingual information retrieval using automatically generated multilingual keyword clusters. In the Proceedings of the 3rd International Workshop on Information Retrieval with Asian Languages, pp. 86-94, Singapore, Singapore, Oct. 1998.

Abstract:
We propose an approach to cross-lingual information retrieval (CLIR) that uses automatically generated multilingual keyword clusters based on graph theory, and show its effectiveness in IR and CLIR. The graph-theoretic method has the advantage that low-frequency keywords can be retained for later use in IR. The results of retrieval against NACSIS Test Collection 1 showed that query expansion using the clusters improved search effectiveness in monolingual retrieval by 13.2% and 14.2% at the "Relevant" and "Partially Relevant" levels, respectively. The search effectiveness of CLIR reached 52.4% and 65.7% of that of monolingual retrieval at the "Relevant" and "Partially Relevant" levels, respectively, without any manual interaction during retrieval. Future studies are also discussed.
Keywords:
Cross-lingual information retrieval, English and Japanese, bilingual keyword clusters, graph theory, Japanese text retrieval, character-based indexing, word- and phrase-based query segmentation

Kando, N., Aizawa, A., Tsuji, K., Kageura, K., Kuriyama, K. Cross-language information retrieval and automatic construction of multilingual lexicons. In 1998 Annual Meeting of the American Society for Information Science, Pittsburgh, USA, Oct. 1998.
