
Call For Participation:
Web Retrieval Task at the 3rd NTCIR Workshop
(CLOSED)

Last Updated: 2003-01-08

News:


Organization of Web Retrieval Task, NTCIR Workshop 3

Contact Information

Schedule (2001-10-10 updated)

Overview of Web Retrieval Task

The objective of the Web Retrieval Task in NTCIR-3 is 'to research the retrieval of Web documents that have a structure with tags and links'. The task design and evaluation methods take the characteristics of Web retrieval into account. At the same time, the topic format and some evaluation measures are inherited from past NTCIR workshops so that results can be compared with theirs.

We have prepared two document collections, gathered mainly from the '.jp' domain: one of over 100 GBytes, which reflects a realistic scale, and a selected 10-GByte collection, which makes the task easier for participants to handle. Because the data is too large to handle easily and restrictions apply to delivery of the original data, participants may use the original document collections only inside the National Institute of Informatics (NII). Participants will use computer resources in the open laboratory at NII for data processing, e.g. indexing the original document data, and will then take the resulting data out and run experiments with it in their own laboratories.

The Web retrieval task is composed of the following subtasks for the two document collections: 100 GBytes and 10 GBytes.

Subtasks in Web Retrieval Task

A. Survey Retrieval (both recall and precision are evenly weighted for evaluation)
A1. Topic Retrieval
A2. Similarity Retrieval
B. Target Retrieval (precision-oriented)
C. Optional Tasks
C1. Search Results Classification
C2. Speech-Driven Retrieval
C3. etc.

A. Survey Retrieval

Survey retrieval is similar to traditional ad-hoc retrieval of scientific documents or newspaper articles, in which the system searches a static document set using newly provided topics.
The Topics:
Both automatic and interactive systems are welcome. Any IR system that involves manual intervention during the search process is "interactive"; all others are "automatic".
In the case of A1, the topics are described in almost the same format as in past NTCIR workshops. As mandatory runs, an automatic system must submit the results of a search using only <DESC> and of a search using only <TITLE>. <DESC> provides a basic description of the user request, and <TITLE> is composed of 1-3 words that represent the essence of the user request. As non-mandatory runs, an automatic system may use any fields of the topics. Participants should report which topic fields their automatic or interactive systems used.
In the case of A2, the automatic system submits the results of a search using <TITLE> and <RDOC>, which identifies three relevant documents. The automatic system may also use only <RDOC>, or only a part of it.
Results Submission and Evaluation:
Each run is submitted as the top 1000 ranked documents retrieved for each topic. The document pool, composed of the top-ranked search results submitted by each participant, is treated as the set of relevant document candidates. Human assessors judge the relevance of each document in the pool on a multi-grade scale: highly relevant, relevant, partially relevant, or irrelevant, as used in past NTCIRs, plus top relevant, which is newly proposed in this task. Evaluation will be performed using trec_eval and a 'weighted mean average precision' that takes both the relevance grade and the ranking into account. The page is the basic unit of runs and relevance judgments. However, when assessors judge the relevance of a page, they may also refer to pages within one click of it, but only if those pages are included in the relevant document candidates pool. Participants cannot submit more than four runs for each sub-task, and must specify the priority of each run.
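The text does not define the 'weighted mean average precision', so the sketch below does not attempt to reproduce it. It only illustrates a simpler, common practice: collapsing multi-grade judgments to binary at a chosen threshold and then computing ordinary (non-interpolated) average precision over the top 1000 documents. The numeric grade codes, the threshold parameter `min_grade`, and all function names are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical numeric codes for the grades inherited from past NTCIRs.
# "top relevant" is left out because the text does not say where it sits.
GRADES = {
    "irrelevant": 0,
    "partially relevant": 1,
    "relevant": 2,
    "highly relevant": 3,
}


def average_precision(ranking: List[str],
                      judgments: Dict[str, int],
                      min_grade: int) -> float:
    """Average precision for one topic.

    A document counts as relevant when its judged grade is >= min_grade.
    Unjudged documents are treated as irrelevant (an assumption).
    """
    relevant = {doc for doc, grade in judgments.items() if grade >= min_grade}
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    # Only the top 1000 documents of a run are considered.
    for rank, doc in enumerate(ranking[:1000], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Dict[str, int]],
                           min_grade: int) -> float:
    """Mean of average_precision over all judged topics."""
    if not qrels:
        return 0.0
    scores = [
        average_precision(runs.get(topic, []), judgments, min_grade)
        for topic, judgments in qrels.items()
    ]
    return sum(scores) / len(scores)
```

For example, with the ranking `["d1", "d2", "d3", "d4"]` and judgments `{"d1": 3, "d2": 0, "d3": 1, "d4": 2, "d5": 2}`, calling `average_precision` with `min_grade=2` gives 0.5: two of the three relevant documents are retrieved, at ranks 1 and 4.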
Evidential Passages:
'Evidential passages', i.e. the parts of each relevant document that provide evidence for the relevance judgment, can be submitted, since Web pages vary widely in length. While the page is the basic unit of evaluation, evidential passages can be used for complementary evaluation. Submission of evidential passages is not mandatory; if they are not submitted, we treat the whole page as the evidence.

B. Target Retrieval

Target retrieval evaluates retrieval effectiveness in cases where the user requires just one answer, or at most a few (e.g. a fact-type retrieval, or retrieval of a site's top page), so that precision should be emphasized.
The Topics:
Both automatic systems, which perform language processing on the topic to formulate the query, and interactive systems, in which the user specifies the query after reviewing the topic, are acceptable.
As mandatory runs, an automatic system must submit the results of a search using only <DESC> and of a search using only <TITLE>. <DESC> provides a basic description of the user request, and <TITLE> is composed of 1-3 words that represent the essence of the user request. As non-mandatory runs, an automatic system may use any fields of the topics. Participants should report which topic fields their automatic or interactive systems used.
Results Submission and Evaluation:
Each run is submitted as the top 10 ranked documents retrieved for each topic, optionally with evidential passages attached; submission of evidential passages is not mandatory. Several evaluation measures will be applied, for example:
B1. TREC Q&A Track-like method: the reciprocal rank of the first relevant document in the ranking
B2. Utility: scoring as +1 for the relevant and -1 for the irrelevant, or scoring the relevant according to relevance grades, e.g. +3 for the highly relevant, +2 for relevant, and +1 for partially relevant.
B3. Reliability: scoring as +1 for the relevant and -1 for the irrelevant and incorrect, and 0 for the irrelevant
Participants cannot submit more than four runs for each sub-task, and must specify the priority of each run.
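Measures B1 and B2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the official scoring procedure: the grade codes (3 = highly relevant, 2 = relevant, 1 = partially relevant, 0 = irrelevant) are chosen to match the example weights given for B2, unjudged documents are treated as irrelevant, and the function names are hypothetical. B3 is not sketched.

```python
from typing import Dict, List


def reciprocal_rank(ranking: List[str], grades: Dict[str, int]) -> float:
    """B1-style score: 1 / rank of the first relevant document in the
    top 10, or 0.0 if none of the top 10 is relevant."""
    for rank, doc in enumerate(ranking[:10], start=1):
        if grades.get(doc, 0) > 0:
            return 1.0 / rank
    return 0.0


def utility(ranking: List[str], grades: Dict[str, int],
            graded: bool = False) -> int:
    """B2-style score over the top 10 documents.

    graded=False: +1 per relevant document, -1 per irrelevant one.
    graded=True:  +3 / +2 / +1 by relevance grade, -1 per irrelevant one.
    """
    score = 0
    for doc in ranking[:10]:
        grade = grades.get(doc, 0)  # unjudged counts as irrelevant (assumption)
        if grade > 0:
            score += grade if graded else 1
        else:
            score -= 1
    return score


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d"]
    grades = {"a": 0, "b": 3, "c": 1}
    print(reciprocal_rank(ranking, grades))       # 0.5
    print(utility(ranking, grades))               # 0
    print(utility(ranking, grades, graded=True))  # 2
```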

C. Optional Tasks

Participants can freely submit proposals relating to their own research interests, using the document set used in sub-tasks A and B. The results are to be presented as a paper or poster at the NTCIR-3 workshop meeting. If a proposal involves several participants, it can be adopted as a sub-task and investigated in detail. 'C1. Search Results Classification' and 'C2. Speech-Driven Retrieval' are examples of optional tasks.
C1. Search Results Classification:
This sub-task evaluates highly precise searching and techniques for supporting user navigation in the case where the user submits very short queries.
The participant performs a search using only the lead term in <TITLE> of the topic, classifies the search results into labeled groups, and then submits the resulting 200 documents. The classification processing may be performed on more than the top 200 documents retrieved.
For example, when the query is 'Hidetoshi Nakata', a famous Japanese soccer player, the results might be classified into 'sites', 'schedules', 'magazines/TV programs', 'photographs' and 'supporters' diaries'. We do not limit the number of classes. Hierarchical classification is also acceptable. The class labels can be machine-like identification codes, e.g. 'cluster A' and 'cluster B', or typical page titles.
For evaluation, we will measure the results in the manner of the following examples, describe the features of each system, and compare the systems through discussions on the mailing list or at round-table sessions.
C2. Speech-Driven Retrieval:
This sub-task is proposed by Dr. Atsushi Fujii, University of Library and Information Science, and Dr. Katunobu Itou, National Institute of Advanced Industrial Science and Technology. Details are available at "Page of Speech-Driven Retrieval Task" (Japanese only).
C3. etc.:
Other examples of optional tasks include, but are not limited to, mirror site detection, data compression, alignment of comparable pages, and pattern discovery.

The Topics

To design the topic format, we surveyed the actual situation of Web retrieval and users' information needs using questionnaires at several universities. The topic format is basically inherited from that of past NTCIRs. The usable fields and the mandatory fields vary according to the sub-task. Here are two examples of topics.
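<TOPIC>
<NUM>002</NUM>
<TITLE>Hidetoshi Nakata, matches, upcoming</TITLE>
<DESC>I want to know Hidetoshi Nakata's upcoming match schedule.</DESC>
<NARR>Relevant documents are those that show Hidetoshi Nakata's upcoming
match schedule. They need not be linked to ticket reservations. Even a
fan's personal home page is counted as correct if it gives specific dates,
places, and times. "Upcoming" means upcoming as seen from the time the page
was created. Reports of impressions of matches are not correct.</NARR>
<CONC>Hidetoshi Nakata, soccer, match dates, matches, schedule</CONC>
<RDOC>ntcweb003983762345,ntcweb000123453874634,ntcweb00023432934</RDOC>
<USER>third-year university student, female</USER>
</TOPIC>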
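<TOPIC>
<NUM>034</NUM>
<TITLE>El Niño, world, effects</TITLE>
<DESC>I want to find documents that explain the "El Niño" phenomenon and
its effects on the world's weather (including effects on sea temperature,
atmospheric pressure, rainfall, and so on).</DESC>
<NARR>Relevant documents are those that provide information on the effects
of "El Niño". Interactions between the sea and the atmosphere over land are
of interest if they relate to the El Niño phenomenon. Because "El Niño"
affects the world's climate, it is especially important in the South
Pacific.</NARR>
<CONC>El Niño, weather, sea temperature, atmospheric pressure, rainfall,
atmosphere, South Pacific</CONC>
<RDOC>ntcweb000003425444,ntcweb000232333923,ntcweb000234338778</RDOC>
<USER>second-year junior high school student, male</USER>
</TOPIC>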
<DESC> (DESCRIPTION) gives the most fundamental description of the user's information needs. We consider the basic form of a topic to be "(1) of (2)", e.g. "'the play schedule' of 'Nakata'" or "'recipes' of 'healthy cookies'".
Meanwhile, <TITLE> specifies 1-3 terms representing the most fundamental subjects; it does not represent every aspect of the user's information needs.
<NARR> (NARRATIVE) gives details on the background, retrieval purpose, relevance judgment criteria, term definitions and so on.
<CONC> (CONCEPTS) gives synonyms, related terms or broader terms defined by the topic creator.
<RDOC> (RELEVANT DOCUMENTS) gives the identification numbers of three relevant documents.
<USER> (USER ATTRIBUTES) gives the attributes of the topic creator, e.g. social position and gender.
We will select the topics with attention to the balance of genres, retrieval purposes and so on.
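The topic format described above can be read with a few regular expressions. The following is a minimal sketch, assuming topics arrive as plain text with exactly the tags shown in the examples; the function names are hypothetical, and the sample topic is an abbreviated illustration rather than an official topic file.

```python
import re
from typing import Dict, List

# Field tags as they appear in the example topics.
FIELDS = ("NUM", "TITLE", "DESC", "NARR", "CONC", "RDOC", "USER")


def parse_topics(text: str) -> List[Dict[str, str]]:
    """Return one dict per <TOPIC> block, keyed by field tag."""
    topics = []
    for block in re.findall(r"<TOPIC>(.*?)</TOPIC>", text, flags=re.DOTALL):
        topic = {}
        for name in FIELDS:
            match = re.search(rf"<{name}>(.*?)</{name}>", block,
                              flags=re.DOTALL)
            if match:
                # Join wrapped lines with single spaces. For Japanese text,
                # joining with an empty string may be preferable.
                topic[name] = " ".join(match.group(1).split())
        topics.append(topic)
    return topics


def rdoc_ids(topic: Dict[str, str]) -> List[str]:
    """Split the comma-separated <RDOC> field into document IDs."""
    return [d.strip() for d in topic.get("RDOC", "").split(",") if d.strip()]


def mandatory_queries(topic: Dict[str, str]) -> Dict[str, str]:
    """Query text for the two mandatory automatic runs:
    one using only <TITLE>, one using only <DESC>."""
    return {"TITLE": topic.get("TITLE", ""), "DESC": topic.get("DESC", "")}


# Abbreviated sample for illustration only.
SAMPLE = """<TOPIC>
<NUM>002</NUM>
<TITLE>Hidetoshi Nakata, matches, upcoming</TITLE>
<DESC>I want to know Hidetoshi Nakata's
upcoming match schedule.</DESC>
<RDOC>ntcweb003983762345,ntcweb000123453874634,ntcweb00023432934</RDOC>
</TOPIC>"""

if __name__ == "__main__":
    for t in parse_topics(SAMPLE):
        print(t["NUM"], mandatory_queries(t), rdoc_ids(t))
```

Keeping the two mandatory query sources separate makes it straightforward to report which topic fields each run used.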

Document Set

The Definition of Document Set and its Distribution

The document set must be explicitly specified for a test collection. Among several possible methods, we adopted the following one, because the Web retrieval task is our first attempt and there are many unknown factors. Because this method is the same as that of conventional test collections, many well-known techniques can be used to identify relevant document sets and to evaluate systems. It is also important that the resulting test collection remain useful for a long time.

Document Set for the Workshop

Document Collection:
Gathering Domain:
Document Data and Document Set:
Distribution:
Open Lab. Environments:
Data Contents and Format:

Relevance Judgments

The human assessors judge the relevance of each element of the relevant document candidates pool, which is composed of the top-ranked search results submitted by each participant. Several different assessors judge the relevance for each topic. Documents retrieved by 'search masters', who are well versed in Web searching, or by searchers will be added to the relevant document candidates pool to improve its coverage of relevant documents.
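Pool construction as described here can be sketched for a single topic as follows. The pool depth is not stated in the text, so the `depth` parameter is an assumption, as are the function and argument names.

```python
from typing import Dict, Iterable, List, Set


def build_pool(runs: Dict[str, List[str]],
               depth: int,
               extra_docs: Iterable[str] = ()) -> Set[str]:
    """Relevant document candidates pool for one topic.

    runs:       run name -> ranked list of document IDs for this topic
    depth:      how many top-ranked documents to take from each run
                (hypothetical; the text does not give the value)
    extra_docs: documents found by 'search masters' or searchers
    """
    pool = set(extra_docs)
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


if __name__ == "__main__":
    runs = {
        "runA": ["d1", "d2", "d3"],
        "runB": ["d2", "d4", "d5"],
    }
    pool = build_pool(runs, depth=2, extra_docs=["d9"])
    print(sorted(pool))  # ['d1', 'd2', 'd4', 'd9']
```

Because the pool is a set, a document retrieved by several runs is judged only once.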

Notes

NII is constructing a test collection for Web retrieval based on the ideas described above. Many issues are still under discussion at the time of writing, so the data and methods used in the actual workshop may differ from those described here.

Finally, we look forward to active contributions from the workshop participants, and to requests and advice from researchers in related areas, so that we can carry out the Web retrieval task in NTCIR-3 and construct more usable test collections of Web documents.

