What's New
Please read 'Application Home' carefully, and then complete the
NTCIR-5 WEB registration form here. Please also visit 'User
Agreement Forms' and follow the instructions there.
The NTCIR-5 WEB will use 'NW1000G-04' as the document data. It was crawled mainly from the *.jp domain in 2004 and contains about 1 terabyte of page data in total. A 300-gigabyte subset, 'NW300G-04', is also under consideration. The contents and formats of the data will be almost the same as those of 'NW100G-01', which was used in the NTCIR-3/4 WEB. The organizers will prepare four versions of the document data: RAW, EUC, COOKED, and SEGMENTED.
RAW, EUC, and COOKED will be delivered to all participants on
hard disk drives, as in the NTCIR-4 WEB.
SEGMENTED, a version newly prepared for the NTCIR-5 WEB, is expected to
be so large that special arrangements will be needed depending on
the participants' demands. Its delivery will therefore be somewhat
later.
The computer resources of the 'Open Laboratory' at the
National Institute of Informatics will be available to
participants within the limits of the existing
resources. The organizers will announce how to apply for the Open
Laboratory.
Since the 3rd NTCIR Workshop, the WEB Task has been advancing research on information access systems for large-scale Web documents, which are structured by tags and hyperlinks, from various viewpoints of actual Web use. However, because of the organizers' circumstances, the WEB Task at the 5th NTCIR Workshop (NTCIR-5 WEB) focuses on a single main subtask, "Navigational Retrieval Task 2", and adds only one newly proposed pilot subtask, "Query Term Expansion Task". Other subtasks conducted in the WEB Tasks at the 3rd and 4th NTCIR Workshops (NTCIR-3/4 WEB) may be taken up again in future NTCIR Workshops.
The current task description of each subtask is provided below. Details will be announced on the NTCIR-WEB home pages. We welcome active contributions from workshop participants, as well as requests and advice from researchers in related areas, to carry out the NTCIR-5 WEB and to construct more usable test collections of Web documents.
The Navigational Retrieval Task is one of the subtasks newly proposed
at the NTCIR-4 WEB. 'Navigational retrieval' refers to searches that
guide a user's information seeking process. The NTCIR-4/5 WEB focuses
on known item search, a kind of navigational retrieval.
Known item search aims to find representative Web pages of a given
item, rather than one given Web page. A representative Web page may be
a site top page, an entry page to a series of related pages, or a
single fully informative page. Two types of user situations are
assumed: (i) the user wants the typical pages of a known object (e.g.,
a person, shop, or facility) and searches using the name of the
object; and (ii) the user knows the requested object but does not
remember its name, and so searches using attribute information or
related information about the object. In both cases, the number of
relevant documents tends to be just one or a few. Consequently, the
subtask can be regarded as including, but not being restricted to,
home page finding and named page finding in the TREC Web Track.
Ordinary information retrieval systems
often use only the text content of documents, whereas processing and
utilizing anchors, link structures, logical
document units, etc. are considered effective for Web retrieval. The
results of the Navigational Retrieval Task 1 suggest that this tendency
is especially pronounced in known item search. The organizers therefore
encourage participation with systems that apply original methods suited
to this subtask.
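As one illustration of what utilizing anchors can involve, the following is a minimal Python sketch that gathers, for each link target, the anchor texts pointing at it. It is not the organizers' or any participant's method. The class AnchorCollector, the function build_anchor_index, and the example.jp URLs are all hypothetical names introduced here, and real crawled pages would need more robust handling of encodings and malformed HTML.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (target URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pairs = []      # finished (url, text) pairs
        self._href = None    # resolved href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the linking page.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def build_anchor_index(pages):
    """Map each link target to the anchor texts that point at it.

    `pages` is an iterable of (page_url, html) tuples (an assumed input shape).
    """
    index = defaultdict(list)
    for page_url, html in pages:
        collector = AnchorCollector(page_url)
        collector.feed(html)
        collector.close()
        for target, text in collector.pairs:
            index[target].append(text)
    return index


# Hypothetical pages that both link to the same site top page.
pages = [
    ("http://a.example.jp/links.html",
     '<a href="http://www.example.jp/">Example Museum</a>'),
    ("http://b.example.jp/",
     '<p>See the <a href="http://www.example.jp/">official museum site</a>.</p>'),
]
index = build_anchor_index(pages)
print(index["http://www.example.jp/"])
```

An index of this kind can let a system match a query, such as the name of an object, against the words other pages use to describe the target, even when the target page itself never states that name.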
The following is an outline of the task
description. For details, please refer to the Overview
of the Navigational Retrieval Task 1 in the Working
Notes of NTCIR-4.
The Query Term Expansion Task is a newly proposed pilot subtask. Its
detailed task definition will be fixed through discussions among the
organizers and the participants. For more information, please visit the
subtask's web page.
Each group can participate in either or both of the two subtasks
described above.
DATE            ACTION
2004-08-01      Call for Participation (preliminary)
2004-09-20      Registration Due
                * Registrations after this date will be accepted as long as possible.
2004-10-01      Document Data Release
                * Provided in a few divisions as they are prepared. The first one will be of about 300GB.
2004-12-01      Dry-Run Topics Release
2005-01-01      Dry-Run Results Submission
2005-03-01      Dry-Run Evaluation Results Release
2005-04-15      Formal-Run Topics Release
2005-05-15      Formal-Run Results Submission
2005-08-01      Formal-Run Evaluation Results Release
2005-10-01      Submission Due of Camera-ready Manuscript for the Working Notes
                * Working Notes will be delivered at the Workshop Meeting.
2005-12-6--9    Workshop Meeting
2006-02-        Submission Due of Camera-ready Manuscript for the Proceedings
                * The Proceedings will be published broadly.