The collection consists of
    * Document data (the Mainichi Newspaper articles 2000-2001 and the Yomiuri Newspaper articles 2000-2001)
    * Q&A data set (360 queries in Japanese, which constitute 50 series) and answers
    * A scoring tool
Researchers can get the data set, which includes the Q&A
data set and a scoring tool, from NII.
Collection | Task | Genre | Filename | Lang | Year | # of docs | Size | Topic lang. | Topic # | Relevance judge
NTCIR-5 QA | QAC3 | Newspaper articles | ntc5-j01-mai00.txt | J | 2000 | about 120,000 | about 135 MB | J | queries | 2 (3)
NTCIR-5 QA | QAC3 | Newspaper articles | ntc5-j01-mai01.txt | J | 2001 | about 110,000 | about 143 MB | J | queries | 2 (3)
NTCIR-5 QA | QAC3 | Newspaper articles | ntc5-j01-yomi00.txt | J | 2000 | about 130,000 | about 183 MB | J | queries | 2 (3)
NTCIR-5 QA | QAC3 | Newspaper articles | ntc5-j01-yomi01.txt | J | 2001 | about 240,000 | about 312 MB | J | queries | 2 (3)
The document data is not distributed by NII and must be obtained separately, as follows.
Two kinds of document data should be used, as described below. Please refer to the page "How to obtain Newspaper Article Data".
The document data consists of Japanese news articles published in Japan in 2000-2001. It contains document records extracted from the Mainichi Newspaper Full-Text Article Database CD-ROMs and from the Yomiuri Newspaper Japanese Article Data. The former is available for research use from Mainichi Newspaper Co., and the document records on the CD-ROMs shall be converted into the NTCIR standard record format with the script mai2.pl.
The latter is available for research use from Nihon Database Kaihatsu Co. Ltd., and the document records in the data shall be converted into the NTCIR standard record format with the script yomi2ntcir.pl.
Task
The task setting follows subtask 3 in NTCIR-4 QAC. That is, participants are requested to answer a series of questions in a simulated interactive information-gathering session for report writing.
For each question, all and only correct answers should be listed as its response.
In each series of questions, the first question is marked, and the other questions refer to portions of, or answers to, some of the previous questions.
The participants must not look ahead to the questions following the one currently being handled. This restriction reflects the fact that the task is a simulation of interactive use of QA systems in dialogues.
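The no-look-ahead restriction can be illustrated with a minimal Python sketch. It is not part of the NTCIR tooling: the function `answer_series`, the `answer_fn` callback, and the sample question IDs are hypothetical names introduced here. The loop handles one series in order and gives the answering function only the questions and answers that came before the current one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One earlier turn in a series: (question ID, question text, answers given).
Turn = Tuple[str, str, List[str]]


def answer_series(
    questions: Sequence[Tuple[str, str]],
    answer_fn: Callable[[str, Tuple[Turn, ...]], List[str]],
) -> Dict[str, List[str]]:
    """Answer one series of (qid, text) pairs strictly in order.

    answer_fn is a hypothetical QA system. It receives the current question
    and the earlier turns only, never the questions that follow.
    """
    history: List[Turn] = []
    results: Dict[str, List[str]] = {}
    for qid, text in questions:
        # Pass an immutable snapshot so answer_fn cannot alter the history.
        answers = answer_fn(text, tuple(history))
        results[qid] = answers
        history.append((qid, text, answers))
    return results


if __name__ == "__main__":
    # Dummy system: reports how many earlier turns it was allowed to see.
    def dummy(text: str, earlier: Tuple[Turn, ...]) -> List[str]:
        return [f"{len(earlier)} earlier turn(s)"]

    series = [("QAC3-0001-01", "first question"), ("QAC3-0001-02", "follow-up")]
    print(answer_series(series, dummy))
```

Passing the history explicitly keeps follow-up questions resolvable against earlier turns while making it structurally impossible for the system to read later questions in the series.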
Answers are names and values extracted from the target documents. These include named entities such as names of persons and organizations, numerical expressions such as monetary and metrological values, titles, dates, and names of species and categories. In addition, event-type descriptions in compound nouns and conventional constructions for round numbers and ranges are also included.
The Question File consists of lines with the following format.
[QID]: "[QUESTION]"<CR>
[QID] has a form of [QuestionSetID]-[QuestionNo]-[SubQuestionNo].
[QuestionSetID] consists of four alphanumeric characters.
[QuestionNo] and [SubQuestionNo] consist of four and two numeric characters, respectively.
[QUESTION] is a series of two-byte characters.
"、" and "。" are used for punctuation marks.
"？" is not used.
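As an illustration of the question line format described above, the following Python sketch parses one line into its parts. It is an assumption-laden example, not the official tooling: the names `Question` and `parse_question_line` and the sample line are hypothetical, and the sketch assumes the line has already been decoded to a Unicode string and that whitespace after the colon is optional.

```python
import re
from typing import NamedTuple


class Question(NamedTuple):
    set_id: str           # [QuestionSetID]: four alphanumeric characters
    question_no: str      # [QuestionNo]: four numeric characters
    sub_question_no: str  # [SubQuestionNo]: two numeric characters
    text: str             # [QUESTION]: the question text without the quotes


# [QID]: "[QUESTION]" with [QID] = [QuestionSetID]-[QuestionNo]-[SubQuestionNo].
# Assumption: any amount of whitespace (including none) may follow the colon.
LINE_RE = re.compile(r'([A-Za-z0-9]{4})-([0-9]{4})-([0-9]{2}):\s*"(.*)"')


def parse_question_line(line: str) -> Question:
    """Parse one line of the Question File, ignoring the trailing line break."""
    match = LINE_RE.fullmatch(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a valid question line: {line!r}")
    return Question(*match.groups())


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the actual test collection.
    print(parse_question_line('QAC3-0001-01: "例の質問"\r\n'))
```

The numeric fields are kept as strings so that leading zeros survive and the original [QID] can be rebuilt by joining the three parts with hyphens.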
The Answer File consists of lines in the following format (so-called CSV format).
The following are the procedures to obtain this NTCIR-5 QAC test collection. The test collection and data available from NII are free of charge.
Reference
Email: ntc-secretariat
The release of new test collections and correction information will be announced through the NTCIR mailing list.
The test collections have been constructed and used for NTCIR. They are
usable only for research purposes.
The document collections included in the test collection were provided
to NII for use in NTCIR, either free of charge or for a fee. The providers of
the document data kindly understood the importance of the test collection
for research on information access technologies and granted the
use of the data for research purposes. Please remember that the document
data in the NTCIR test collection is copyrighted and has commercial value
as data. To maintain a good and reliable relationship
with the data producers and providers, we researchers must behave as
reliable partners, use the data only for research purposes under the
user agreement, and take care not to violate any rights in the data.