The collection includes:
Collection | Task | Documents: Genre | Documents: Filename | Documents: Lang. | Documents: Year | Documents: # of docs | Documents: Size | Topic: Lang. | Topic: # | Relevance Judge: Grades
NTCIR-6 QAC | QAC4 | Newspaper articles | ntc3-j03-mai98.txt | J | 1998 | about 120,000 | about 135 MB | J | Queries | 4
NTCIR-6 QAC | QAC4 | Newspaper articles | ntc3-j03-mai99.txt | J | 1999 | about 110,000 | about 135 MB | J | Queries | 4
NTCIR-6 QAC | QAC4 | Newspaper articles | ntc5-j02-mai00.txt | J | 2000 | about 110,000 | about 135 MB | J | Queries | 4
NTCIR-6 QAC | QAC4 | Newspaper articles | ntc5-j01- | J | 2001 | about 110,000 | about 130 MB | J | Queries | 4
The collection uses four years of Japanese newspaper articles, published from 1998 to 2001. It contains document records extracted from the Mainichi Newspaper Full-Text Article Database CD-ROMs. For researchers who are not participants, the Mainichi Newspaper Full-Text Article Database CD-ROMs for 1998, 1999, 2000 and 2001 are available for research use from Mainichi Newspaper Co. The document records on the CD-ROMs must be converted into the NTCIR standard record format with the script mai2ntcir-r.pl, or into the IREX and TSC standard record format with the script mai2sgml.pl. The script mai2ntcir-r.pl is available from the URL:
· To obtain the script mai2ntc-r.pl: http://research.nii.ac.jp/ntcir/permission/ntcir-4/script/mai2ntc-r.pl_txt
· README (mai2ntc-r.pl): http://research.nii.ac.jp/ntcir/permission/ntcir-4/script/READMEforMainichiScript-r.txt
Task Overview
The Question File consists of lines with the following format.
[QID]: "[QUESTION]"<CR>
The Answer File consists of lines with the following format.
[QID], "[Answer]", [ArticleID], [MFlag]<CR>
(, "[Answer]", [ArticleID], [MFlag]<CR> )*
where (...)* is Kleene star, and specifies zero or more occurrences of the enclosed expression.
· [QID] is the same as in the question file format above. It must be unique in the file, and the [QID]s must appear in the same order as in the corresponding question file. Some [QID]s may, however, be omitted from the file.
· [Answer] is the answer to the question, given as a sequence of two-byte characters.
· [ArticleID] is the identifier of the article, or of one of the articles, used in the process of deriving the answer. It consists of nine digits prefixed with JA-.
· [MFlag] is "E" or "M". It is "E" if the answer string is a part of document [ArticleID]. It is "M" if the answer string is modified from the answer string extracted from document [ArticleID].
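The field rules above can be checked mechanically before a run is submitted. The following is a minimal sketch, not an official NTCIR tool. It assumes the answer file has already been parsed into (QID, answers) pairs, and it assumes that an [ArticleID] is the prefix JA- followed by exactly nine digits. The function name check_records and the QIDs Q1 to Q3 are hypothetical, and the article IDs are made up for illustration.

```python
import re

# Assumption: an ArticleID is "JA-" followed by nine digits.
ARTICLE_ID = re.compile(r"JA-\d{9}")
MFLAGS = {"E", "M"}


def check_records(question_qids, records):
    """Return a list of problems found in parsed answer records.

    question_qids: QIDs in question-file order.
    records: list of (qid, [(answer, article_id, mflag), ...]) in file order.
    """
    problems = []
    position = {qid: i for i, qid in enumerate(question_qids)}
    seen = set()
    last = -1  # question-file position of the latest QID accepted so far
    for qid, answers in records:
        if qid not in position:
            problems.append(f"{qid}: not in the question file")
            continue
        if qid in seen:
            problems.append(f"{qid}: listed more than once")
            continue
        seen.add(qid)
        # Omitted QIDs are allowed, so only the relative order is checked.
        if position[qid] < last:
            problems.append(f"{qid}: out of question-file order")
        last = max(last, position[qid])
        for answer, article_id, mflag in answers:
            if not ARTICLE_ID.fullmatch(article_id):
                problems.append(f"{qid}: bad ArticleID {article_id!r}")
            if mflag not in MFLAGS:
                problems.append(f"{qid}: bad MFlag {mflag!r}")
    return problems


# Hypothetical QIDs and made-up article IDs, for illustration only.
qids = ["Q1", "Q2", "Q3"]
records = [
    ("Q1", [("東京", "JA-980101001", "E")]),
    ("Q3", [("大阪", "JA-990101001", "M")]),  # Q2 is omitted, which is allowed
]
print(check_records(qids, records))
```

An empty list means no rule was violated. Each problem is reported as a string rather than raised, so that one pass lists every issue in the file.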
In the answer file, a line beginning with "#" is a comment.
You may include any information, such as support or context for your answer,
as comments.
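As an illustration of the answer file layout, here is a minimal sketch of a reader. It follows the format as written on this page: a line carrying a [QID] and one answer, optional continuation lines that start with a comma, and "#" comment lines that are skipped. The function name parse_answer_file, the QIDs and the article IDs are hypothetical. Treating blank lines as ignorable is an assumption, not part of the stated format.

```python
import re

# One entry: optional QID, quoted answer, ArticleID, MFlag.
# An empty QID marks a continuation line that starts with a comma.
ENTRY = re.compile(
    r'^(?P<qid>[^,"]*?)\s*,\s*"(?P<answer>.*)"\s*,\s*'
    r'(?P<article_id>[^,\s]+)\s*,\s*(?P<mflag>[^,\s]+)\s*$'
)


def parse_answer_file(text):
    """Parse answer-file text into a list of (qid, [(answer, article_id, mflag), ...])."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip comments; skipping blank lines is an assumption.
        if not line.strip() or line.startswith("#"):
            continue
        match = ENTRY.match(line.strip())
        if match is None:
            raise ValueError(f"line {number}: unrecognised entry: {line!r}")
        entry = (match["answer"], match["article_id"], match["mflag"])
        qid = match["qid"]
        if qid:
            records.append((qid, [entry]))
        elif records:
            # Continuation line: attach to the most recent QID.
            records[-1][1].append(entry)
        else:
            raise ValueError(f"line {number}: continuation before any QID")
    return records


# Hypothetical QIDs and made-up article IDs, for illustration only.
sample = (
    "# run produced by a hypothetical system\n"
    'Q1, "東京", JA-980101001, E\n'
    ', "東京都", JA-980102002, M\n'
    "# supporting context may be written as a comment\n"
    'Q3, "大阪", JA-990101001, E\n'
)
for qid, answers in parse_answer_file(sample):
    print(qid, answers)
```

The reader does not validate [ArticleID] or [MFlag] values. It only splits lines into fields, so that format checks can be applied separately.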
For more details, please see the QAC Home Page and the NTCIR-6 QAC Task Overview.
The following are the procedures for obtaining this NTCIR-6 QAC test collection. The test collection and data available from NII are free of charge.
Reference
NTCIR Project Office (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
Releases of new test collections and correction information will be announced through the NTCIR mailing list.
The test collection has been constructed and used for NTCIR. It may be used
only for research purposes.
The document collections included in the test collection were provided
to NII for use in NTCIR, either free of charge or for a fee. The providers of
the document data kindly understand the importance of the test collection
for research on information access technologies and have therefore granted the
use of the data for research purposes. Please remember that the document
data in the NTCIR test collection is copyrighted and has commercial value
as data. To maintain a reliable and good relationship with the data
producers and providers, it is important that we researchers behave as
reliable partners, use the data only for research purposes under the
user agreement, and take care not to violate any of their rights.