NTCIR Project
NTCIR-6 QAC
Research Purpose Use of Test Collection




NTCIR-6 QAC4 (Q&A Data Test Collection)

[Overview of the Test Collection]


The collection includes:

Researchers can obtain the data set, which includes the task data set and the evaluation results (part of the data), from NII.


Collection   Task  Genre               Filename            Doc. Lang.  Year  # of docs     Size          Topic Lang.  Topic #  Relevance Judge Grades
NTCIR-6 QAC  QAC4  Newspaper articles  ntc3-j03-mai98.txt  J           1998  about 120000  about 135 MB  J            Queries  4
NTCIR-6 QAC  QAC4  Newspaper articles  ntc3-j03-mai99.txt  J           1999  about 110000  about 135 MB  J            Queries  4
NTCIR-6 QAC  QAC4  Newspaper articles  ntc5-j02-mai00.txt  J           2000  about 110000  about 135 MB  J            Queries  4
NTCIR-6 QAC  QAC4  Newspaper articles  ntc5-j01-mai01.txt  J           2001  about 110000  about 130 MB  J            Queries  4


Documents, Topics and Questions

@Documents@

We use four years of Japanese newspaper articles, published from 1998 to 2001. The collection contains the document records extracted from the Mainichi Newspaper Full-Text Article Database CD-ROMs. For researchers who are not participants, the Mainichi Newspaper Full-Text Article Database CD-ROMs for 1998, 1999, 2000 and 2001 are available for research purpose use from Mainichi Newspaper Co. The document records in the CD-ROMs are to be converted into the NTCIR standard record format by the script mai2ntcir-r.pl, or into the IREX and TSC standard record format by the script mai2sgml.pl. The script mai2ntcir-r.pl is available from the following URL:
  · To obtain the script mai2ntc-r.pl: http://research.nii.ac.jp/ntcir/permission/ntcir-4/script/mai2ntc-r.pl_txt
  · README [mai2ntc-r.pl]: http://research.nii.ac.jp/ntcir/permission/ntcir-4/script/READMEforMainichiScript-r.txt

@Task Data@


Task Overview

  1. Questions are of non-factoid types, such as why-type questions, definition questions, and questions whose answers consist of multiple noun phrases.
  2. There are 100 questions. They are natural questions, not generated from the target documents.
  3. The system returns a set of answers for each question.
  4. Participants must return human-made answers for the questions.

Question File Format

The Question File consists of lines with the following format.

[QID]: "[QUESTION]"<CR>
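
The line format above can be read with a short parser. The following is a minimal sketch in Python, not part of the official QAC tooling. It assumes that a [QID] contains no colon or whitespace, and that the file has already been decoded to text (the character encoding is not stated here). The function names and the sample QIDs are hypothetical.

```python
import re

# One question per line: [QID]: "[QUESTION]"
# Assumption: a QID contains neither ":" nor whitespace.
QUESTION_LINE = re.compile(r'^(?P<qid>[^:\s]+):\s*"(?P<question>.*)"\s*$')


def parse_question_line(line):
    """Return (qid, question) for one line, or None if it does not match."""
    match = QUESTION_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group("qid"), match.group("question")


def parse_questions(text):
    """Parse decoded question-file text into an ordered list of (qid, question)."""
    parsed = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = parse_question_line(line)
        if entry is None:
            raise ValueError(f"malformed question line: {line!r}")
        parsed.append(entry)
    return parsed


if __name__ == "__main__":
    # Hypothetical sample; real QIDs and questions come from the distributed file.
    sample = 'Q1: "Why is the sky blue?"\r\nQ2: "What is NTCIR?"\n'
    for qid, question in parse_questions(sample):
        print(qid, question)
```

Keeping the result as an ordered list preserves the question order, which the answer file is required to follow.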

Answer File Format

The Answer File consists of lines with the following format (so-called CSV format).

[QID], "[Answer]", [ArticleID], [MFlag]<CR>

@(, "[Answer]", [ArticleID], [MFlag]<CR> )*


where (...)* is the Kleene star and specifies zero or more occurrences of the enclosed expression.

@·   [QID] is the same as in the question file format above. It must be unique in the file, and ordered identically with in the corresponding question file. It is allowed, however, that some of [QID]s do not list at the file.

@·   [Answer] is the answer to the question, and a series of two byte characters.

@·   [ArticleID] is the identifier of the article or one of the articles used in the process of deriving the answer. It consists of nine numbers followed with JA-.

@·    [MFlag] is "E" or "M". It will be "E" if the answer string is a part of document [ArticleID]. It will be "M" the answer string is modified from extracted answer string from the document [ArticleID].

In the answer file, a line beginning with "#" is a comment. You may include any information, such as a support or context of your answer, as comments.
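
Putting the answer file rules together, a reader has to skip comment lines and attach each continuation line (one that starts with a comma) to the most recent [QID]. The sketch below shows one way to do this in Python. It is an illustration under assumptions, not the official evaluation tool: it assumes that the text is already decoded and that [QID], [ArticleID] and [MFlag] contain no commas or whitespace. The sample data is hypothetical.

```python
import re

# [QID], "[Answer]", [ArticleID], [MFlag]
# A continuation line has an empty QID and therefore starts with a comma.
ANSWER_LINE = re.compile(
    r'^\s*(?P<qid>[^,\s"]*)\s*,\s*"(?P<answer>.*)"\s*,'
    r'\s*(?P<article>[^,\s]+)\s*,\s*(?P<mflag>[^,\s]+)\s*$'
)


def parse_answers(text):
    """Map each QID to its list of (answer, article_id, mflag) tuples."""
    answers = {}
    current_qid = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        match = ANSWER_LINE.match(line)
        if match is None:
            raise ValueError(f"malformed answer line: {line!r}")
        qid = match.group("qid")
        if qid:
            if qid in answers:
                raise ValueError(f"QID is not unique: {qid}")
            current_qid = qid
            answers[qid] = []
        elif current_qid is None:
            raise ValueError("continuation line before any QID")
        answers[current_qid].append(
            (match.group("answer"), match.group("article"), match.group("mflag"))
        )
    return answers


if __name__ == "__main__":
    sample = (
        "# run 1, hypothetical sample data\n"
        'Q1, "answer A", JA-980101001, E\n'
        ', "answer B", JA-990202002, M\n'
        'Q3, "answer C", JA-000303003, E\n'
    )
    for qid, items in parse_answers(sample).items():
        print(qid, items)
```

A regular expression is used here instead of a general CSV reader so that the comma-leading continuation lines and the quoted answer field are handled explicitly.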

For more details, please see the QAC Home Page and the NTCIR-6 QAC Task Overview.


To obtain the Test Collection

The following is the procedure to obtain the NTCIR-6 QAC test collection. The test collection and the data available from NII are free of charge.
Documents to submit
Application Form [txt]
Formal Application [PDF]


Reference

The terms of use [PDF]
Task Overview of NTCIR 6 QAC
An Overview of NTCIR-6 QAC4

Address

NTCIR Project Office (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Mailing List

The release of new test collections and correction information shall be announced through the NTCIR mailing list.

Notice

The test collection has been constructed and used for NTCIR. It may be used only for research purposes.
The document collections included in the test collection were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data kindly understood the importance of the test collection for research on information access technologies and granted the use of the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a reliable and good relationship with the data producers and providers, we researchers must behave as reliable partners, use the data only for research purposes under the user agreement, and use it carefully so as not to violate any of their rights.