NTCIR Project
NTCIR-4 QAC2
Research Purpose Use of Test Collection

NTCIR-4 QAC2 (Q&A data Test Collection)

The collection consists of:
　　　　* Document data (Mainichi Newspaper articles 1998-1999 and Yomiuri Newspaper articles 1998-1999)
　　　　* Q&A data set (650 original queries in Japanese, with an English translation) and answers
　　　　* A scoring tool
Researchers can get the data set, which includes the Q&A data set and a scoring tool, from NII.

Collection	Task	Genre	Filename	Lang	Year	# of docs	Size	Topic lang.	Topic #	Relevance judge
NTCIR-4 QA	QAC2	Newspaper articles	ntc4-j01-mai98.txt	J	1998	about 120000	about 135 MB	J	queries	2 (3)
			ntc4-j01-mai99.txt		1999	about 110000	about 143 MB
			ntc4-j01-yomi98.txt		1998	about 130000	about 183 MB
			ntc4-j01-yomi99.txt		1999	about 240000	about 312 MB

＊ Researchers can get the data set, which includes the Q&A data set and a scoring tool, from NII. However, the document data must be obtained separately, as follows.

Two kinds of document data are required, as follows. Please refer to the page "How to obtain Newspaper Article Data".

Yomiuri: Japanese news articles published in Japan in 1998-1999. The data consist of document records extracted from the Yomiuri Newspaper Japanese Article Data, which is available for research purposes from Nihon Database Kaihatsu Co. Ltd. (detailed information is currently available in Japanese at http://www.ndk.co.jp/yomiuri/index.html). The document records must be converted into the NTCIR standard record format with the script yomi2ntcir.pl.
Mainichi: Japanese news articles published in Japan in 1998-1999. The data consist of document records extracted from the Mainichi Newspaper Full-Text Article Database CD-ROMs (information is currently available in Japanese only), which are available for research purposes from Mainichi Newspaper Co. The document records on the CD-ROMs must be converted into the NTCIR standard record format with the script mai2.pl. Alternatively, you can obtain mai2sgml from http://lr-www.pi.titech.ac.jp/tsc/tsctools/index-jp.html (we thank Dr. Sekine for writing this program and allowing us to use it).

Task

The task definition of QAC-2 is based on the task definition of QAC-1 (there are three kinds of QAC tasks). An overview of the task description, including the changes from QAC-1, follows:

Task 2 (list type) uses a different set of questions. In QAC-1, the same set of questions was used for Task 1 and Task 2.
In Task 3, there are several follow-up questions. In QAC-1, there was only one follow-up question.
The number of target documents will increase.
There will be ellipsis in the tail expressions of question sentences.
The answer expression will be a part of a document.
A document ID will be required as supporting information for each question.
We are planning to propose other tasks as trial tasks. The details will be released later.

Question File Format

The Question File consists of lines with the following format.

[QID]: "[QUESTION]"<CR>

[QID] has a form of [QuestionSetID]-[QuestionNo]-[SubQuestionNo].
[QuestionSetID] consists of four alphanumeric characters.
[QuestionNo] and [SubQuestionNo] consist of four and two numeric characters, respectively.
[QUESTION] is a series of two-byte characters. "、" and "。" are used as punctuation marks.
"？" is not used.

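The line format above can be checked mechanically. The following is a minimal Python sketch, not an official NTCIR tool: the names `parse_question_line` and `Question` are hypothetical, the QID in the usage example is made up for illustration, and the sketch assumes the file has already been decoded to Unicode text.

```python
import re
from typing import NamedTuple

# [QuestionSetID]-[QuestionNo]-[SubQuestionNo]: "[QUESTION]"
# Four alphanumerics, four digits, two digits, then the quoted question.
QUESTION_LINE = re.compile(
    r'^(?P<set_id>[A-Za-z0-9]{4})-(?P<question_no>[0-9]{4})-(?P<sub_no>[0-9]{2})'
    r': "(?P<text>.*)"$'
)


class Question(NamedTuple):
    set_id: str
    question_no: str
    sub_no: str
    text: str


def parse_question_line(line: str) -> Question:
    """Parse one line of a question file; raise ValueError if it is malformed."""
    match = QUESTION_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a valid question line: {line!r}")
    question = Question(**match.groupdict())
    # The format description says the full-width question mark is not used.
    if "？" in question.text:
        raise ValueError(f"question text contains '？': {line!r}")
    return question
```

For example, `parse_question_line('QAC2-1001-01: "日本の首都はどこですか。"')` yields a `Question` whose `set_id` is `QAC2`, `question_no` is `1001`, and `sub_no` is `01`. The numeric parts are kept as strings so that leading zeros survive a round trip.
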
Answer File Format

The Answer File consists of lines with the following format (so-called CSV format).

[QID](, "[Answer]", [ArticleID], [HTFlag], [Offset])*<CR>
where (...)* is Kleene star, and specifies zero or more occurrences of the enclosed expression.
“The part of article used for deriving the answer” in the above explanation typically means the portion of the article from which your system extracted the answer. It does not mean that systems must extract the answer from articles. If your system does not derive answers by such extraction, please give the position that is most relevant for judging the correctness of your answer.
If you cannot specify even that, you may omit [HTFlag] and [Offset]. For each question, the quadruple of "[Answer]", [ArticleID], [HTFlag], and [Offset] is repeated zero or more times.
In subtask1, the order of these quadruples represents the order of confidence. That is, the most confident answer candidate should be placed first. The number of candidates is up to five in the dry run.
In subtask2 and subtask3, as the answer is a set, the elements of the answer are listed in an arbitrary order.
In the answer file, a line beginning with "#" is a comment. You may include any information, such as the support or context of your answer, as comments.
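
As an illustration of the answer line format, here is a minimal Python sketch of a reader. It is not the official scoring tool. The names `parse_answer_line` and `Candidate` are hypothetical, and the sketch assumes that answers contain no double quotes, that [ArticleID], [HTFlag], and [Offset] contain no commas, quotes, or spaces, and that [HTFlag] and [Offset] are either both present or both omitted. All values are kept as opaque strings, and the article IDs and flag values in the usage note are made up.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# One repeated group: , "[Answer]", [ArticleID] optionally followed by
# , [HTFlag], [Offset]. An unquoted field after [ArticleID] must be [HTFlag],
# because the next group would start with a quoted answer.
CANDIDATE = re.compile(
    r'\s*,\s*"(?P<answer>[^"]*)"\s*,\s*(?P<article_id>[^,\s"]+)'
    r'(?:\s*,\s*(?P<ht_flag>[^,\s"]+)\s*,\s*(?P<offset>[^,\s"]+))?'
)


class Candidate(NamedTuple):
    answer: str
    article_id: str
    ht_flag: Optional[str]  # None when omitted
    offset: Optional[str]   # None when omitted


def parse_answer_line(line: str) -> Optional[Tuple[str, List[Candidate]]]:
    """Return (QID, candidates in file order), or None for comments and blanks."""
    line = line.rstrip("\r\n")
    if line.startswith("#") or not line.strip():
        return None
    qid, sep, rest = line.partition(",")
    rest = sep + rest  # keep the leading comma so every group looks the same
    candidates: List[Candidate] = []
    pos = 0
    while pos < len(rest):
        match = CANDIDATE.match(rest, pos)
        if match is None:
            if rest[pos:].strip():
                raise ValueError(f"malformed answer line: {line!r}")
            break
        candidates.append(Candidate(**match.groupdict()))
        pos = match.end()
    return qid.strip(), candidates
```

Given the line `QAC2-1001-01, "東京", A1, H, 12, "京都", A2`, the sketch returns the QID and two candidates, the second with `ht_flag` and `offset` set to `None`. Because the list keeps file order, the first element is the most confident candidate in subtask1, while in subtask2 and subtask3 the order carries no meaning.
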

* For more details, please see the QAC Home Page.

The test collection was constructed and used for NTCIR. It may be used for research purposes only.
The document collections included in the test collection were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data kindly understood the importance of the test collection for research on information access technologies and granted the use of the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good and reliable relationship with the data producers and providers, we researchers must behave as reliable partners: use the data only for research purposes under the user agreement, and take care not to violate any rights in the data.

The following is the procedure to obtain this QAC2 test collection. The test collection and data available from NII are free of charge.

NTCIR-4 QAC2 task data can be downloaded from NII/IDR at:
http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html

Reference

The terms of use [PDF]
README[txt]

Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Mailing List

The release of new test collections and correction information will be announced through the NTCIR mailing list.

Updated on : 2015-07-22