This test collection can be used to evaluate the quality of answers
on a community QA (CQA) site.
This test collection consists of the following data.
- 1500 questions extracted from Yahoo Chiebukuro data version 1.0
- Assessment results by four assessors
- ID lists, best answer lists, and category information, etc.
For more details, please refer to README or overview papers (Part I & Part II) in the test collection.
NTCIR-8 CQA: answer quality ranking
Document data
|Genre||Filename||Lang.||Year|
|QA site on Web||Yahoo!Chiebukuro Data||J||Apr.|
|Contents||# of docs||Size|
|Questions resolved||3,116,009 items||about 916MB|
|Best answers||3,116,008 items||about 935MB|
|Other answers||10,361,777 items||about 2.3GB|
Task data
|Filename||Lang.||#||Relevance judge.|
|NTCIR-8 CQA Test Collection||J||Questions 1500 items; Answers 7443 items (Best answers 1500 items, Normal answers 5943 items)||2 graded (question), 4 graded (answer)|
The entire collection is provided by IDR Group, NII for research purposes.
|Data||Filename||How to obtain|
|Document Data||Yahoo!Chiebukuro Data||Yahoo!Chiebukuro Data is distributed to researchers by IDR Group, the National Institute of Informatics.|
|Task Data||NTCIR-8 CQA||NTCIR-8 CQA Test Collection is distributed to researchers by IDR Group, the National Institute of Informatics.|
This Test Collection is available only to users who have obtained permission to use Yahoo!Chiebukuro Data.
For the procedures to obtain the dataset, please refer to http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
The Yahoo Chiebukuro data is used as the document set.
For details on the Yahoo Chiebukuro data, please see the IDR Group website.
The task data consists of 1500 questions and 7443 answers.
Four assessors rated the quality of each question on a two-grade scale (A/B) and the quality of each answer on a three-grade scale (A/B/C).
The assessments of an answer are therefore written as a four-letter pattern, one letter per assessor: the highest-quality answer is AAAA and the lowest-quality answer is CCCC.
In the CQA task, the three-grade evaluation patterns were mapped into 4 relevance levels, L3 (highly relevant), L2 (relevant), L1 (partially relevant) and L0 (not relevant), as shown in Table 2 of the overview paper (Part II).
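As an illustration of how four-letter answer patterns such as AAAA and CCCC might be handled in practice, the following Python sketch validates a pattern, counts its grades, and looks up a relevance level. It is not part of the test collection: the function names are hypothetical, the sketch assumes that the order of the four assessors does not matter, and the pattern-to-level table is deliberately left for the user to fill in from Table 2 of the overview paper (Part II).

```python
from collections import Counter
from itertools import combinations_with_replacement

ANSWER_GRADES = "ABC"   # three-grade answer scale described above
NUM_ASSESSORS = 4       # four assessors per answer


def canonical_pattern(pattern: str) -> str:
    """Validate a pattern and return it with its letters sorted.

    Sorting assumes assessor order is irrelevant (an assumption of this sketch).
    """
    pattern = pattern.strip().upper()
    if len(pattern) != NUM_ASSESSORS or any(ch not in ANSWER_GRADES for ch in pattern):
        raise ValueError(f"not a valid assessment pattern: {pattern!r}")
    return "".join(sorted(pattern))


def grade_counts(pattern: str) -> dict:
    """Count how many assessors gave each grade."""
    counts = Counter(canonical_pattern(pattern))
    return {grade: counts.get(grade, 0) for grade in ANSWER_GRADES}


def all_patterns() -> list:
    """Enumerate every order-independent pattern, from AAAA to CCCC."""
    return ["".join(p) for p in combinations_with_replacement(ANSWER_GRADES, NUM_ASSESSORS)]


def relevance_level(pattern: str, table: dict) -> str:
    """Look up a relevance level in a user-supplied table.

    The table must be filled in from Table 2 of the overview paper (Part II);
    this sketch does not reproduce that mapping.
    """
    return table[canonical_pattern(pattern)]


if __name__ == "__main__":
    print(canonical_pattern("BACA"))   # letters sorted: AABC
    print(grade_counts("AAAA"))        # four A grades, no B or C
    print(len(all_patterns()))         # number of distinct order-independent patterns
```

Enumerating the patterns with `all_patterns()` gives the complete list of keys that a mapping table would need to cover, which makes it easy to check that no pattern has been left without a relevance level.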
NTCIR-8 CQA Test Collection is provided by IDR Group, NII. The test collection and data available from NII are free of charge.
- How to obtain 'Document Data': Yahoo!Chiebukuro Data
Task Overview of NTCIR 8 CQA
Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task
Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation
(1) Inquiries related to Data Application Procedures should be directed to the IDR secretariat.
IDR Group, National Institute of Informatics
Address: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
(Please contact us by e-mail whenever possible unless otherwise specified.)
(2) Technical inquiries related to the Test Collection (data format, how to use the Test Collection) should be directed to the NTCIR admin.
NTCIR Project Group, National Institute of Informatics
Phone: +81-3-4212-2529 Fax: +81-3-3556-2751