NTCIR Project
NTCIR-8 CQA (Community QA)
Research Purpose Use of Test Collection


NTCIR-8 CQA (Community QA Test Collection)

Test Collection

This test collection can be used to evaluate the quality of answers posted on a community QA (CQA) site.
It consists of the following data:

- 1500 questions extracted from Yahoo Chiebukuro data version 1.0
- Assessment results by four assessors
- ID lists, best answer lists, and category information, etc.

For more details, please refer to the README or the overview papers (Part I and Part II) included in the test collection.

Collection: NTCIR-8 CQA
Task: Answer quality ranking

Documents
  Genre:     QA site on Web
  Filename:  Yahoo!Chiebukuro Data
  Lang.:     J (Japanese)
  Year:      Apr. to Oct.
  # of docs and size:
    Questions resolved: 3,116,009 items (about 916MB)
    Best answers:       3,116,008 items (about 935MB)
    Other answers:      10,361,777 items (about 2.3GB)

Task data
  Filename:  NTCIR-8 CQA Test Collection
  Lang.:     J (Japanese)
  # of items:
    Questions: 1500 items
    Answers:   7443 items (best answers: 1500 items; normal answers: 5943 items)
  Relevance judgments: 2 graded (question), 4 graded (answer)

The entire collection is provided by the IDR Group, NII, for research purposes.

How to obtain

Documents (Yahoo!Chiebukuro Data)
  Yahoo!Chiebukuro Data is distributed to researchers by the IDR Group, National Institute of Informatics (NII).
  For the procedure to obtain the dataset, please refer to http://www.nii.ac.jp/dsc/idr/en/yahoo/yahoo.html

Task data (NTCIR-8 CQA Test Collection)
  The NTCIR-8 CQA Test Collection is distributed to researchers by the IDR Group, National Institute of Informatics (NII).
  It is available only to users who have obtained permission to use Yahoo!Chiebukuro Data.
  For the procedure to obtain the dataset, please refer to http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html

Documents, Topics and Questions

The Yahoo Chiebukuro data is used as the document set.
For details of the Yahoo Chiebukuro data, please see the IDR Group website.

The task data consists of 1500 questions and 7443 answers.
Four assessors judged the quality of each question on a 2-grade scale (A/B) and the quality of each answer on a 3-grade scale (A/B/C).
Each judgment is therefore written as a string of four grades, one per assessor: the highest-quality answer is written as AAAA and the lowest-quality answer as CCCC.
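The four-letter notation is easy to handle programmatically. The Python sketch below assumes, hypothetically, that each judgment is available as a plain string with one grade letter per assessor (the actual file format is described in the README). It validates such a string, sorts it into an order-independent pattern, and enumerates every possible pattern. Treating "BAAA" and "AAAB" as the same pattern is an assumption about how assessor order is handled, and the function names are illustrative only.

```python
from collections import Counter
from itertools import combinations_with_replacement

ANSWER_GRADES = "ABC"   # 3-grade scale for answers (A = best, C = worst)
QUESTION_GRADES = "AB"  # 2-grade scale for questions
NUM_ASSESSORS = 4


def parse_judgment(judgment: str, grades: str = ANSWER_GRADES) -> Counter:
    """Validate a four-assessor string such as 'AABC' and count each grade.

    Hypothetical helper: assumes one letter per assessor.
    """
    if len(judgment) != NUM_ASSESSORS:
        raise ValueError(f"expected {NUM_ASSESSORS} grades, got {judgment!r}")
    unknown = set(judgment) - set(grades)
    if unknown:
        raise ValueError(f"unknown grade(s) {sorted(unknown)} in {judgment!r}")
    return Counter(judgment)


def canonical_pattern(judgment: str, grades: str = ANSWER_GRADES) -> str:
    """Sort grades best-first so that 'BAAA' and 'AAAB' give the same pattern."""
    parse_judgment(judgment, grades)  # raises ValueError on malformed input
    return "".join(sorted(judgment, key=grades.index))


def all_patterns(grades: str) -> list[str]:
    """Every order-independent pattern four assessors can produce on a scale."""
    return ["".join(p) for p in combinations_with_replacement(grades, NUM_ASSESSORS)]


if __name__ == "__main__":
    print(canonical_pattern("CABA"))           # AABC
    print(len(all_patterns(ANSWER_GRADES)))    # 15
    print(len(all_patterns(QUESTION_GRADES)))  # 5
```

Under the order-independence assumption, four assessors on a 3-grade scale can produce 15 distinct answer patterns (from AAAA to CCCC), and on a 2-grade scale 5 distinct question patterns.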

In the CQA task, the patterns of the four assessors' 3-grade answer judgments were mapped into four relevance levels, L3 (highly relevant), L2 (relevant), L1 (partially relevant) and L0 (not relevant), as shown in Table 2 of overview paper Part II.
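Graded levels such as L0 to L3 make it possible to score a system's answer ranking with a graded-relevance measure. As an illustration only, the Python sketch below computes nDCG@k for a single question. nDCG is used here just as a familiar example; the measures actually used in the task are described in overview paper Part II. The gain values (L3 = 3 down to L0 = 0), the log2 discount, and the answer IDs in the usage lines are all assumptions, not part of the test collection.

```python
import math

# Assumed gains: gain equals the level number (L3 -> 3, ..., L0 -> 0).
GAIN = {"L3": 3, "L2": 2, "L1": 1, "L0": 0}


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount (ranks start at 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking: list[str], levels: dict[str, str], k: int) -> float:
    """nDCG@k for one question.

    ranking: answer IDs in the order a system ranked them.
    levels:  answer ID -> relevance level ('L0'..'L3'); unjudged answers count as L0.
    """
    gains = [GAIN[levels.get(answer_id, "L0")] for answer_id in ranking[:k]]
    ideal = sorted((GAIN[lvl] for lvl in levels.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical answer IDs and levels, for illustration only.
    levels = {"a1": "L1", "a2": "L3", "a3": "L0", "a4": "L2"}
    print(ndcg_at_k(["a2", "a4", "a1", "a3"], levels, k=4))  # 1.0 (ideal order)
    print(ndcg_at_k(["a1", "a2", "a4", "a3"], levels, k=4))
```

Ranking the L3 answer first yields the maximum score of 1.0, while placing the L1 answer at the top lowers the score because the higher-gain answers are discounted at later ranks.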

The NTCIR-8 CQA Test Collection is provided by the IDR Group, NII. The test collection and the data available from NII are free of charge.


Task Overview of NTCIR-8 CQA
Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task
Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation


(1) Inquiries related to Data Application Procedures should be directed to the IDR secretariat.

     IDR Group, National Institute of Informatics
     Email: idr

     Phone: +81-3-4212-2503
     Address: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
(Unless otherwise specified, please contact us by e-mail whenever possible.)

(2) Technical inquiries related to the test collection (data format, how to use the test collection) should be directed to the NTCIR admin.

     NTCIR Project Group, National Institute of Informatics
     Email: ntc-admin

     Phone: +81-3-4212-2529    Fax: +81-3-3556-2751