This test collection can be used to evaluate the quality of answers
on community question answering (CQA) sites.
This test collection consists of the following data.
- 1500 questions extracted from Yahoo Chiebukuro data version 1.0
- Assessment results by four assessors
- ID lists, best answer lists, and category information, etc.
For more details, please refer to the README or the overview papers (Part I & Part II) in the test collection.
Collection | Task
NTCIR-8 CQA | answer quality ranking

Documents
Genre | Filename | Lang. | Year | # of docs | Size
QA site on Web | Yahoo!Chiebukuro Data | J | Apr. 2004 to Oct. 2005 | Questions resolved: 3,116,009 items | about 916MB
 | | | | Best answers: 3,116,008 items | about 935MB
 | | | | Other answers: 10,361,777 items | about 2.3GB

Task data
Filename | Lang. | # | Relevance judge.
NTCIR-8 CQA Test Collection | J | Questions: 1500 items | 2 graded (question), 4 graded (answer)
 | | Answers: 7443 items (Best answers: 1500 items; Normal answers: 5943 items) |

The entire collection is provided by IDR Group, NII for research purposes.
Data | Filename | How to obtain
Documents | Yahoo!Chiebukuro Data | Yahoo!Chiebukuro Data is distributed to researchers by IDR Group, the National Institute of Informatics.
Task Data | NTCIR-8 CQA | NTCIR-8 CQA Test Collection is distributed to researchers by IDR Group, the National Institute of Informatics. This test collection is available only to users who have obtained permission to use Yahoo!Chiebukuro Data. For the procedure to obtain the dataset, please refer to http://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
The Yahoo Chiebukuro data is used as the document set.
For details about the Yahoo Chiebukuro data, please see the IDR Group website.
The task data consists of 1500 questions and 7443 answers.
Four assessors evaluated the quality of each question on a two-grade scale (A/B) and
the quality of each answer on a three-grade scale (A/B/C).
Each answer therefore carries a four-letter pattern, one grade per assessor:
the highest-quality answer is written AAAA and the lowest-quality answer CCCC.
In the CQA task, the three-grade evaluation patterns were mapped into 4 relevance levels, L3 (highly relevant), L2 (relevant), L1 (partially relevant) and L0 (not relevant), as shown in Table 2 of the overview paper (Part II).
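The pattern notation can be handled with a few lines of Python. The following is only a minimal sketch: the function names are hypothetical, the sketch assumes that the order of assessors within a pattern does not matter, and the actual pattern-to-level mapping is deliberately left empty because it must be taken from Table 2 of the overview paper (Part II).

```python
from collections import Counter
from itertools import combinations_with_replacement

ANSWER_GRADES = ("A", "B", "C")  # three-grade answer scale
NUM_ASSESSORS = 4                # one grade per assessor


def grade_counts(pattern):
    """Validate a four-letter answer pattern and tally its grades."""
    if len(pattern) != NUM_ASSESSORS:
        raise ValueError("expected %d grades, got %r" % (NUM_ASSESSORS, pattern))
    unknown = set(pattern) - set(ANSWER_GRADES)
    if unknown:
        raise ValueError("unknown grades in %r: %s" % (pattern, sorted(unknown)))
    return Counter(pattern)


def canonical(pattern):
    """Sort the grades so assessor order is ignored (assumption of this sketch)."""
    grade_counts(pattern)  # raises ValueError on malformed input
    return "".join(sorted(pattern))


def all_patterns():
    """Enumerate every order-independent pattern, from AAAA to CCCC."""
    return ["".join(p)
            for p in combinations_with_replacement(ANSWER_GRADES, NUM_ASSESSORS)]


def relevance_level(pattern, mapping):
    """Look up a relevance level (0 to 3) in a caller-supplied mapping.

    The mapping is NOT provided here; fill it in from Table 2 of the
    overview paper (Part II), keyed by canonical pattern.
    """
    return mapping[canonical(pattern)]
```

Under the order-independence assumption, `all_patterns()` gives the full set of keys that a mapping passed to `relevance_level` would need to cover.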
NTCIR-8 CQA Test Collection is provided by IDR Group, NII. The test collection and data available from NII are free of charge.
- How to obtain 'Document Data': Yahoo!Chiebukuro Data
- How to obtain 'Task Data': NTCIR-8 CQA
The Task Data (NTCIR-8 CQA) must be used with the Document Data (Yahoo!Chiebukuro Data). If you have obtained Yahoo!Chiebukuro Data and wish to apply for the Task Data, please contact the IDR Office, National Institute of Informatics, at Email: idr
Reference
Task Overview of NTCIR 8 CQA
- Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task
- Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation
(1) Inquiries related to Data Application Procedures should be directed to the IDR secretariat.
IDR Group, National Institute of Informatics
Email: idr
Phone: +81-3-4212-2503
Address: 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, JAPAN
(Please contact us by e-mail whenever possible, unless otherwise specified.)
(2) Technical inquiries related to the Test Collection (data format, how to use the Test Collection) should be directed to the NTCIR admin.
NTCIR Project Group, National Institute of Informatics
Email: ntc-admin
Phone: +81-3-4212-2529 Fax: +81-3-3556-2751