NTCIR Project
NTCIR-10 RITE
(Recognizing Inference in TExt)
Research Purpose Use of Test Collection






The NTCIR-10 RITE-2 Test Collection is intended for evaluating systems that automatically recognize semantic relations (i.e., entailment, paraphrase, and contradiction) between sentences.
The test collection includes:


Collection: NTCIR-10 RITE

Japanese task data (File Type: xml, Language: Japanese)

File: RITE2_JA_bc-mc-unittest *(A)

  Subtask              Development Data    Test Data (Formal run data)
                       # of Text Pairs     # of Text Pairs    Gold standard (relevance judgment)
  Japanese BC          611                 610                Label = {Y,N}
  Japanese MC          548                 548                Label = {F,B,C,I}
  Japanese UnitTest    272                 241                Label = {Y,N}

  Tools for linguistic analysis results (development and test data): KNP, MeCab, and CaboCha
  Added search results: -
  RITE-1 data (development and test data): task data used in RITE-1 **(B)

File: RITE2_JA_exam ***(C)

  Subtask                              Development Data    Test Data (Formal run data)
                                       # of Text Pairs     # of Text Pairs    Gold standard (relevance judgment)
  Japanese Entrance Exam ExamBC        510                 448                Label = {Y,N}
  Japanese Entrance Exam ExamSearch    510                 448                -

  Tools for linguistic analysis results (development and test data): KNP, MeCab, and CaboCha
  Added search results (development and test data): textbook search results; Wikipedia search results *(A)
  RITE-1 data (development and test data): the Entrance Exam data used in RITE-1 (ExamBC)
  Corpus: textbook corpus for the Japanese Entrance Exam subtask; Wikipedia corpus for the Japanese Entrance Exam subtask *(A)

Chinese task data (File Type: xml)

  File        Subtask                        Language               Development Data    Test Data          Additional data
                                                                    # of Text Pairs     # of Text Pairs
  RITE2_CS    Simplified Chinese BC          Simplified Chinese     814                 781                1387
              Simplified Chinese MC          Simplified Chinese     814                 781                1387
              Simplified Chinese RITE4QA     Simplified Chinese     -                   2511               5256
  RITE2_CT    Traditional Chinese BC         Traditional Chinese    1321                881                1894
              Traditional Chinese MC         Traditional Chinese    1321                881                1894
              Traditional Chinese RITE4QA    Traditional Chinese    -                   2511               5256

J: Japanese, E: English, C: Chinese (Cs: Simplified Chinese, Ct: Traditional Chinese)


RITE2_CS and RITE2_CT Subtask data are available from NII to non-participants for research purpose use. Please see here.
*(A) RITE2_JA_bc-mc-unittest Subtask data (except task data used in RITE-1) and the Wikipedia Corpus are distributed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Details can be found at http://creativecommons.org/licenses/by-sa/3.0/.

Data is available from here.
**(B) The data used in the RITE-1 BC, MC, and RITE4QA Subtasks is available from NII to non-participants for research purposes via the NTCIR-9 RITE page (please see here).
***(C) Entrance Exam Subtask data (except the Wikipedia Corpus and Wikipedia search results) is currently available to NTCIR-10 RITE Entrance Exam Subtask participants only. (Permission to use the data is under negotiation. We will announce when it becomes available.)

README

Format

Dev/Test Gold Standard Data Format
<dataset type="bc">
  <pair label="Y" id="1" >
    <t1>氷河は発達地域によって、山岳地に形成される山岳氷河と、主に南極大陸とグリーンランドの広大な面積を覆う大陸氷河に分けられる。</t1>
    <t2>氷河には、2種類の形態があることが知られている。</t2>
  </pair>
  <pair label="N" id="2" >
  : : : 
</dataset>

Test Data Format
<dataset>
  <pair id="1">
    <t1>昇華はフリーズドライの食品や医薬品などを作る際にも利用される。</t1>
    <t2>医薬品製造に用いられていたフリーズドライの技術は、食品にも用いられる。</t2>
  </pair>
  <pair id="2">
  : : : 
</dataset>
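
The two layouts above differ only in whether each pair carries a label attribute. As a minimal sketch (not an official NTCIR tool), the following Python code reads either layout with the standard library's xml.etree.ElementTree. The names TextPair and parse_rite_dataset, and the placeholder sentences in the samples, are assumptions made for illustration and are not part of the distributed data.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class TextPair(NamedTuple):
    """One <pair> element (hypothetical record type for this sketch)."""
    pair_id: str
    label: Optional[str]  # None when the pair has no label attribute (test data)
    t1: str
    t2: str


def parse_rite_dataset(xml_text: str) -> List[TextPair]:
    """Parse a <dataset> document into a list of TextPair records."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append(
            TextPair(
                pair_id=pair.get("id", ""),
                label=pair.get("label"),  # returns None if the attribute is absent
                t1=(pair.findtext("t1") or "").strip(),
                t2=(pair.findtext("t2") or "").strip(),
            )
        )
    return pairs


# Placeholder samples that mirror the gold-standard and test layouts.
# The sentences are invented stand-ins, not items from the collection.
GOLD_SAMPLE = """<dataset type="bc">
  <pair label="Y" id="1" >
    <t1>placeholder text one</t1>
    <t2>placeholder hypothesis one</t2>
  </pair>
  <pair label="N" id="2" >
    <t1>placeholder text two</t1>
    <t2>placeholder hypothesis two</t2>
  </pair>
</dataset>"""

TEST_SAMPLE = """<dataset>
  <pair id="1">
    <t1>placeholder text one</t1>
    <t2>placeholder hypothesis one</t2>
  </pair>
</dataset>"""

if __name__ == "__main__":
    for record in parse_rite_dataset(GOLD_SAMPLE) + parse_rite_dataset(TEST_SAMPLE):
        print(record.pair_id, record.label, record.t1, record.t2, sep="\t")
```

Representing a missing label as None lets the same function serve both the development/gold files and the unlabeled test files. To read a distributed file from disk, ET.parse(path).getroot() could replace ET.fromstring, assuming the file is well-formed XML.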

For more details, please refer to the NTCIR-10 RITE-2 overview paper available at the NTCIR online proceedings.


 

(1) RITE2_JA_bc-mc-unittest Subtask data (except task data used in RITE-1)


(2) RITE2_CS and RITE2_CT Subtask data

The following is the procedure to obtain the test collection. The test collection and data are available from NII free of charge.


Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Notice

The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project, either free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and we must handle it carefully so as not to violate copyright.


Updated on : 2014-06-16
ntc-admin