The NTCIR-10 RITE-2 Test Collection is intended to evaluate systems that automatically recognize semantic relations (i.e., entailment, paraphrase, and contradiction) between sentences.
The test collection includes:
Collection: NTCIR-10 RITE

Japanese subtasks (file type: xml; language: Japanese)

| File | Subtask | Development data (# of text pairs) | Test data, formal run (# of text pairs) | Labels (gold standard) |
|---|---|---|---|---|
| RITE2_JA_bc-mc-unittest *(A) | Japanese BC | 611 | 610 | {Y,N} |
| RITE2_JA_bc-mc-unittest *(A) | Japanese MC | 548 | 548 | {F,B,C,I} |
| RITE2_JA_bc-mc-unittest *(A) | Japanese UnitTest | 272 | 241 | {Y,N} |
| RITE2_JA_exam ***(C) | Japanese Entrance Exam ExamBC | 510 | 448 | {Y,N} |
| RITE2_JA_exam ***(C) | Japanese Entrance Exam ExamSearch | 510 | 448 | {Y,N} |

- Linguistic analysis results: the development and test data of every Japanese subtask include analysis results produced by KNP and by MeCab & CaboCha.
- Added search results: the ExamSearch development and test data include textbook search results and Wikipedia search results *(A); the other Japanese subtasks include none.
- RITE-1 data: RITE2_JA_bc-mc-unittest is accompanied by the task data used in RITE-1 **(B); RITE2_JA_exam is accompanied by the Entrance Exam data used in RITE-1.
- Corpus: a textbook corpus and a Wikipedia corpus *(A) are provided for the Japanese Entrance Exam subtask; no corpus is provided for the other Japanese subtasks.

Chinese subtasks (file type: xml)

| File | Subtask | Language | Development data (# of text pairs) | Test data, formal run (# of text pairs) | Additional data |
|---|---|---|---|---|---|
| RITE2_CS | Simplified Chinese BC | Simplified Chinese | 814 | 781 | 1387 |
| RITE2_CS | Simplified Chinese MC | Simplified Chinese | 814 | 781 | 1387 |
| RITE2_CS | Simplified Chinese RITE4QA | Simplified Chinese | - | 2511 | 5256 |
| RITE2_CT | Traditional Chinese BC | Traditional Chinese | 1321 | 881 | 1894 |
| RITE2_CT | Traditional Chinese MC | Traditional Chinese | 1321 | 881 | 1894 |
| RITE2_CT | Traditional Chinese RITE4QA | Traditional Chinese | - | 2511 | 5256 |
J: Japanese, E: English, C: Chinese (Cs: Simplified Chinese, Ct: Traditional Chinese)
RITE2_CS and RITE2_CT subtask data are available from NII to non-participants for research purposes. Please see here.
*(A) RITE2_JA_bc-mc-unittest subtask data (except the task data used in RITE-1) and the Wikipedia corpus are distributed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Details can be found at http://creativecommons.org/licenses/by-sa/3.0/. The data is available from here.
**(B) The data used in the RITE-1 BC, MC, and RITE4QA subtasks is available from NII to non-participants for research purposes; please see the NTCIR-9 RITE page.
***(C) Entrance Exam subtask data (except the Wikipedia corpus and Wikipedia search results) is currently available only to participants in the NTCIR-10 RITE Entrance Exam subtask. Permission to use the data is under negotiation; we will announce when it becomes available.
README
<dataset type="bc">
<pair label="Y" id="1" >
<t1>氷河は発達地域によって、山岳地に形成される山岳氷河と、主に南極大陸とグリーンランドの広大な面積を覆う大陸氷河に分けられる。</t1>
<t2>氷河には、2種類の形態があることが知られている。</t2>
</pair>
<pair label="N" id="2" >
: : :
</dataset>
<dataset>
<pair id="1">
<t1>昇華はフリーズドライの食品や医薬品などを作る際にも利用される。</t1>
<t2>医薬品製造に用いられていたフリーズドライの技術は、食品にも用いられる。</t2>
</pair>
<pair id="2">
: : :
</dataset>
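The samples above show the file layout: a `<dataset>` root holding `<pair>` elements with an `id` attribute (plus a `label` attribute where gold labels are given) and `<t1>`/`<t2>` children. As a minimal sketch, assuming the distributed files follow this layout and are well-formed XML, they can be read with Python's standard `xml.etree.ElementTree`. The helper names (`parse_rite_string`, `parse_rite_file`, `invalid_label_ids`, `LABEL_SETS`) are hypothetical and not part of the distribution.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pair:
    pair_id: str
    t1: str
    t2: str
    label: Optional[str]  # None when the <pair> has no label attribute


def _to_pairs(root: ET.Element) -> List[Pair]:
    """Collect every <pair> element under a <dataset> root."""
    return [
        Pair(
            pair_id=elem.get("id", ""),
            t1=(elem.findtext("t1") or "").strip(),
            t2=(elem.findtext("t2") or "").strip(),
            label=elem.get("label"),
        )
        for elem in root.iter("pair")
    ]


def parse_rite_string(xml_text: str) -> List[Pair]:
    """Parse an in-memory <dataset> document (hypothetical helper)."""
    return _to_pairs(ET.fromstring(xml_text))


def parse_rite_file(path: str) -> List[Pair]:
    """Parse a <dataset> file; ET.parse reads bytes and honours any encoding declaration."""
    return _to_pairs(ET.parse(path).getroot())


# Hypothetical lookup built from the label sets listed per subtask:
# BC, UnitTest and ExamBC use {Y, N}; MC uses {F, B, C, I}.
LABEL_SETS = {"bc": {"Y", "N"}, "mc": {"F", "B", "C", "I"}}


def invalid_label_ids(pairs: List[Pair], subtask: str) -> List[str]:
    """Return ids of labelled pairs whose label is outside the subtask's label set."""
    allowed = LABEL_SETS[subtask]
    return [p.pair_id for p in pairs if p.label is not None and p.label not in allowed]


if __name__ == "__main__":
    sample = """<dataset type="bc">
<pair label="Y" id="1">
<t1>氷河は発達地域によって、山岳地に形成される山岳氷河と、主に南極大陸とグリーンランドの広大な面積を覆う大陸氷河に分けられる。</t1>
<t2>氷河には、2種類の形態があることが知られている。</t2>
</pair>
</dataset>"""
    pairs = parse_rite_string(sample)
    print(Counter(p.label for p in pairs))  # Counter({'Y': 1})
    print(invalid_label_ids(pairs, "bc"))   # []
```

Unlabelled pairs, like those in the second sample, come back with `label=None`, so the same reader works for both layouts.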
For more details, please refer to the NTCIR-10 RITE-2 overview paper available at the NTCIR online proceedings.
(1) RITE2_JA_bc-mc-unittest subtask data (except the task data used in RITE-1)
(2) RITE2_CS and RITE2_CT subtask data
The following is the procedure to obtain the test collection. The test collection and data are available from NII free of charge.
- Fill out the application form for the test collection and send it by email to ntc-secretariat.
- A user agreement (memorandum on Permission to Use Test Collection) is required.
- Fill out the user agreement form and send it by postal mail or courier to the address below.
Download the form and print two double-sided copies; both copies must be signed.
After NII countersigns them, one copy will be sent to you and the other will be kept by NII.
- Application Form [txt]
- User agreement form (sent by email)
Reference
- The terms of use [PDF]
- NTCIR-10 RITE Task website
- Task overview of NTCIR-10 Recognizing Inference in Text: Overview of the NTCIR-10 Recognizing Inference in Text (RITE-2) at NTCIR [PDF]
- NTCIR-10 Online Proceedings: RITE
- Tools
Address
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
Notice
The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available
to NII for use in the NTCIR project free of charge or for a fee. The providers
of the document data understand the importance of such test collections
in research on information access technologies and have kindly given their
permission to use the data for research purposes. Please remember that
the document data in the NTCIR test collection is copyrighted and has commercial
value as data. To maintain a good relationship with the data producers/providers,
we researchers must be reliable partners and use the data only for research
purposes under the user agreement, and we must use the data carefully so
as not to violate copyright.