NTCIR Project
NTCIR-10 RITE
(Recognizing Inference in TExt)
Research Purpose Use of Test Collection

The NTCIR-10 RITE-2 Test Collection is intended for evaluating systems that automatically recognize semantic relations (i.e., entailment, paraphrase, and contradiction) between sentences.
The test collection includes:

(1) RITE2_JA_bc-mc-unittest: development data and formal run data for the Japanese BC, MC, and UnitTest subtasks
(2) RITE2_JA_exam: development data and formal run data for the Japanese Entrance Exam subtasks (ExamBC and ExamSearch)
(3) RITE2_CS: development data and formal run data for the Simplified Chinese BC, MC, and RITE4QA subtasks
(4) RITE2_CT: development data and formal run data for the Traditional Chinese BC, MC, and RITE4QA subtasks

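Collection: NTCIR-10 RITE
Task data file type: xml
Language: Japanese

File: RITE2_JA_bc-mc-unittest *(A)

Subtask: Japanese BC
  Development data: 611 text pairs; tools for linguistic analysis results: 'KNP', 'MeCab' & 'CaboCha'; RITE-1 data: task data used in RITE-1 **(B)
  Test data (formal run data): 610 text pairs; tools for linguistic analysis results: 'KNP', 'MeCab' & 'CaboCha'; gold standard (relevance judgment): Label = {Y,N}; RITE-1 data: task data used in RITE-1 **(B)

Subtask: Japanese MC
  # of text pairs: 548; gold standard (relevance judgment): Label = {F,B,C,I}

Subtask: Japanese UnitTest
  Development data: 272 text pairs
  Test data (formal run data): 241 text pairs; gold standard (relevance judgment): Label = {Y,N}

File: RITE2_JA_exam ***(C)

Subtask: Japanese Entrance Exam ExamBC
  Development data: 510 text pairs; tools for linguistic analysis results: 'KNP', 'MeCab' & 'CaboCha'; RITE-1 data: the Entrance Exam data used in RITE-1
  Test data (formal run data): 448 text pairs; tools for linguistic analysis results: 'KNP', 'MeCab' & 'CaboCha'; gold standard (relevance judgment): Label = {Y,N}; RITE-1 data: the Entrance Exam data used in RITE-1

Subtask: Japanese Entrance Exam ExamSearch
  Development data: 510 text pairs; tools for linguistic analysis results: 'KNP', 'MeCab' & 'CaboCha'
  Test data (formal run data): 448 text pairs; tools for linguistic analysis results: 'KNP', 'MeCab' & 'CaboCha'
  Added search results: textbook search results; Wikipedia search results *(A)
  Corpus: textbook corpus for the Japanese Entrance Exam subtask; Wikipedia corpus for the Japanese Entrance Exam subtask *(A)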

Columns: File, Subtask, File Type, Language, # of Text Pairs, additional data
File type: xml

File: RITE2_CS (language: Simplified Chinese)
  Simplified Chinese BC: 814, 781, 1387
  Simplified Chinese MC: 814, 781, 1387
  Simplified Chinese RITE4QA: 2511, 5256

File: RITE2_CT (language: Traditional Chinese)
  Traditional Chinese BC: 1321, 881, 1894
  Traditional Chinese MC: 1321, 881, 1894
  Traditional Chinese RITE4QA: 2511, 5256

J: Japanese, E: English, C: Chinese (Cs: Simplified Chinese, Ct: Traditional Chinese)

	RITE2_CS and RITE2_CT Subtask data are available from NII to non-participants for research purposes. See (2) below for how to obtain them.
*(A)	RITE2_JA_bc-mc-unittest Subtask data (except task data used in RITE-1) and the Wikipedia Corpus are distributed under the conditions of the Creative Commons Attribution-Share-Alike License 3.0 (Unported). Details can be found at http://creativecommons.org/licenses/by-sa/3.0/. The data are available at http://warehouse.ntcir.nii.ac.jp/openaccess/rite/10RITE-Japanese-wiki.html.
**(B)	The data used in the RITE-1 BC, MC and RITE4QA Subtasks are available from NII to non-participants for research purposes; please see the NTCIR-9 RITE page.
***(C)	Entrance Exam Subtask data (except the Wikipedia Corpus and Wikipedia search results) are currently available to NTCIR-10 RITE Entrance Exam Subtask participants only. (Permission to use the data is under negotiation. We will announce when it is available.)

README

description-available-data [PDF]

Format

Dev/Test Gold Standard Data Format

<dataset type="bc">
  <pair label="Y" id="1" >
    <t1>氷河は発達地域によって、山岳地に形成される山岳氷河と、主に南極大陸とグリーンランドの広大な面積を覆う大陸氷河に分けられる。</t1>
    <t2>氷河には、2種類の形態があることが知られている。</t2>
  </pair>
  <pair label="N" id="2" >
  : : : 
</dataset>
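
The gold standard format above can be read with a few lines of Python. This is only a minimal sketch using the standard library: the `Pair` class and `read_pairs` function are illustrative names that are not part of the test collection, and the sample holds just the first pair of the excerpt because the remaining pairs are elided there.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pair:
    pair_id: str
    label: Optional[str]  # None when the file carries no gold label
    t1: str
    t2: str


def read_pairs(xml_text: str) -> List[Pair]:
    """Parse a <dataset> document into a list of Pair records."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.iter("pair"):
        pairs.append(
            Pair(
                pair_id=node.get("id"),
                label=node.get("label"),
                t1=node.findtext("t1", default="").strip(),
                t2=node.findtext("t2", default="").strip(),
            )
        )
    return pairs


# First pair of the excerpt above; the elided pairs are left out.
SAMPLE = """<dataset type="bc">
  <pair label="Y" id="1" >
    <t1>氷河は発達地域によって、山岳地に形成される山岳氷河と、主に南極大陸とグリーンランドの広大な面積を覆う大陸氷河に分けられる。</t1>
    <t2>氷河には、2種類の形態があることが知られている。</t2>
  </pair>
</dataset>"""

pairs = read_pairs(SAMPLE)
print(pairs[0].pair_id, pairs[0].label)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; the actual file names inside the distributed archives are not stated here.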

Test Data Format

<dataset>
  <pair id="1">
    <t1>昇華はフリーズドライの食品や医薬品などを作る際にも利用される。</t1>
    <t2>医薬品製造に用いられていたフリーズドライの技術は、食品にも用いられる。</t2>
  </pair>
  <pair id="2">
  : : : 
</dataset>
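
Because the test data carries no `label` attribute, a system's output has to be matched against the gold standard file by pair `id`. The sketch below illustrates one way to do that, under stated assumptions: that ids in the two files correspond, that predictions are held in a plain dictionary, and that simple accuracy is a useful sanity check. The official evaluation measures are described in the overview paper, not here. All sentences, gold labels, and predictions in the sample are made-up placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def gold_labels(xml_text: str) -> Dict[str, str]:
    """Map pair id -> gold label from a gold standard <dataset>."""
    root = ET.fromstring(xml_text)
    return {p.get("id"): p.get("label") for p in root.iter("pair")}


def pair_ids(xml_text: str) -> List[str]:
    """List the pair ids of an unlabeled test <dataset>."""
    root = ET.fromstring(xml_text)
    return [p.get("id") for p in root.iter("pair")]


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold pairs predicted correctly; missing predictions count as wrong."""
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for pid, label in gold.items() if predicted.get(pid) == label)
    return correct / len(gold)


# Placeholder data (hypothetical, not taken from the collection).
TEST_XML = """<dataset>
  <pair id="1"><t1>placeholder text A</t1><t2>placeholder hypothesis A</t2></pair>
  <pair id="2"><t1>placeholder text B</t1><t2>placeholder hypothesis B</t2></pair>
</dataset>"""

GOLD_XML = """<dataset type="bc">
  <pair label="Y" id="1"><t1>placeholder text A</t1><t2>placeholder hypothesis A</t2></pair>
  <pair label="N" id="2"><t1>placeholder text B</t1><t2>placeholder hypothesis B</t2></pair>
</dataset>"""

gold = gold_labels(GOLD_XML)
# Test pairs that have no gold label would indicate mismatched files.
missing = [pid for pid in pair_ids(TEST_XML) if pid not in gold]
predicted = {"1": "Y", "2": "Y"}  # output of a hypothetical system
score = accuracy(predicted, gold)
print(missing, score)  # [] 0.5
```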

For more details, please refer to the NTCIR-10 RITE-2 overview paper available at the NTCIR online proceedings.

Task Overview of NTCIR-10 Recognizing Inference in Text:
Overview of the NTCIR-10 Recognizing Inference in Text (RITE-2) at NTCIR [PDF]

(1) RITE2_JA_bc-mc-unittest Subtask data (except task data used in RITE-1)

RITE2_JA_bc-mc-unittest Subtask data (except task data used in RITE-1) and Wikipedia Corpus are available at:
http://warehouse.ntcir.nii.ac.jp/openaccess/rite/10RITE-Japanese-wiki.html
The task data used in RITE-1 is available from NII; please refer to the NTCIR-9 RITE page.

(2) RITE2_CS and RITE2_CT Subtask data

How to obtain Document Data:
https://research.nii.ac.jp/ntcir/permission/perm-en-DocumentData.html

Documents to submit

Application Form [txt]
User agreement form (sent by email)

Reference

The terms of use [PDF]
NTCIR-10 RITE Task website

Task Overview of NTCIR-10 Recognizing Inference in Text
Overview of the NTCIR-10 Recognizing Inference in Text (RITE-2) at NTCIR [PDF]

NTCIR-10 Online Proceedings: RITE

Tools

Contact us: idr-ntcir

Notice

The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: use the data only for research purposes under the user agreement, and handle it carefully so as not to violate copyright.

Updated on: 2014-06-16
