This test collection is intended for evaluating machine translation (MT) quality
from Japanese to English, targeting patent information.
The collection includes:
Training data and test data used in the NTCIR-8 PATMT Evaluation Subtask.
(The data consist of Japanese source sentences, English reference translations,
English machine translations, and human evaluation data.)
Collection | Subtask | Phase | Task Data | # | Lang. |
NTCIR-8 PATMT | AE* | Training | Source Data | 100 sentences (Publication of unexamined patent applications) | J |
NTCIR-8 PATMT | AE* | Training | Reference Translation Data | 100 sentences (Patent grant data published from USPTO) | E |
NTCIR-8 PATMT | AE* | Training | Machine Translation Data | 100 sentences * 11 systems | E |
NTCIR-8 PATMT | AE* | Training | Human Evaluation Data (adequacy) | 100 sentences * 11 systems * 3 raters | - |
NTCIR-8 PATMT | AE* | Training | Human Evaluation Data (fluency) | 100 sentences * 11 systems * 3 raters | - |
NTCIR-8 PATMT | AE* | Test | Source Data | 100 sentences (Publication of unexamined patent applications) | J |
NTCIR-8 PATMT | AE* | Test | Reference Translation Data | 100 sentences (Patent grant data published from USPTO) | E |
NTCIR-8 PATMT | AE* | Test | Machine Translation Data | 100 sentences * 12 systems | E |
NTCIR-8 PATMT | AE* | Test | Human Evaluation Data (adequacy) | 100 sentences * 12 systems * 3 raters | - |
NTCIR-8 PATMT | AE* | Test | Human Evaluation Data (fluency) | 100 sentences * 12 systems * 3 raters | - |
NTCIR-8 PATMT | AE* | Additional | Additional Reference Translation Data | 100 sentences * 3 translators | E |
*AE: Evaluation Subtask
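The sizes in the table are given as products (for example, 100 sentences * 11 systems * 3 raters). As a quick sanity check when loading the data, the expected item counts can be worked out as in the sketch below. It assumes that each product denotes a plain item count, that is, every rater scored every machine translation output. The names `PHASES`, `phase_counts`, and `COUNTS` are hypothetical and do not correspond to any file or field in the distributed collection, and the combined adequacy-plus-fluency total is derived here rather than stated in the table.

```python
# Sizes copied from the table: sentences, MT systems, and raters per phase.
PHASES = {
    "Training": {"sentences": 100, "systems": 11, "raters": 3},
    "Test": {"sentences": 100, "systems": 12, "raters": 3},
}


def phase_counts(sentences, systems, raters):
    """Return expected item counts for one phase.

    Assumption: every rater scored every MT output, so the products
    in the table are plain item counts.
    """
    mt_outputs = sentences * systems          # one output per sentence per system
    per_criterion = mt_outputs * raters       # adequacy or fluency judgments
    return {
        "mt_outputs": mt_outputs,
        "judgments_per_criterion": per_criterion,
        # Derived figure: adequacy and fluency judgments combined.
        "judgments_total": per_criterion * 2,
    }


COUNTS = {phase: phase_counts(**size) for phase, size in PHASES.items()}

for phase, counts in COUNTS.items():
    print(phase, counts)
```

Under these assumptions, the training phase should contain 1,100 machine translation outputs with 3,300 judgments per criterion, and the test phase 1,200 outputs with 3,600 judgments per criterion.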
The entire collection is provided by NII for research purposes.
The following is the procedure for obtaining the test collection. The test collection and data available from NII are free of charge.
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
The test collection was constructed and used for NTCIR. It may be used
for research purposes only.
The document collections included in the test collection were provided
to NII for use in NTCIR, either free of charge or for a fee. The providers of
the document data kindly understand the importance of the test collection
for research on information access technologies and have granted the
use of the data for research purposes. Please remember that the document
data in the NTCIR test collection is copyrighted and has commercial value
as data. To maintain a reliable and good relationship with the data
producers and providers, it is important that we researchers behave as
reliable partners, use the data only for research purposes under the
user agreement, and take care not to violate any rights in the data.