The NTCIR-9 Patent Machine Translation Test Collection is intended for evaluating the quality of machine translation (MT) of patent information from Chinese to English (C-E), Japanese to English (J-E), and English to Japanese (E-J). Each translation direction is a separate subtask, for three subtasks in total.
The collection includes:
The document collection includes unexamined Japanese patent applications published in 1993-2005 and patent grant data published by the USPTO (U.S. Patent & Trademark Office) in 1993-2005. The document collection does not include diagrams.
Documents
Collection | Language pairs | Genre | Filename | Lang. | Year
NTCIR-9 PatentMT | C to E | patent full-text | Patent grant data published by USPTO | E | 1993-2005
NTCIR-9 PatentMT | E to J, J to E | patent full-text | Publication of unexamined patent applications | J | 1993-2005
NTCIR-9 PatentMT | E to J, J to E | patent full-text | Patent grant data published by USPTO | E | 1993-2005

Task data
Language pairs | Test data | Reference translation | Human judge. | Development data | Training data
C to E | C: 2000 sentences | E: 2000 sentences | 100 sentences * adequacy: 23 runs, acceptability: 13 runs * 3 humans | C-E: 2000 sentence pairs | C to E ***: ca. 1 million sentence pairs
E to J | E: 2000 sentences | J: 2000 sentences | 100 sentences * adequacy: 17 runs, acceptability: 11 runs * 3 humans | E-J: 2000 sentence pairs | E to J: 3,186,284 sentence pairs
J to E | J: 2000 sentences | E: 2000 sentences | 100 sentences * adequacy: 19 runs, acceptability: 14 runs * 3 humans | E-J: 2000 sentence pairs | J to E: 3,186,284 sentence pairs
*** NTCIR-9 task participants may apply for continued use of the C to E training data, upon payment of nominal administrative fees and execution of an extension agreement. For details, please contact the Administrator, Patents (Ms Janice Chong).
Non-participants may obtain access to the C to E training data on a paid basis. They should write to the Administrator, Patents (Ms Janice Chong), outlining their purpose of use and other relevant details of their request.
File name | Year | Method of Provision
Publication of unexamined patent applications | published in 1993-1997 | NTCIR-4 PATENT: by sending DVD-ROMs or transferring the data files electronically
Publication of unexamined patent applications | published in 1998-2002 | NTCIR-5 PATENT: by sending DVD-ROMs or transferring the data files electronically
Publication of unexamined patent applications | published in 2003-2005 | NTCIR-8 PATMT: by transferring the data files electronically
Patent grant data published by USPTO | published in 1993-2002 | NTCIR-6 PATENT: by sending DVD-ROMs
Patent grant data published by USPTO | published in 2003-2005 | NTCIR-8 PATMT: by transferring the data files electronically
USPTO patent grant data 1993-2005
This document set consists of patent grant data published by the U.S. Patent
& Trademark Office (USPTO) in 1993-2005.
More details can be found on the NTCIR-9 PatentMT Website and in the NTCIR-9 PatentMT Task Definition and the NTCIR-9 PatentMT Task Overview.
The following is the procedure to obtain the test collection. The test collection and data are available from NII free of charge.
Address
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
Notice
The test collection was constructed and used for the NTCIR project. It may be used only for research purposes.
The document collection included in the test collection was made available
to NII for use in the NTCIR project free of charge or for a fee. The providers
of the document data understand the importance of such test collections
in research on information access technologies and have kindly given their
permission to use the data for research purposes. Please remember that
the document data in the NTCIR test collection is copyrighted and has commercial
value as data. To maintain a good relationship with the data producers and providers,
we researchers must be reliable partners and use the data only for research
purposes under the user agreement, and we must use the data carefully so
as not to violate copyright.