NTCIR Project
NTCIR-9 PatentMT (Patent Machine Translation)
Research Purpose Use of Test Collection


NTCIR-9 PatentMT (Patent Machine Translation Test Collection)

Test Collection

The NTCIR-9 Patent Machine Translation Test Collection is intended for evaluating the quality of machine translation (MT) of patent information from Chinese to English (C-E), Japanese to English (J-E) and English to Japanese (E-J). There are three subtasks, one for each of these translation directions.

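The collection includes test data, reference translations, human judgments, development data and training data for each language pair, as summarized below.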

The document collection includes unexamined Japanese patent applications published in 1993-2005 and patent grant data published by the USPTO (U.S. Patent & Trademark Office) in 1993-2005. The document collection does not include diagrams.

Collection: NTCIR-9 PatentMT

Language pair: C to E
  Documents: patent full-text; Patent grant data published by USPTO; language E; years 1993-2005
  Test data: C, 2000 sentences
  Reference translation: E, 2000 sentences
  Human judgment: 100 sentences * (adequacy: 23 runs; acceptability: 13 runs) * 3 humans
  Development data: C-E, 2000 sentence pairs
  Training data: C to E ***, ca. 1 million sentence pairs

Language pair: E to J
  Documents: patent full-text; Publication of unexamined patent applications; language J; years 1993-2005
  Test data: E, 2000 sentences
  Reference translation: J, 2000 sentences
  Human judgment: 100 sentences * (adequacy: 17 runs; acceptability: 11 runs) * 3 humans
  Development data: E-J, 2000 sentence pairs
  Training data: E to J, 3,186,284 sentence pairs

Language pair: J to E
  Documents: patent full-text; Patent grant data published by USPTO; language E; years 1993-2005
  Test data: J, 2000 sentences
  Reference translation: E, 2000 sentences
  Human judgment: 100 sentences * (adequacy: 19 runs; acceptability: 14 runs) * 3 humans
  Development data: E-J, 2000 sentence pairs
  Training data: J to E

*** Task participants of NTCIR-9 may apply for continued use of the training data for C to E, upon payment of nominal administrative fees and execution of an extension agreement. For details, please contact the Administrator, Patents (Ms Janice Chong).
Non-participants may obtain access to the training data for C to E on a paid basis. They should write to the Administrator, Patents (Ms Janice Chong), outlining their purpose of use and other relevant details of their request.
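
As a rough check on the scale of the human evaluation, the human-judgment entries in the table above can be multiplied out. The sketch below is illustrative only: it assumes that the "*" in those entries denotes multiplication (sentences * runs * humans), which this page does not state explicitly, and the names `RUNS` and `judgment_counts` are hypothetical.

```python
# Figures copied from the human-judgment entries of the task-data table.
# ASSUMPTION: "*" in those entries means multiplication.
SENTENCES = 100  # sentences listed per entry
HUMANS = 3       # human judges listed per entry

RUNS = {
    "C to E": {"adequacy": 23, "acceptability": 13},
    "E to J": {"adequacy": 17, "acceptability": 11},
    "J to E": {"adequacy": 19, "acceptability": 14},
}


def judgment_counts(runs=RUNS, sentences=SENTENCES, humans=HUMANS):
    """Return {language pair: {criterion: sentences * runs * humans}}."""
    return {
        pair: {
            criterion: sentences * n_runs * humans
            for criterion, n_runs in by_criterion.items()
        }
        for pair, by_criterion in runs.items()
    }


if __name__ == "__main__":
    for pair, counts in judgment_counts().items():
        print(pair, counts)
```

Under that reading, each language pair involves several thousand individual sentence-level judgments per criterion.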

File name, year and method of provision

Publication of unexamined patent applications
  published in 1993-1997: NTCIR-4 PATENT, by sending DVD-ROMs or transferring the data files electronically
  published in 1998-2002: NTCIR-5 PATENT, by sending DVD-ROMs or transferring the data files electronically
  published in 2003-2005: NTCIR-8 PATMT, by transferring the data files electronically

Patent grant data published by USPTO
  published in 1993-2002: NTCIR-6 PATENT, by sending DVD-ROMs
  published in 2003-2005: NTCIR-8 PATMT, by transferring the data files electronically

Documents, Topics and Questions

Unexamined Japanese patent applications 1993-2005

This document set consists of unexamined Japanese patent applications published by the Japan Patent Office in 1993-2005.

USPTO patent grant data 1993-2005

This document set consists of patent grant data published by the U.S. Patent & Trademark Office (USPTO) in 1993-2005.


More details can be found on the NTCIR-9 PatentMT Website and in the NTCIR-9 PatentMT Task Definition and the NTCIR-9 PatentMT Task Overview.

The following is the procedure to obtain the test collection. The test collection and data are available from NII free of charge.

Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Notice

The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project, either free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and carefully, so as not to violate copyright.