NTCIR Project
NTCIR-8 Patent Mining Task
Research Purpose Use of Test Collection



Test Collection
The task's goal is to create a technical trend map from research papers and patents. The task consists of the following two subtasks.

(1) Research Paper Classification Subtask

This subtask aims to classify research papers written in either Japanese or English according to the International Patent Classification (IPC) system, a global standard hierarchical patent classification system. The test collection is intended to evaluate four different tasks: Japanese, English, and the cross-lingual J2E and E2J settings.

(2) Technical Trend Map Creation Subtask

This subtask aims to extract elemental technologies and their effects from research papers and patents. The test collection is intended to evaluate two different tasks: Japanese and English.

The document collection includes unexamined Japanese patent applications published in 1993-2002, Patent Abstracts of Japan published in 1993-2002, patent grant data published by the USPTO in 1993-2002, author abstracts of papers presented at academic conferences hosted by any of 65 academic societies in 1988-1997, additional author abstracts from the academic conference paper database in 1997-1999, and Grant Reports in 1988-1997. The document collection does not include diagrams.

Collection: NTCIR-8 PATMN MINING
Tasks (for both topic languages): (1) Subtask of research paper classification; (2) Subtask of technical trend map creation

Topic lang. | Genre | Filename | Lang. | Year | # of docs | Size
----------- | ----- | -------- | ----- | ---- | --------- | ----
J | patent full-text | Publication of unexamined patent applications | J | 1993-2002 | 3,496,252 | 94.5GB
J | sci. abstract | ntc1-je | JE | 1988-1997 | 339,483 | 577MB
J | sci. abstract | ntc1-j | J | 1988-1997 | 332,918 | 312MB
J | sci. abstract | ntc1-e | E | 1988-1997 | 187,080 | 218MB
J | sci. abstract | ntc2-j | J | 1986-1999 | 400,248 | 600MB
J | sci. abstract | ntc2-e | E | 1986-1999 | 134,978 | 200MB
E | patent abstract | Patent Abstracts of Japan (paj) | E | 1993-2002 | 3,496,252 | 5,482MB
E | patent full-text | Patent grant data published by USPTO | E | 1993-2002 | 1,315,470 | 52.6GB
E | sci. abstract | ntc1-je | JE | 1988-1997 | 339,483 | 577MB
E | sci. abstract | ntc1-j | J | 1988-1997 | 332,918 | 312MB
E | sci. abstract | ntc1-e | E | 1988-1997 | 187,080 | 218MB
E | sci. abstract | ntc2-j | J | 1986-1999 | 400,248 | 600MB
E | sci. abstract | ntc2-e | E | 1986-1999 | 134,978 | 200MB

* The entire collection is provided by NII for research purposes.

File name | Year | Method of provision
--------- | ---- | -------------------
Publication of patent applications | published in 1993-1997 | NTCIR-4 PATENT: by sending DVD-ROMs or transferring the data files electronically
Publication of patent applications | published in 1998-2002 | NTCIR-5 PATENT: by sending DVD-ROMs or transferring the data files electronically
Patent Abstracts of Japan (paj) | published in 1993-2002 | NTCIR-4/5 PATENT: by sending a DVD-ROM or transferring the data files electronically
Patent grant data published by USPTO | published in 1993-2002 | NTCIR-6 PATENT: by sending DVD-ROMs
ntc1-je, ntc1-j, ntc1-e | 1988-1997 | NTCIR-1 Test Collection: by sending a CD-ROM
ntc2-j, ntc2-e | 1986-1999 | NTCIR-2 Test Collection: by sending a CD-ROM

Documents, Topics and Questions

Unexamined Japanese patent applications 1993-2002

This document set consists of unexamined Japanese patent applications published in 1993-2002 from the Japanese Patent Office.

Patent Abstracts of Japan 1993-2002

The Patent Abstracts of Japan (PAJ) are translations of the JAPIO Patent Abstracts, which are edited manually on the basis of summaries in source applications.

USPTO patent grant data 1993-2002

This document set consists of patent grant data published in 1993-2002 by the U.S. Patent & Trademark Office (USPTO).

NTCIR-1 CLIR task test collection 1988-1997

This document set consists of author abstracts of papers presented at academic conferences hosted by any of 65 academic societies in 1988-1997.

NTCIR-2 CLIR task test collection 1986-1999

This document set consists of additional author abstracts from the academic conference paper database in 1997-1999, and Grant Reports in 1988-1997.

(1) Subtask of Research Paper Classification

(a) Japanese / Cross-lingual (J2E)


Each topic is a title and an abstract of a research paper written in Japanese, and the total number of topics is 644.

(b) English / Cross-lingual (E2J)


Each topic is a title and an abstract of a research paper written in English, and the total number of topics is 644.
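
Because the classification subtask labels each topic with IPC codes, and the IPC is hierarchical, a code can be read at several levels of granularity (section, class, subclass, main group, subgroup). The following is a minimal Python sketch of that decomposition. The function name `split_ipc`, the accepted input format, and the example symbol are illustrative assumptions; this page does not specify how IPC codes are formatted in the task data.

```python
import re
from typing import NamedTuple


class IPCLevels(NamedTuple):
    """The five hierarchy levels of one full IPC symbol."""
    section: str     # e.g. "G"
    ipc_class: str   # e.g. "G06"
    subclass: str    # e.g. "G06F"
    main_group: str  # e.g. "G06F 17/00"
    subgroup: str    # e.g. "G06F 17/30"


# Assumed input format: section letter A-H, two-digit class, subclass letter,
# then "main/sub" group numbers, with optional whitespace before the group.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def split_ipc(code: str) -> IPCLevels:
    """Split a full IPC symbol into its hierarchy levels (hypothetical helper)."""
    match = _IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {code!r}")
    section, cls, sub, main, group = match.groups()
    subclass = f"{section}{cls}{sub}"
    return IPCLevels(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        # By IPC convention the main group is written with "/00".
        main_group=f"{subclass} {main}/00",
        subgroup=f"{subclass} {main}/{group}",
    )


levels = split_ipc("G06F 17/30")
print(levels.subclass, "|", levels.main_group, "|", levels.subgroup)
```

Splitting codes this way would let a classifier's output be compared with the correct codes at a coarser level (for example, subclass only) as well as at the full subgroup level.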

(2) Subtask of Technical Trend Map Creation

(a) Japanese


Each topic is a title and an abstract of a research paper written in Japanese, and the total number of topics is 500.
Each topic is a title and an abstract of a patent written in Japanese, and the total number of topics is 500.

(b) English


Each topic is a title and an abstract of a research paper written in English, and the total number of topics is 500.
Each topic is a title and an abstract of a patent written in English, and the total number of topics is 500.

The following is the procedure to obtain the test collection. The test collection and data are available from NII free of charge.


NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat


The test collection was constructed and used for the NTCIR project. It is usable only for research purposes.
The document collection included in the test collection was made available to NII for use in the NTCIR project free of charge or for a fee. The providers of the document data understand the importance of such test collections in research on information access technologies and have kindly given their permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good relationship with the data producers and providers, we researchers must be reliable partners: we must use the data only for research purposes under the user agreement, and handle it carefully so as not to violate copyright.


Updated on : 2010-10-14