NTCIR Project
NTCIR-7 Patent Mining
Research Purpose Use of Test Collection



Test Collection
The goal of the task is to classify research papers written in either Japanese or English according to the International Patent Classification (IPC) system, a global standard hierarchical patent classification system. This test collection is intended for evaluating four subtasks: the Japanese subtask, the English subtask, and the two cross-lingual subtasks (J2E and E2J).
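
Because the IPC is hierarchical, a full IPC symbol can be split into its standard levels (section, class, subclass, main group, subgroup), which is useful when comparing classifications at different depths. The following is a minimal sketch only: the function names and the string form "H04L 12/56" are assumptions for illustration, and the actual format of the IPC codes in the task data may differ.

```python
import re
from typing import List, NamedTuple


class IPCCode(NamedTuple):
    """The five standard levels of one full IPC symbol."""
    section: str     # e.g. "H"
    ipc_class: str   # e.g. "H04"
    subclass: str    # e.g. "H04L"
    main_group: str  # e.g. "12"
    subgroup: str    # e.g. "56"


# Assumed textual form: section letter A-H, two-digit class, subclass letter,
# optional whitespace, main group, "/", subgroup (for example "H04L 12/56").
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(code: str) -> IPCCode:
    """Split a full IPC symbol into its hierarchical levels."""
    match = _IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {code!r}")
    section, cls, sub, main_group, subgroup = match.groups()
    return IPCCode(section, section + cls, section + cls + sub,
                   main_group, subgroup)


def hierarchy_levels(code: str) -> List[str]:
    """List the ancestors of an IPC symbol from broadest to most specific."""
    ipc = parse_ipc(code)
    return [
        ipc.section,
        ipc.ipc_class,
        ipc.subclass,
        f"{ipc.subclass} {ipc.main_group}/00",  # main groups are written with /00
        f"{ipc.subclass} {ipc.main_group}/{ipc.subgroup}",
    ]


if __name__ == "__main__":
    print(hierarchy_levels("H04L 12/56"))
```

With such a breakdown, two codes can be compared at any depth, for example by checking whether they share the same subclass.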

The document collection includes unexamined Japanese patent applications published in 1993-2002, Patent Abstracts of Japan published in 1993-2002, patent grant data published by the USPTO in 1993-2002, author abstracts of papers presented at academic conferences hosted by any of 65 academic societies in 1988-1997, additional author abstracts from the academic conference paper database in 1997-1999, and Grant Reports in 1988-1997. The document collection does not include diagrams.
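Collection: NTCIR-7 PATMN
Task: MINING

Task data (Topic / Relevance judge): Lang. J; Japanese / Cross-lingual (E2J); 976; 2
Documents:
Genre | Filename | Lang. | Year | # of docs | Size
patent full-text | Publication of unexamined patent applications | J | 1993-2002 | 3,496,252 | 94.5GB
sci. abstract | ntc1-je | JE | 1988-1997 | 339,483 | 577MB
sci. abstract | ntc1-j | J | 1988-1997 | 332,918 | 312MB
sci. abstract | ntc1-e | E | 1988-1997 | 187,080 | 218MB
sci. abstract | ntc2-j | J | 1986-1999 | 400,248 | 600MB
sci. abstract | ntc2-e | E | 1986-1999 | 134,978 | 200MB

Task data (Topic / Relevance judge): Lang. E; English / Cross-lingual (J2E); 976; 2
Documents:
Genre | Filename | Lang. | Year | # of docs | Size
patent abstract | Patent Abstracts of Japan (paj) | E | 1993-2002 | 3,496,252 | 5,482MB
patent full-text | Patent grant data published from USPTO | E | 1993-2002 | 1,315,470 | 52.6GB
sci. abstract | ntc1-je | JE | 1988-1997 | 339,483 | 577MB
sci. abstract | ntc1-j | J | 1988-1997 | 332,918 | 312MB
sci. abstract | ntc1-e | E | 1988-1997 | 187,080 | 218MB
sci. abstract | ntc2-j | J | 1986-1999 | 400,248 | 600MB
sci. abstract | ntc2-e | E | 1986-1999 | 134,978 | 200MB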

* The entire collection is provided by NII for research purposes.

Publication of unexamined patent applications: By sending DVD-ROMs (NTCIR-4 PATENT and NTCIR-5 PATENT), or by transferring the data files electronically.
NTCIR-4 PATENT: unexamined Japanese patent applications published in 1993-1997
NTCIR-5 PATENT: unexamined Japanese patent applications published in 1998-2002

Patent Abstracts of Japan (paj): By sending DVD-ROM (NTCIR-4/5 PATENT), or by transferring the data files electronically.
NTCIR-4/5 PATENT: Patent Abstracts of Japan published in 1993-2002

Patent grant data published from USPTO: By sending DVD-ROMs (NTCIR-6 PATENT), or by transferring the data files electronically.
NTCIR-6 PATENT: patent grant data published from USPTO in 1993-2002

ntc1-je, ntc1-j, ntc1-e: By sending CD-ROM (NTCIR-1 Test Collection)

ntc2-j, ntc2-e: By sending CD-ROM (NTCIR-2 Test Collection)

Documents, Topics and Questions

Unexamined Japanese patent applications 1993-2002

This document set consists of unexamined Japanese patent applications published by the Japanese Patent Office in 1993-2002.


Patent Abstracts of Japan 1993-2002

The Patent Abstracts of Japan (PAJ) are translations of the JAPIO Patent Abstracts, which are edited manually on the basis of summaries in source applications.


USPTO patent grant data 1993-2002

This document set consists of patent grant data published by the U.S. Patent & Trademark Office (USPTO) in 1993-2002.


NTCIR-1 CLIR task test collection 1988-1997

This document set consists of author abstracts of papers presented at academic conferences hosted by any of 65 academic societies in 1988-1997.


NTCIR-2 CLIR task test collection 1986-1999

This document set consists of additional author abstracts from the academic conference paper database in 1997-1999, and Grant Reports in 1988-1997.


(1) Japanese Subtask / Cross-lingual Subtask (J2E)

Search Topics

Each search topic consists of the title and abstract of a research paper written in Japanese, and the total number of search topics is 978.

Relevance judgment

The 978 topics are divided into two groups: group A, in which highly relevant IPC codes are assigned to 525 topics, and group B, in which relevant IPC codes are assigned to 451 topics.

(2) English Subtask / Cross-lingual Subtask (E2J)

Search Topics

Each search topic consists of the title and abstract of a research paper written in English, and the total number of search topics is 978.

Relevance judgment

The 978 topics are divided into two groups: group A, in which highly relevant IPC codes are assigned to 525 topics, and group B, in which relevant IPC codes are assigned to 451 topics.

The following is the procedure for obtaining the test collection. The test collection and data available from NII are free of charge.

Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Notice

The test collection was constructed and used for NTCIR. It may be used for research purposes only.
The document collections included in the test collection were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data kindly understand the importance of the test collection for research on information access technologies and have granted use of the data for research purposes. Please remember that the document data in the NTCIR test collection is copyrighted and has commercial value as data. To maintain a good and reliable relationship with the data producers and providers, we researchers must behave as reliable partners: use the data only for research purposes under the user agreement, and handle it carefully so as not to violate any rights in it.