NTCIR Project
NTCIR-6 PATENT
Research Purpose Use of Test Collection



NTCIR-6 PATENT (IR Test Collection)

Test Collection
This test collection is intended to evaluate three different techniques (subtasks) related to patent information processing: Japanese retrieval, English retrieval, and classification. In the Japanese and English retrieval subtasks, a patent claim is used as a search topic, and the task is to find the patents that can invalidate that claim. The two retrieval subtasks differ in language: the Japanese retrieval subtask uses Japanese search topics and documents, and the English retrieval subtask uses English ones. In the classification subtask, the goal is to categorize patent applications according to the F-term (File Forming Term) classification system. The document collection includes unexamined Japanese patent applications published in 1993-2002, Patent Abstracts of Japan published in 1993-2002, and patent grant data published by the USPTO in 1993-2002. The document collection does not include diagrams.

Collection: NTCIR-6 PATENT (Task: IR)

Documents (J = Japanese, E = English)
Genre             | Filename                                       | Lang. | Year      | # of docs | Size
patent full-text  | Publication of unexamined patent applications  | J     | 1993-2002 | 3,496,252 | 94.5 GB
patent abstract   | Patent Abstracts of Japan (PAJ)                | E     | 1993-2002 | 3,496,252 | 5,482 MB
patent full-text  | Patent grant data published from USPTO         | E     | 1993-2002 | 1,315,470 | 52.6 GB

Task data (topics and relevance judgments)
Subtask            | Topic lang. | # of topics | # of relevance levels
Japanese Retrieval | J           | 2,908       | 4
Classification     | J, E        | 21,606      | 1
English Retrieval  | E           | 3,221       | 3

* The entire collection is provided by NII for research purposes.

Each document set is distributed as follows:
Publication of unexamined patent applications: by sending DVD-ROMs (NTCIR-4 PATENT and NTCIR-5 PATENT), or by transferring the data files electronically.
Patent Abstracts of Japan (PAJ): by sending a DVD-ROM (NTCIR-4/5 PATENT), or by transferring the data files electronically.
Patent grant data published from USPTO: by sending DVD-ROMs (NTCIR-6 PATENT), or by transferring the data files electronically.

Documents, Topics and Questions

Unexamined Japanese patent applications 1993-2002

This document set consists of unexamined Japanese patent applications published by the Japanese Patent Office (JPO) in 1993-2002.


Patent Abstracts of Japan 1993-2002

The Patent Abstracts of Japan (PAJ) are English translations of the JAPIO Patent Abstracts, which are edited manually on the basis of the summaries in the source applications.


USPTO patent grant data 1993-2002

This document set consists of patent grant data published by the U.S. Patent and Trademark Office (USPTO) in 1993-2002.


(1) Japanese Retrieval Subtask

Search Topics

Each search topic is a claim taken from a Japanese patent application; there are 2,908 search topics in total. Of these, 34 were also used in NTCIR-4 and 1,189 were also used in NTCIR-5.

Relevance judgment

The relevant patents for a search topic are the one or more citations used to reject the topic application. The relevance levels are as follows:
(H) citations that do not share any IPC subclass with the topic patent
(A) citations that partially share IPC subclasses with the topic patent
(B) citations whose IPC subclasses are identical to those of the topic patent
(C) non-citation patents
A relevant patent must have been published on a day before the topic patent was filed.
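The rules above can be read as a small decision procedure. The following is a minimal Python sketch, not the organizers' judging tool: it assumes each patent's IPC subclasses are available as a set of subclass codes (such as "G06F"), that publication and filing dates are known, and that the date constraint applies to every level. The function name and parameters are hypothetical.

```python
from __future__ import annotations

from datetime import date


def japanese_relevance_level(
    topic_ipc: set[str],
    topic_filed: date,
    cand_ipc: set[str],
    cand_published: date,
    is_citation: bool,
) -> str | None:
    """Hypothetical helper: assign H/A/B/C to one candidate for one topic.

    Returns None when the candidate was not published on a day before the
    topic patent's filing date, so it cannot be a relevant patent.
    """
    # Date constraint: relevant patents must predate the topic's filing.
    if cand_published >= topic_filed:
        return None
    # (C) patents that are not citations of the topic application.
    if not is_citation:
        return "C"
    shared = topic_ipc & cand_ipc
    if not shared:
        return "H"  # no IPC subclass in common
    if cand_ipc == topic_ipc:
        return "B"  # identical IPC subclasses
    return "A"  # partial overlap of IPC subclasses
```

For example, a citation classified only under G06F, published in June 1999, against a topic classified under G06F and H04L and filed in April 2001, falls into level A because the subclasses overlap without being identical.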


(2) English Retrieval Subtask

Search Topics

Each search topic is a claim taken from a USPTO patent grant; there are 3,221 search topics in total.

Relevance judgment

The relevant patents for a search topic are the one or more patents cited in the topic patent. The relevance levels are as follows:
(A) citations whose IPC subclass differs from that of the topic patent
(B) citations whose IPC subclasses are identical to those of the topic patent
(C) patents that are not cited in the topic patent
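The English levels can be sketched the same way. This minimal Python example uses hypothetical names and assumes IPC subclasses are given as sets of codes. It also assumes that any citation whose subclasses are not identical to the topic's (including a partial overlap) counts as "different" and therefore falls into level A; the description above does not say how partial overlaps are handled.

```python
from __future__ import annotations


def english_relevance_level(topic_ipc: set[str], cand_ipc: set[str], is_cited: bool) -> str:
    """Hypothetical helper: assign A/B/C to one candidate for one topic."""
    if not is_cited:
        return "C"  # not cited in the topic patent
    # Assumption: anything short of identical subclasses counts as "different".
    return "B" if cand_ipc == topic_ipc else "A"


def judge_candidates(
    topic_ipc: set[str], candidates: dict[str, tuple[set[str], bool]]
) -> dict[str, str]:
    """Map each candidate doc_id -> level; values are (ipc_subclasses, is_cited)."""
    return {
        doc_id: english_relevance_level(topic_ipc, ipc, cited)
        for doc_id, (ipc, cited) in candidates.items()
    }
```

Unlike the Japanese sketch, this one applies no publication-date check, because none is stated for the English subtask.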


(3) Classification Subtask

Search Topics

Search topics are patent applications extracted from the unexamined Japanese patent applications published in 1998-1999, or the PAJs that correspond to these applications. There are 21,606 topics. Applications published in 1993-1997 may be used to train systems.
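The year boundaries above amount to a simple filter. The Python sketch below assumes a hypothetical Application record holding a publication year and F-term labels. The 21,606 official topics were extracted from the 1998-1999 applications, so this filter yields only the pool they come from, not the official topic list itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Hypothetical record; field names are illustrative assumptions."""
    doc_id: str
    pub_year: int
    fterms: tuple[str, ...]  # F-term categories assigned by the JPO


TRAIN_YEARS = range(1993, 1998)  # 1993-1997 inclusive: usable for training
TOPIC_YEARS = range(1998, 2000)  # 1998-1999 inclusive: pool of topic applications


def split_for_classification(
    apps: list[Application],
) -> tuple[list[Application], list[Application]]:
    """Return (training applications, topic-year applications)."""
    train = [a for a in apps if a.pub_year in TRAIN_YEARS]
    topic_pool = [a for a in apps if a.pub_year in TOPIC_YEARS]
    return train, topic_pool
```

Applications from 2000-2002 fall into neither list, since they are outside both year ranges.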

Relevance judgment

The correct categories for each topic are those provided by the Japanese Patent Office.


The procedures for obtaining the test collection are as follows. The test collection and the data available from NII are free of charge.


Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Notice

The test collection was constructed for and used in NTCIR. It may be used for research purposes only.
The document collections included in the test collection were provided to NII for use in NTCIR, either free of charge or for a fee. The data providers recognize the importance of test collections for research on information access technologies and have therefore granted the use of the data for research purposes. Please remember that the document data in the NTCIR test collection are copyrighted and have commercial value as data. To maintain a reliable and good relationship with the data producers and providers, we researchers must act as reliable partners: use the data only for research purposes under the user agreement, and handle it carefully so as not to violate any of their rights.