NTCIR Project
NTCIR-5 PATENT
Research Purpose Use of Test Collection



NTCIR-5 PATENT (IR Test Collection)

Test Collection
This test collection is intended to evaluate three techniques (subtasks) related to patent information processing: document retrieval, passage retrieval, and classification. In the document retrieval subtask, a claim in a patent application is used as a search topic, and the goal is to find patents that can invalidate the claim in the topic patent. In the passage retrieval subtask, the paragraphs (passages) in a document retrieved for the document retrieval subtask are ranked according to the degree to which each passage provides grounds for judging whether the document is relevant.
In the classification subtask, patent applications are categorized according to the F-term classification system. The document collection consists of unexamined Japanese patent applications published in 1993-2002 and Patent Abstracts of Japan published in 1993-2002. The entire collection is provided by NII for research purposes.

Collection: NTCIR-5 PATENT    Task: IR

Documents
Genre             Filename  Content                                                 Lang.  Year       # of docs  Size
patent full-text  kkh       Publication of unexamined Japanese patent applications  J      1993-2002  3,496,252  94.5GB
patent abstract   paj       Patent Abstracts of Japan                               E      1993-2002  3,496,252  5,482MB

Task data (topic language: JE)
Subtask                  # of topics  Relevance judge
Document Retrieval       1,223        4
Passage Retrieval        356          3
Classification - Theme   2,008        1
Classification - F term  500          1

The entire collection is provided by NII.


Documents, Topics and Questions

Unexamined Japanese patent applications 1993-2002
The document set consists of unexamined Japanese patent applications published in 1993-2002. It is the same data set as that provided by the Japanese Patent Office, except that it does not include diagrams.
Patent Abstracts of Japan 1993-2002
The Patent Abstracts of Japan are English translations of the JAPIO Patent Abstracts, which are manually edited based on the summaries in the source applications.



(1) Document Retrieval Subtask

Search Topics
Each search topic is a claim in a Japanese patent application. There are 1,223 search topics in total, 34 of which are the same topics used in NTCIR-4. All search topics were manually translated into English.
Relevance judgement
For the 34 search topics carried over from NTCIR-4, the relevance judgements are also the same as in NTCIR-4. Professional searchers performed these judgements using the following four grades: (A) a patent that can invalidate the topic claim; (B) a patent that can invalidate the topic claim when used in combination with other patents; (C) an irrelevant patent, judged as such by reading the full text; (D) an irrelevant patent, judged as such by looking at the title. For the remaining 1,189 search topics, the citations provided by examiners at the Japanese Patent Office are used as relevant documents, and each is assigned one of two grades: (A) a citation used on its own to reject the topic patent application, and (B) a citation used to reject the topic patent application in combination with another citation.
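For evaluation, graded judgements like these have to be mapped to binary relevance. A common convention, assumed here rather than prescribed by this page, is a strict level that counts only grade A and a lenient level that counts grades A and B, with C and D treated as non-relevant. The following Python sketch computes non-interpolated average precision for a single topic under both levels; the judgement dictionary, document IDs, and function names are hypothetical.

```python
from typing import Dict, List, Set

# Assumed relevance levels: which grades count as "relevant" is an
# evaluation choice, not something fixed by the test collection.
STRICT = {"A"}
LENIENT = {"A", "B"}


def relevant_set(judgments: Dict[str, str], grades: Set[str]) -> Set[str]:
    """Return the IDs of documents whose grade is in `grades`."""
    return {doc_id for doc_id, grade in judgments.items() if grade in grades}


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


if __name__ == "__main__":
    # Hypothetical judgements and system output for a single topic.
    judgments = {"doc1": "A", "doc2": "B", "doc3": "C", "doc4": "D"}
    ranking = ["doc2", "doc3", "doc1", "doc5"]
    for name, grades in (("strict", STRICT), ("lenient", LENIENT)):
        ap = average_precision(ranking, relevant_set(judgments, grades))
        print(f"{name}: AP = {ap:.4f}")
```

For this hypothetical topic the script prints `strict: AP = 0.3333` and `lenient: AP = 0.8333`. The same functions apply to the examiner-citation topics, which use only grades A and B.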


(2) Passage Retrieval Subtask

Search Topics
The search topics are the documents judged relevant for 41 search topics used in the NTCIR-4 Patent Retrieval Task, giving 356 search topics in total.
Relevance judgement
Relevant passages were determined by the following criteria: (A) if a single passage provides sufficient grounds for judging the target document relevant or partially relevant, that passage was judged relevant; (B) if a group of passages together provides such grounds, that passage group was judged relevant.
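One way to represent these judgements is to treat each relevant single passage (criterion A) or relevant passage group (criterion B) as an evidence unit, that is, a set of passage IDs. The Python sketch below is a hypothetical illustration, not the official NTCIR-5 measure: it finds the first rank in a system's passage ranking at which every passage of at least one evidence unit has been retrieved. Passage IDs and function names are made up.

```python
from typing import FrozenSet, List, Optional, Set

# A hypothetical evidence unit: one relevant passage, or one relevant
# group of passages, stored as a set of passage IDs.
EvidenceUnit = FrozenSet[str]


def first_complete_rank(ranking: List[str],
                        units: List[EvidenceUnit]) -> Optional[int]:
    """Return the 1-based rank at which all passages of some evidence
    unit have been retrieved, or None if no unit is ever completed."""
    seen: Set[str] = set()
    for rank, passage_id in enumerate(ranking, start=1):
        seen.add(passage_id)
        # A unit is complete once it is a subset of the passages seen so far.
        if any(unit <= seen for unit in units):
            return rank
    return None


if __name__ == "__main__":
    # Hypothetical judgements for one topic document: passage p7 alone is
    # sufficient grounds, or passages p2 and p4 together.
    units = [frozenset({"p7"}), frozenset({"p2", "p4"})]
    ranking = ["p2", "p5", "p4", "p7"]
    print(first_complete_rank(ranking, units))  # prints 3
```

Here the group {p2, p4} is completed at rank 3, before the single passage p7 appears at rank 4, so a system gets credit as soon as any complete set of grounds has been shown.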


(3) Classification Subtask

Search Topics
Search topics are patent applications extracted from the unexamined Japanese patent applications published in 1998-1999. There are 2,008 topics for theme classification and 500 topics for F-term classification. Applications published in 1993-1997 can be used to train systems.
Relevance judgement
The correct categories for each topic are those provided by the Japanese Patent Office.
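The classification setup can be illustrated with a short Python sketch. It assumes, hypothetically, that publication years and category sets are already available as plain dictionaries and sets. It splits documents into the 1993-1997 training period and the 1998-1999 topic period, and scores one topic's predicted categories against the categories provided by the Japanese Patent Office using set-based precision, recall, and F1. Set-based scoring is chosen here for simplicity and is not necessarily the official NTCIR-5 measure; the category labels are placeholders, not real theme codes or F-terms.

```python
from typing import Dict, List, Set, Tuple


def split_by_year(pub_years: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split document IDs by publication year into training documents
    (1993-1997) and topic-period documents (1998-1999)."""
    train = sorted(d for d, y in pub_years.items() if 1993 <= y <= 1997)
    topic_period = sorted(d for d, y in pub_years.items() if 1998 <= y <= 1999)
    return train, topic_period


def precision_recall_f1(predicted: Set[str],
                        correct: Set[str]) -> Tuple[float, float, float]:
    """Set-based scores of predicted categories against the correct ones."""
    tp = len(predicted & correct)  # categories both predicted and correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical publication years for four documents.
    years = {"d1": 1995, "d2": 1998, "d3": 2001, "d4": 1993}
    print(split_by_year(years))  # (['d1', 'd4'], ['d2'])
    # Placeholder category labels for one topic.
    predicted = {"cat-1", "cat-2", "cat-3"}
    correct = {"cat-1", "cat-3", "cat-4"}
    print(precision_recall_f1(predicted, correct))
```

Documents published after 1999 (d3 above) fall into neither period, which mirrors the year ranges stated in this section.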


The following are the procedures for obtaining the test collection. The test collection and data available from NII are free of charge.


Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Mailing List

The release of new test collections and any corrections will be announced on the NTCIR mailing list.

Notice

This test collection was constructed for and used in NTCIR. It may be used for research purposes only.
The document collections included in the test collection were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data understand the importance of test collections for research on information access technologies and have kindly granted the use of the data for research purposes. Please remember that the document data in the NTCIR test collections are copyrighted and have commercial value as data. To maintain a reliable and good relationship with the data producers and providers, we researchers must act as reliable partners: use the data only for research purposes under the user agreement, and handle it carefully so as not to violate any of their rights.