NTCIR Project
NTCIR-4 PATENT
Research Purpose Use of Test Collection



NTCIR-4 PATENT (IR Test Collection)

Test Collection
The collection consists of document data (Japanese patent applications 1993-1997 and Patent Abstracts of Japan 1993-1997), 101 Japanese search topics (34 of which were translated into English, Simplified Chinese, Traditional Chinese, and Korean), and relevance judgments for each search topic. The Japanese patent applications published in 1993-1997 are used for the Retrieval task. For the Patent Abstracts of Japan (paj), a DVD titled "NTCIR-4/5 PATENT Supplement: Patent Abstract Japan," containing abstracts published in 1993-2002, will be provided. Each search topic is a claim extracted from a Japanese patent application. The relevant documents for a search topic are the patents that can invalidate the topic claim. The entire collection is provided by NII for research purposes.

                      Documents                                                         Task data
Collection      Task  Genre             Filename  Lang.  Year       # of docs  Size     Topic lang.  # of topics  Relevance judgment
NTCIR-4 PATENT  IR    patent full-text  kkh       J      1993-1997  3,496,252  94.5GB   CtCsKJE      101          4 grades
                      patent abstract   paj       E      1993-1997  3,496,252  5,482MB

(Topic languages: Ct = Traditional Chinese, Cs = Simplified Chinese, K = Korean, J = Japanese, E = English.)

*The entire collection is provided by NII.

Documents, Topics and Questions

  

Japanese patent applications 1993-1997
The document set consists of unexamined Japanese patent applications published in 1993-1997. It is the same data as that provided by the Japanese Patent Office, but without diagrams.
Patent Abstracts of Japan 1993-1997
The Patent Abstracts of Japan are English translations of the JAPIO Patent Abstracts, which are manually edited from the summaries in the source applications.

Search Topics
Each search topic is a claim taken from a Japanese patent application. There are 101 search topics in total, 34 of which (the main topics) were translated into English, Simplified Chinese, Traditional Chinese, and Korean.
Relevance judgment
For all search topics, a patent can be relevant only if it was published before the filing date of the topic patent. For the 34 main topics, the relevant documents are the citations provided by examiners of the Japanese Patent Office together with the patents identified by professional searchers. Relevance was judged on the following four grades:
(A) a patent that can invalidate the topic claim;
(B) a patent that can invalidate the topic claim when combined with other patents;
(C) a patent judged irrelevant after reading its full text;
(D) a patent judged irrelevant from its title alone.
The examiner citations were judged by the professional searchers as A or B. For the remaining 67 search topics, only the examiner citations judged A or B by the professional searchers are used as relevant documents.
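The grade and publication-date rules above can be made concrete with a small Python sketch. It assumes a hypothetical in-memory record (`Judgment`, holding a topic ID, document ID, grade, and publication date) and a hypothetical helper `relevant_docs`; the actual file format of the NTCIR-4 relevance judgments is not described on this page, so loading the real data is left out. Following the description above, grades A and B count as relevant by default, and only patents published strictly before the topic patent's filing date are kept. The topic and document IDs in the usage part are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Grades as described above (A/B relevant, C/D irrelevant).
RELEVANT_GRADES = frozenset({"A", "B"})    # A: invalidates alone; B: invalidates with other patents
IRRELEVANT_GRADES = frozenset({"C", "D"})  # C: irrelevant after full-text reading; D: from title alone


@dataclass(frozen=True)
class Judgment:
    """Hypothetical judgment record; not the official NTCIR file format."""
    topic_id: str
    doc_id: str
    grade: str
    published: date


def relevant_docs(judgments, topic_id, topic_filing_date, grades=RELEVANT_GRADES):
    """Return the IDs of documents counted as relevant for one topic.

    A document qualifies only if its grade is in `grades` and it was
    published strictly before the filing date of the topic patent.
    """
    return {
        j.doc_id
        for j in judgments
        if j.topic_id == topic_id
        and j.grade in grades
        and j.published < topic_filing_date
    }


# Usage with made-up records (hypothetical IDs and dates).
judgments = [
    Judgment("T001", "doc-1", "A", date(1995, 3, 1)),
    Judgment("T001", "doc-2", "B", date(1996, 7, 15)),
    Judgment("T001", "doc-3", "C", date(1994, 1, 10)),
    Judgment("T001", "doc-4", "A", date(1997, 9, 1)),  # published after filing: excluded
]
filing = date(1997, 6, 30)

print(sorted(relevant_docs(judgments, "T001", filing)))               # ['doc-1', 'doc-2']
print(sorted(relevant_docs(judgments, "T001", filing, grades={"A"})))  # ['doc-1']
```

Passing `grades={"A"}` yields a stricter relevant set that counts only patents able to invalidate the claim on their own.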

The following are the procedures for obtaining the test collection. The test collection and the data available from NII are free of charge.

Address

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat

Mailing List

Releases of new test collections and correction information will be announced on the NTCIR mailing list.

Notice

The test collection was constructed for and used in NTCIR. It may be used for research purposes only.
The document collections included in the test collection were provided to NII for use in NTCIR, either free of charge or for a fee. The providers of the document data recognized the importance of test collections for research on information access technologies and kindly granted permission to use the data for research purposes. Please remember that the document data in the NTCIR test collection are copyrighted and have commercial value as data. To maintain a good and trusting relationship with the data producers and providers, we researchers must act as reliable partners: use the data only for research purposes under the user agreement, and handle them carefully so as not to infringe any of their rights.