NTCIR-5 PATENT (IR Test Collection)

This test collection is intended to evaluate three techniques (subtasks)
related to patent information processing: document retrieval, passage
retrieval, and classification. In document retrieval, a claim in a patent
application is used as a search topic to find patents that can invalidate
that claim. In passage retrieval, the paragraphs (passages) of a document
retrieved in the document retrieval subtask are ranked by the degree to
which each passage provides grounds for judging the document relevant.
In classification, patent applications are categorized according to the
F-term classification system. The document collection includes unexamined
Japanese patent applications published in 1993-2002 and the Patent Abstracts
of Japan published in 1993-2002. The entire collection is provided by NII
for research purposes.
| Collection | Task | Genre | Filename | Lang. | Year | # of docs | Size | Topic lang. | # of topics | Relevance judge |
|---|---|---|---|---|---|---|---|---|---|---|
| NTCIR-5 PATENT | IR | patent full-text | Publication of unexamined Japanese patent applications (kkh) | J | 1993-2002 | 3,496,252 | 94.5GB | JE | Document Retrieval 1,223; Passage Retrieval 356; Classification - Theme 2,008; Classification - F term 500 | 4; 3; 1 |
| | | patent abstract | Patent Abstracts of Japan (paj) | E | 1993-2002 | 3,496,252 | 5,482MB | | | |
*The entire collection is provided by NII.

- Unexamined Japanese patent applications 1993-2002
- The document set consists of unexamined Japanese patent applications published
in 1993-2002, which is the same data set provided by the Japanese Patent
Office, but does not include diagrams.
- Patent Abstracts of Japan 1993-2002
- The Patent Abstracts of Japan are translations of the JAPIO Patent Abstracts, which are edited manually on the basis of summaries in source applications.

(1) Document Retrieval Subtask
- Search Topics
- Each search topic is a claim in a Japanese patent application. There are
1223 search topics in total, 34 of which were also used in NTCIR-4. All
search topics were manually translated into English.
- Relevance judgement
- For the 34 search topics reused from NTCIR-4, the relevance judgements are
also the same as in NTCIR-4. Professional searchers judged relevance using
the following four grades: (A) patent that can invalidate a search
topic claim, (B) patent that can invalidate a search topic claim when used
with other patents, (C) irrelevant patent that can be judged by reading
the full text, (D) irrelevant patent that can be judged by looking at the
title. For the remaining 1189 search topics, the citations provided by
examiners in the Japanese Patent Office are used as relevant documents,
each of which is assigned one of the following two grades: (A) a citation
used to reject the search topic patent and (B) a citation used with another
citation to reject the search topic patent.
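The graded judgements above can be turned into binary relevance sets before scoring a run. The following is a minimal sketch, not the official evaluation procedure: it assumes that either grade A alone or grades A and B together are treated as relevant, and it uses average precision as an example measure. The function names, document IDs, and ranking are hypothetical.

```python
from typing import Dict, List, Set


def relevant_set(judgements: Dict[str, str], grades: Set[str]) -> Set[str]:
    """Return the IDs of documents whose grade is in `grades`."""
    return {doc_id for doc_id, grade in judgements.items() if grade in grades}


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of a ranked list against a set of relevant IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical judgements and system ranking for one topic (IDs are made up).
judgements = {"doc1": "A", "doc2": "B", "doc3": "C", "doc4": "D"}
ranking = ["doc2", "doc3", "doc1", "doc4"]

# Assumption: two binarizations of the grades, strict (A) and lenient (A, B).
strict = relevant_set(judgements, {"A"})
lenient = relevant_set(judgements, {"A", "B"})

ap_strict = average_precision(ranking, strict)
ap_lenient = average_precision(ranking, lenient)
```

Grades C and D both mean "irrelevant" in the scheme above, so they never enter either set.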
(2) Passage Retrieval Subtask
- Search Topics
- Search topics are the relevant documents for 41 search topics that were used in the NTCIR-4
Patent Retrieval Task. The total number of search topics is 356.
- Relevance judgement
- Relevant passages were determined based on the following criteria: (A)
if a single passage can be grounds to judge the target document as relevant
or partially relevant, this passage was judged as relevant, (B) if a group
of passages can be grounds to judge the target document as relevant or
partially relevant, this passage group was judged as relevant.
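Criteria (A) and (B) mean that a relevant unit is either a single passage or a group of passages that are only sufficient together. The sketch below illustrates one way to handle both cases uniformly: each unit is a set of passage IDs, and the function finds the first rank at which a ranked passage list has covered some complete unit. This is an illustration under stated assumptions, not the measure used in the task; the function name and passage IDs are hypothetical.

```python
from typing import FrozenSet, List, Optional


def first_sufficient_rank(
    ranking: List[str], relevant_units: List[FrozenSet[str]]
) -> Optional[int]:
    """Smallest rank at which all passages of some relevant unit have been seen.

    Returns None if the ranking never covers a complete unit.
    """
    seen = set()
    for rank, passage_id in enumerate(ranking, start=1):
        seen.add(passage_id)
        # A unit counts only once every one of its passages has been retrieved.
        if any(unit <= seen for unit in relevant_units):
            return rank
    return None


# Hypothetical judgements: p3 is sufficient alone (criterion A),
# while p1 and p5 are sufficient only as a group (criterion B).
relevant_units = [frozenset({"p3"}), frozenset({"p1", "p5"})]
ranking = ["p1", "p2", "p5", "p3"]

rank = first_sufficient_rank(ranking, relevant_units)
```

Here the group {p1, p5} is completed before the single passage p3 is reached, so the group determines the result.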
(3) Classification Subtask
- Search Topics
- Search topics are patent applications extracted from unexamined Japanese
patent applications published in 1998-1999. The numbers of topics for theme
classification and F-term classification are 2008 and 500, respectively.
The applications published in 1993-1997 can be used to train systems.
- Relevance judgement
- The correct categories for each topic are those provided by the Japanese Patent Office.
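The year ranges above suggest a simple split of the collection for classification experiments. The following is a minimal sketch that assumes each document is available as a dictionary with a hypothetical `year` field holding its publication year; the actual file format of the collection is not described here. Note that the topics are a sample extracted from the 1998-1999 applications, so the second list is only the pool they were drawn from.

```python
from typing import Dict, List, Tuple

TRAIN_YEARS = range(1993, 1998)  # 1993-1997 inclusive
TOPIC_YEARS = range(1998, 2000)  # 1998-1999 inclusive


def split_by_year(docs: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split documents into training data and the pool the topics come from."""
    train = [d for d in docs if d["year"] in TRAIN_YEARS]
    topic_pool = [d for d in docs if d["year"] in TOPIC_YEARS]
    return train, topic_pool


# Hypothetical records; the IDs and field names are made up for illustration.
docs = [
    {"id": "a", "year": 1993},
    {"id": "b", "year": 1997},
    {"id": "c", "year": 1998},
    {"id": "d", "year": 1999},
    {"id": "e", "year": 2001},
]

train, topic_pool = split_by_year(docs)
```

Documents published in 2000-2002 fall into neither list in this sketch.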

The procedure for obtaining the test collection is as follows. The test collection and data available from NII are free of charge.
- The application form for the test collection must be filled out and sent by e-mail to ntc-secretariat.
- The user agreement (Memorandum on Permission to Use Test Collection) is required.
- The user agreement form must be filled out and sent by postal mail or courier to the address below.
- Please download the form and print two double-sided copies.
- Signatures are needed on both agreement forms.
- After NII countersigns the forms, one copy will be sent to
you and one copy will be kept by NII.
Address
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Email: ntc-secretariat
Mailing List
Releases of new test collections and correction information will
be announced through the ntcir mailing list.
Notice
The test collection has been constructed and used for NTCIR. It may be
used for research purposes only.
The document collections included in the test collection were provided
to NII for use in NTCIR, either free of charge or for a fee. The providers
of the document data kindly understand the importance of the test collection
to research on information access technologies and have therefore granted
use of the data for research purposes. Please remember that the document
data in the NTCIR test collection is copyrighted and has commercial value
as data. To maintain a good and trusting relationship with the data
producers/providers, it is important that we researchers behave as
reliable partners, use the data only for research purposes under the
user agreement, and take care not to violate any rights in the data.