The test collection consists of document data, topics, and
annotations. The document data comprise Mainichi Newspaper 1998-2001
(Japanese), Yomiuri Newspaper 1998-2001 (Japanese), CIRB020 1998-1999
+ CIRB040 2000-2001 (Traditional Chinese, various newspapers from
Taiwan), Mainichi Daily 1998-2001 (English, published in Japan), Daily
Yomiuri 2000-2001 (English, published in Japan), Korea Times 2000-2001
(English, published in Korea), and Hong Kong Standard 1998-1999
(English, published in Hong Kong). There are 32 topics covering
1998-2001, each provided in English, Chinese, and Japanese. The
annotations assign opinion tags to the sentences of selected documents
that are relevant to the topics. The annotated documents are
distributed separately in a sentence-segmented format whose sentence
numbering matches that of the CSV annotation files.
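The opinion tags listed in the table on this page (Opinionated and Topic Relevance as Y/N for every sentence; Holder and Polarity only for opinionated sentences) map naturally onto one record per sentence. The sketch below reads such rows in Python. It assumes a hypothetical six-column layout (document ID, sentence number, opinionated, holder, relevance, polarity) with no header row; the actual column order and file conventions are defined in the README files, so treat the names in `COLUMNS` as placeholders.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# HYPOTHETICAL column layout: the real order and any header row are
# defined in the README files distributed with the Topic Data.
COLUMNS = ["doc_id", "sentence_no", "opinionated", "holder", "relevant", "polarity"]
POLARITIES = {"POS", "NEG", "NEU"}


@dataclass
class SentenceAnnotation:
    doc_id: str
    sentence_no: int          # aligns with the numbering in the sentence-segmented files
    opinionated: bool         # Y/N, annotated for every sentence
    holder: Optional[str]     # opinion holder string, opinionated sentences only
    relevant: bool            # Y/N topic relevance, annotated for every sentence
    polarity: Optional[str]   # POS/NEG/NEU, opinionated sentences only


def parse_annotations(text: str) -> List[SentenceAnnotation]:
    """Parse CSV rows (no header row assumed) into SentenceAnnotation records."""
    # restval="" fills any missing trailing columns with empty strings.
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS, restval="")
    records = []
    for row in reader:
        opinionated = row["opinionated"].strip().upper() == "Y"
        polarity = row["polarity"].strip().upper() or None
        if opinionated and polarity is not None and polarity not in POLARITIES:
            raise ValueError(f"unexpected polarity {polarity!r} in row {row}")
        records.append(
            SentenceAnnotation(
                doc_id=row["doc_id"].strip(),
                sentence_no=int(row["sentence_no"]),
                opinionated=opinionated,
                # Holder and polarity only apply to opinionated sentences.
                holder=(row["holder"].strip() or None) if opinionated else None,
                relevant=row["relevant"].strip().upper() == "Y",
                polarity=polarity if opinionated else None,
            )
        )
    return records


if __name__ == "__main__":
    # Two made-up rows in the hypothetical layout above.
    sample = "DOC1,1,Y,the minister,Y,NEG\nDOC1,2,N,,Y,\n"
    for record in parse_annotations(sample):
        print(record)
```

Because the sentence numbers follow the same numbering as the sentence-segmented files, a `(doc_id, sentence_no)` pair can serve as the key for looking up the sentence text once those files have been loaded.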
NII distributes the topics, sentence-segmented files, and opinion annotation tag files as the "Topic Data". Most of the document data are available from the NTCIR-6 CLIR page. The Mainichi Newspaper Japanese document data are available directly from Nichigai Associates and Mainichi Shimbun under a research license. The English document data for Mainichi Daily (1998-1999) are also available under a research license. The Yomiuri Newspaper and Daily Yomiuri data are available from the Nihon Database Kaihatsu Corporation under a research license. The Xinhua data are available from the LDC under a research license.
The Chinese data (CIRB020, CIRB040), Korea Times (2000-2001), and Hong Kong Standard (1998-1999) are available from NTCIR under a research license. The relevance judgment files for the search experiments differ between languages in some respects; please see the README files for details.

Collection: NTCIR-6 Opinion | Task: Opinionated Search | Genre: Newspaper Articles

Document Data
File Name | Language | Years | # of Docs | Size
CIRB020 | Ct | 1998-1999 | 249,508 | 788MB (CIRB020 + CIRB040)
CIRB040 | Ct | 2000-2001 | 901,446 |
mainichi | J | 1998-2001 | 419,759 | 776MB (mainichi + yomiuri)
yomiuri | J | 1998-2001 | 1,034,699 |
mainichi daily | E | 1998-2001 | 24,878 | 471.5MB (mainichi daily, daily yomiuri, Korea Times, Hong Kong Standard)
daily yomiuri | E | 2000-2001 | 17,741 |
Korea Times | E | 2000-2001 | 30,530 |
Hong Kong Standard | E | 1998-1999 | 96,856 |
Xinhua | E | 1998-2001 | 406,792 | 229MB

Task Data
Topics: 32 topics, each in Ct, J, and E (CtJE)

Annotated Documents
Language | Docs | Sentences
Ct | 843 | 11,907
J | 490 | 15,279
E | 439 | 8,356

Opinion Tags
Tag | Scope and values
Opinionated | All sentences, Y/N
Holder | For opinionated sentences, the opinion holder as a string
Topic Relevance | All sentences, Y/N
Polarity | For opinionated sentences, one of POS/NEG/NEU
Yellow: NII can provide the data itself. Grey: NII can provide the data
only to NTCIR participants who are taking part in an appropriate task;
non-participants must obtain the data from an external source.
J: Japanese, E: English, C: Chinese (Ct: Traditional Chinese, Cs:
Simplified Chinese)
How to obtain the Mainichi, Yomiuri, Daily Yomiuri and Xinhua data
People who are not participating in the NTCIR Workshop must apply for and purchase the Research Use Yomiuri News Data set from Nihon Database Kaihatsu Co., Ltd. To use the purchased data with the NTCIR Test Collection, please download the conversion script provided on this page and run it to convert the data into the NTCIR format.
People who are not participating in the NTCIR Workshop must apply for and purchase the Research Use Mainichi News Data set from Nichigai Associates or the Mainichi Newspaper. People who live overseas can also purchase the data from Nichigai Associates, provided they can handle the Japanese-language paperwork and pay in Japanese yen. To use the purchased data with the NTCIR Test Collection, please download the conversion script provided on this page and run it to convert the data into the NTCIR format.
For more information on applying for the Xinhua data, please visit: http://research.nii.ac.jp/ntcir/permission/ntcir-6/ntcir6xinhua-research.html
The task data consist of the topics (32 topics aimed at newspaper
data from 1998-2001, in English, Chinese, and Japanese), the
pre-segmented files for the relevant documents that have been
annotated, and the opinion annotation data. NII distributes this data
as the Topic Data. The topics differ slightly from those used in the
search task, so please take care when using them. See the README
files for details.
Application Process
The application process for the test collection is as follows.
Documents distributed by NII are free of charge.
*To obtain the Xinhua data:
- First, email the "Test Collection Application Form" for the document sets that you require to ntc-secretariat.
- The User Agreement (Memorandum on Permission to Use Test Collection) is also required.
- The User Agreement form must be filled out and sent by postal mail or courier to the address below.
- Please download the form and print two copies double-sided.
- Both copies of the agreement must be signed.
- After NII countersigns the forms, one copy will be returned to you and NII will keep the other.
http://research.nii.ac.jp/ntcir/permission/ntcir-6/ntcir6xinhua-research.html
Required Forms
Reference
Task Overview of NTCIR-6 OPINION
An Overview of NTCIR-6 OPINION
Address to which to send the forms
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Important Points
The document data contained in the Test Collection are offered by NII either free of charge or under a compensatory licensing agreement. In either case, the data providers retain their copyright and the data themselves have commercial value; however, because the providers recognize the importance of making large data sets available for information retrieval research, we have obtained their consent to use these materials. To be able to keep using this kind of data, it is important for us as researchers to retain the trust and confidence of the data creators, organizers, and providers. For that reason, please make sure that you have read, understood, and agree with these consent forms and memorandums. It is imperative that you do not infringe on the rights of the data providers in any way, and that you use this data only for research (non-commercial) purposes.
Updated on: 2016-11-21