NTCIR Project
NTCIR-6 OPINION
Research Purpose Use of Test Collection



NTCIR-6 Opinion (Opinion Analysis Task Test Collection)

Test Collection

The test collection consists of document data, topics, and annotations. The document data are: Mainichi Newspaper 1998-2001 (Japanese); Yomiuri Newspaper 1998-2001 (Japanese); CIRB020 1998-1999 and CIRB040 2000-2001 (Traditional Chinese, various newspapers from Taiwan); Mainichi Daily 1998-2001 (English, published in Japan); Daily Yomiuri 2000-2001 (English, published in Japan); Korea Times 2000-2001 (English-language Korean newspaper); and Hong Kong Standard 1998-1999 (English, published in Hong Kong). There are 32 topics covering 1998-2001, each in English, Chinese, and Japanese. The annotations assign opinion tags to sentences in the selected documents that are relevant to the topics. The annotated documents are distributed separately in a sentence-segmented format that aligns with the sentence numbering in the CSV annotation files.

NII distributes the topics, sentence-segmented files, and opinion annotation tag files as the "Topic Data". The majority of the document data are available from the NTCIR-6 CLIR page. The Mainichi Newspaper Japanese document data are available directly from Nichigai Associates and Mainichi Shimbun under a research license. The English document data for Mainichi Daily (1998-1999) are also available under a research license. The Yomiuri Newspaper and Daily Yomiuri data are available from Nihon Database Kaihatsu Corporation under a research license. The Xinhua data is available from the LDC under a research license.

The Chinese Data (CIRB020, CIRB040), Korea Times (2000-2001), and Hong Kong Standard (1998-1999) are available from NTCIR under a research license. There are some points that differ between the languages when looking at the relevance judgment files for search experiments. Please look at the README files for more details.

Collection: NTCIR-6 Opinion
Task: Opinionated Search
Genre: Newspaper Articles

Document Data

File Name          | Language | Years     | # of Docs | Size
CIRB020            | Ct       | 1998-1999 | 249,508   | 788MB
CIRB040            | Ct       | 2000-2001 | 901,446   |
mainichi           | J        | 1998-2001 | 419,759   | 776MB
yomiuri            | J        | 1998-2001 | 1,034,699 |
mainichi daily     | E        | 1998-2001 | 24,878    | 471.5MB
daily yomiuri      | E        | 2000-2001 | 17,741    |
Korea Times        | E        | 2000-2001 | 30,530    |
Hong Kong Standard | E        | 1998-1999 | 96,856    |
Xinhua             | E        | 1998-2001 | 406,792   | 229MB

Task Data

Topics: 32 topics, languages CtJE

Annotated Documents

Language | # Docs | Sentences
Ct       | 843    | 11,907
J        | 490    | 15,279
E        | 439    | 8,356

Opinion Tags

Opinionated: all sentences, Y/N
Holder: for opinionated sentences, the opinion holder as a string
Topic Relevance: all sentences, Y/N
Polarity: for opinionated sentences, one of POS/NEG/NEU

Yellow: NII can offer the data itself. Grey: NII can offer the data to participants in the NTCIR who are participating in an appropriate task, otherwise non-participants must obtain the data in some other manner from an external source.

J: Japanese, E: English, C: Chinese (Ct: Traditional Chinese, Cs: Simplified Chinese)
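
The opinion tag rules listed in the table (Y/N for every sentence, holder and polarity only for opinionated sentences) can be expressed as a small record type with a consistency check. The following is a minimal sketch in Python, not the official file schema: the class name, the field names, and the assumption that every opinionated sentence carries a polarity value are all hypothetical. Please consult the README files for the actual layout.

```python
from dataclasses import dataclass
from typing import List, Optional

# Polarity values named in the table above.
POLARITIES = {"POS", "NEG", "NEU"}


@dataclass(frozen=True)
class SentenceAnnotation:
    """One annotated sentence. All field names are hypothetical."""
    opinionated: bool               # Y/N, given for all sentences
    relevant: bool                  # topic relevance Y/N, given for all sentences
    holder: Optional[str] = None    # opinion holder string, opinionated sentences only
    polarity: Optional[str] = None  # POS/NEG/NEU, opinionated sentences only


def problems(a: SentenceAnnotation) -> List[str]:
    """Return a list of rule violations (empty when the record is consistent)."""
    issues = []
    if a.opinionated:
        # Assumption: an opinionated sentence always has a polarity tag.
        if a.polarity not in POLARITIES:
            issues.append("opinionated sentence needs polarity POS/NEG/NEU")
    else:
        if a.holder is not None:
            issues.append("holder given for a non-opinionated sentence")
        if a.polarity is not None:
            issues.append("polarity given for a non-opinionated sentence")
    return issues
```

For example, `problems(SentenceAnnotation(opinionated=False, relevant=True, polarity="NEG"))` reports one issue, because polarity is only defined for opinionated sentences.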

How to obtain the Mainichi, Yomiuri, Daily Yomiuri and Xinhua data
People who are not participating in the NTCIR Workshop must apply for and purchase the Research Use Yomiuri News Data set from Nihon Database Kaihatsu Co., Ltd. To use the purchased data with the NTCIR Test Collection, please download the script below and run it to convert the data into the NTCIR format.

People who are not participating in the NTCIR Workshop must apply for and purchase the Research Use Mainichi News Data set from Nichigai Associates or Mainichi Newspaper. People who live overseas, can handle the Japanese-language paperwork, and can transfer Japanese yen are also able to purchase from Nichigai Associates. To use the purchased data with the NTCIR Test Collection, please download the script below and run it to convert the data into the NTCIR format.

The Xinhua data is available from the LDC under a research license. Instructions on how to download the LDC's agreement will be provided upon approval of the NTCIR application form by NII.
For more information about this application, please visit:
http://research.nii.ac.jp/ntcir/permission/ntcir-6/ntcir6xinhua-research.html

Documents, Topics and Questions

@@

CIRB020
Chinese news articles published in Taiwan ROC in 1998-1999. The language is Traditional Chinese. It contains document records from United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, and Star News. It was used as part of NTCIR-3 CLIR 98 Document Collections. Non-participants can also obtain it from NII for research purposes.
Mainichi Newspaper
Japanese news articles published in Japan in 1998-2001. It contains document records extracted from the Mainichi Newspaper Full-Text Article Database CD-ROMs. It was used as part of NTCIR-3 CLIR 1998-SubCollection, NTCIR-3 QA and NTCIR-3 SUMM, and also for NTCIR-4 CLIR, NTCIR-4 QA and NTCIR-4 SUMM. NII provides it to NTCIR Workshop participants free of charge for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks. Non-participants can obtain the Mainichi Newspaper Full-Text Article Database CD-ROMs for research purposes from Mainichi Newspaper Co.; the document records on the CD-ROMs must then be converted into the NTCIR standard record format with the script mai2.pl.
Yomiuri Newspaper
Japanese news articles published in Japan in 1998-2001. It contains document records extracted from the Yomiuri Newspaper Japanese Article Data. It is new data for NTCIR and is used for NTCIR-4 QA and NTCIR-4 SUMM. NII provides it to NTCIR Workshop participants free of charge for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks. Non-participants can obtain the Yomiuri Newspaper Japanese Article Data for research purposes from Nihon Database Kaihatsu Co., Ltd.; the document records in the data must then be converted into the NTCIR standard record format with the script yomi2ntcir.pl.
Mainichi Daily News
English articles published in Japan in 1998-2001. It contains document records from Mainichi Daily News. It was used as part of NTCIR-3 CLIR 1998-SubCollection. Non-participants can also obtain it from NII for research purposes.
Korea Times
English news articles published in Korea in 2000-2001. It contains document records from Korea Times. It is new data for NTCIR. Non-participants can also obtain it from NII for research purposes.
Hong Kong Standard
English news articles published in Hong Kong, China PRC in 1998-1999. It contains document records from Hong Kong Standard. It is new data for NTCIR. Non-participants can also obtain it from NII for research purposes.
Xinhua News Service
English news articles published in China PRC in 1998-2001. It contains document records from the Xinhua News Service file in LDC2006E106, the NTCIR Opinion Annotation Pilot Task Evaluation Corpus (for research purposes). It is new data for NTCIR. The Linguistic Data Consortium (LDC) provides it to NTCIR Workshop participants free of charge for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks. Non-participants can obtain LDC2006E106 for research purposes from the LDC; the document records in the corpus must then be converted into the NTCIR standard record format with the script xinhua2ntcir.pl.

@

The task data consists of the topics (32 topics aimed at newspaper data from 1998-2001 in English, Chinese, and Japanese), pre-segmented files for the relevant documents that have been annotated, and the opinion annotation data. This data is distributed by NII as the Topic Data. The topics are slightly different from the ones that were used in the search task, so please be careful when using them. Please see the README files for details.
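
Because the sentence-segmented files share their sentence numbering with the CSV annotation files, the two can be joined on a document identifier plus sentence number. The sketch below illustrates that join in Python under stated assumptions: the column names (`doc_id`, `sent_no`, `opinionated`, `holder`, `relevant`, `polarity`), the in-memory sentence dictionary, and the sample rows are invented placeholders, not the real file layout or real data. The README files describe the actual format.

```python
import csv
import io
from collections import Counter


def load_annotations(csv_text):
    """Index annotation rows by (doc_id, sentence number). Column names are hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["doc_id"], int(row["sent_no"])): row for row in reader}


def attach_sentences(annotations, sentences):
    """Pair each segmented sentence with its annotation row, skipping unannotated ones."""
    joined = []
    for key, text in sorted(sentences.items()):
        row = annotations.get(key)
        if row is not None:
            joined.append({**row, "text": text})
    return joined


def polarity_counts(joined):
    """Count polarity tags over opinionated sentences only."""
    return Counter(r["polarity"] for r in joined if r["opinionated"] == "Y")


# Invented placeholder rows, only to exercise the functions above.
SAMPLE_CSV = """doc_id,sent_no,opinionated,holder,relevant,polarity
DOC-A,1,Y,the minister,Y,NEG
DOC-A,2,N,,Y,
DOC-B,1,Y,analysts,N,POS
"""

SAMPLE_SENTENCES = {
    ("DOC-A", 1): "placeholder sentence one",
    ("DOC-A", 2): "placeholder sentence two",
    ("DOC-B", 1): "placeholder sentence three",
    ("DOC-B", 2): "placeholder sentence without an annotation row",
}

if __name__ == "__main__":
    joined = attach_sentences(load_annotations(SAMPLE_CSV), SAMPLE_SENTENCES)
    print(polarity_counts(joined))
```

Keying both sides on the same (document, sentence number) pair keeps the join independent of file order, which matters if the segmented files and the CSV files are sorted differently.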



Application Process --- The Test Collection Application Process follows. Documents distributed by NII are free of charge.

*To obtain the Xinhua Data:
http://research.nii.ac.jp/ntcir/permission/ntcir-6/ntcir6xinhua-research.html

Required Forms@---

Reference
Task Overview of NTCIR-6 OPINION
An Overview of NTCIR-6 OPINION

Address to which to send the forms---

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
102-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751

Important Points --- The document data contained in the Test Collection is offered by NII either free of charge or under a compensatory licensing agreement. In either case, the data providers retain their copyright claims, and the data itself has value as a commercial good. Because they recognize the importance of having large data sets available for information retrieval research, we have obtained their consent to use the materials. To be able to continue using this kind of data, it is important for us as researchers to retain the trust and confidence of the data creators, organizers, and providers. For that reason, please be sure that you have completely read, understood, and agreed with these consent forms and memorandums. It is imperative that you not infringe on the rights of the data providers in any way, and that you use this data only for research (non-commercial) purposes.

Updated on : 2016-11-21