NTCIR Project
NTCIR-7 MOAT
Research Purpose Use of Test Collection



NTCIR-MOAT (Multilingual Opinion Analysis Test Collection)

Test Collection

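The NTCIR-7 MOAT test collection can be used for multilingual opinion analysis experiments in Japanese, English, and Chinese (simplified and traditional) (CsCtJE), such as judging whether sentences are opinionated and relevant to a topic, classifying opinion polarity, and identifying opinion holders and targets.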

The document sets provided for MOAT are the relevant documents for about 20 search topics in CsCtJE. The documents are news articles in the CsCtJE languages, published in Asia from 1998 to 2001. The test collection also includes about 20 search topics in CsCtJE, opinion information judged by three assessors, and an evaluation script.
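The page does not specify how the judgments of the three assessors are merged into a single label. As a hedged illustration only, the Python sketch below assumes two conventions: a strict one, where a label counts only if all three assessors chose it, and a lenient one, where at least two of the three must agree. The function name `combine` and both modes are our own assumptions, not part of the distributed evaluation script.

```python
from typing import Optional, Sequence

NUM_ASSESSORS = 3  # the collection was judged by three assessors


def combine(judgements: Sequence[str], mode: str = "lenient") -> Optional[str]:
    """Merge three per-assessor labels such as "Y"/"N" or "POS"/"NEG"/"NEU".

    Assumed conventions (not confirmed by this page):
      strict  - keep a label only if all three assessors chose it
      lenient - keep a label if at least two of the three assessors chose it
    Returns None when no label reaches the required agreement.
    """
    if len(judgements) != NUM_ASSESSORS:
        raise ValueError(f"expected {NUM_ASSESSORS} judgements, got {len(judgements)}")
    if mode == "strict":
        needed = NUM_ASSESSORS
    elif mode == "lenient":
        needed = 2
    else:
        raise ValueError(f"unknown mode {mode!r}")
    # With three assessors and needed >= 2, at most one label can qualify.
    for label in sorted(set(judgements)):
        if judgements.count(label) >= needed:
            return label
    return None


print(combine(["Y", "Y", "N"]))            # → Y
print(combine(["Y", "Y", "N"], "strict"))  # → None
print(combine(["POS", "NEG", "NEU"]))      # → None
```

With lenient agreement, `combine(["Y", "Y", "N"])` returns `"Y"`, while strict agreement returns `None` for the same input. Check the README files and the evaluation script for the convention actually used.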

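Collection: NTCIR-7 MOAT | Task: Opinion Analysis | Genre: Newspaper Articles

Document Data

| File Name                 | Lang. | Years     | # of Docs | Size     |
|---------------------------|-------|-----------|-----------|----------|
| Mainichi Newspaper        | JA    | 1998-2001 | 419,759   | 544 MB   |
| CIRB020                   | Ct    | 1998-1999 | 249,508   | 320 MB   |
| CIRB040                   | Ct    | 2000-2001 | 901,446   | 581.7 MB |
| Xinhua Chinese (from LDC) | Cs    | 1998-2001 | 295,875   | 511 MB   |
| Lianhe Zaobao             | Cs    | 1998-2001 | 249,287   | 230 MB   |
| Mainichi Daily            | EN    | 1998-2001 | 24,878    | 22.8 MB  |
| Korea Times               | EN    | 1998-2001 | 50,129    | 45.7 MB  |
| Hong Kong Standard        | EN    | 1998-1999 | 96,683    | 252 MB   |
| Xinhua English (from LDC) | EN    | 1998-2001 | 406,791   | 229 MB   |
| Straits Times (A)         | EN    | 1998-2001 | -         | 250 MB   |

Task Data

| Topic Lang. | # of Topics | # of Documents | # of Sentences | # of Opinion Expressions |
|-------------|-------------|----------------|----------------|--------------------------|
| JA          | 22          | 287            | 7,163          | 7,569                    |
| Ct          | 17          | 246            | 6,174          | 6,176                    |
| Cs          | 16          | 271            | 5,301          | 7,523                    |
| EN          | 17          | 167            | 4,711          | 4,733                    |

Opinion Information (annotated for all four languages)

| Item        | Annotated for         | Value                      |
|-------------|-----------------------|----------------------------|
| Opinionated | All sentences         | Y/N                        |
| Polarity    | Opinion expressions   | One of POS/NEG/NEU         |
| Holder      | Opinion expressions   | Opinion holder as a string |
| Target      | Opinion expressions   | Opinion target as a string |
| Relevant    | Opinionated sentences | Y/N                        |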

J/JA: Japanese, E/EN: English, C: Chinese (Ct: traditional Chinese, Cs: simplified Chinese)
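To make the opinion information in the table concrete, the following Python sketch models one annotated sentence as a pair of dataclasses and checks it against the constraints the table states: the opinionated flag is judged for every sentence, relevance is judged only for opinionated sentences, and polarity is one of POS/NEG/NEU. It is a minimal sketch under our own assumptions; the names `yn`, `OpinionExpression`, `SentenceAnnotation`, and `problems` are hypothetical and do not describe the actual file format of the Task Data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Polarity labels listed in the MOAT table.
POLARITIES = {"POS", "NEG", "NEU"}


def yn(flag: str) -> bool:
    """Convert a Y/N judgement string into a bool (hypothetical helper)."""
    value = flag.strip().upper()
    if value not in ("Y", "N"):
        raise ValueError(f"expected Y or N, got {flag!r}")
    return value == "Y"


@dataclass
class OpinionExpression:
    polarity: str  # one of POS / NEG / NEU
    holder: str    # the opinion holder, as a string
    target: str    # the opinion target, as a string


@dataclass
class SentenceAnnotation:
    opinionated: bool                # judged for every sentence (Y/N)
    relevant: Optional[bool] = None  # judged only for opinionated sentences (Y/N)
    opinions: List[OpinionExpression] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return violations of the constraints stated in the table (empty if none)."""
        issues = []
        if not self.opinionated and self.relevant is not None:
            issues.append("relevance is judged only for opinionated sentences")
        if self.opinionated and self.relevant is None:
            issues.append("opinionated sentence is missing a relevance judgement")
        for op in self.opinions:
            if op.polarity not in POLARITIES:
                issues.append(f"unknown polarity {op.polarity!r}")
        return issues


# Usage with made-up values (not taken from the collection):
example = SentenceAnnotation(
    opinionated=yn("Y"),
    relevant=yn("N"),
    opinions=[OpinionExpression("NEG", "the minister", "the new tax")],
)
print(example.problems())  # → []
```

Returning a list of problems instead of raising on the first one makes it easy to report every inconsistency found while loading a file.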

Availability of the document collections:
Document collections available from NII for research purposes.
NII can offer only the relevant documents for the search topics. The full document collections published in 1998-2001 are available from other parties for research purposes other than NTCIR participation.
Document collections available free of charge to task participants, and available from other parties for a fee for research purposes other than NTCIR participation.

The NTCIR-7 MOAT Test Collection can be used only with the annotated relevant documents that were selected from the Mainichi Newspaper articles and are included in the Task Data available from NII.
If you use the MOAT test collection without the full document collection, your experiments are restricted to the relevant documents selected by the organizers. In other words, you skip the IR preprocessing step that a practical system would need in order to find topic-relevant opinions in a large volume of news data. If you would like to conduct experiments on a more realistic opinion retrieval task over a large document collection, you should also request the full document collection.

People who are not participating in the NTCIR Workshop must apply for and purchase the Research Use Mainichi News Data set from Nichigai Associates or Mainichi Newspaper. People who live overseas can also purchase it from Nichigai Associates, provided they can handle the Japanese-language paperwork and pay in Japanese yen.
To use the purchased data with the NTCIR Test Collection, please download the conversion script mai2.pl and run it to convert the data into the NTCIR format.
The Xinhua data is available from the LDC under a research license. Instructions on how to download the LDC's agreement will be provided upon approval of the NTCIR application form by NII.
For more information about this application, please visit the following URL:

http://research.nii.ac.jp/ntcir/permission/ntcir-7/ntcir7xinhua-research.html

Documents, Topics and Questions

@@

Mainichi Newspaper
Japanese news articles published in Japan from 1998 to 2001. It contains document records extracted from the Mainichi Newspaper Full-Text Article Database CD-ROMs. It is available from NII to NTCIR Workshop participants free of charge, for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks. Non-participants can obtain the Mainichi Newspaper Full-Text Article Database CD-ROMs for research purposes from Mainichi Newspaper Co.; the document records on the CD-ROMs must then be converted into the NTCIR standard record format with the script mai2.pl.
CIRB020
Traditional Chinese news articles published in Taiwan (ROC) from 1998 to 1999. It contains document records from the United Daily News, Economic Daily News, Min Sheng Daily, United Evening News, and Star News. It is also available from NII to non-participants for research purposes.
CIRB040
Traditional Chinese news articles published in Taiwan (ROC) from 2000 to 2001. It contains document records from the United Daily News, Economic Daily News, Min Sheng Daily, and United Express. It is also available from NII to non-participants for research purposes.
Lianhe Zaobao
Simplified Chinese news articles.
Xinhua News Service (Chinese)
Simplified Chinese news articles published in China (PRC) from 1998 to 2001. It contains document records from the Xinhua News Service file in LDC2008E48 (NTCIR Multilingual Opinion Annotation Task Evaluation Corpus). For research purposes, it is available from the Linguistic Data Consortium (LDC) to NTCIR Workshop participants free of charge, for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks.
Xinhua News Service (English)
English news articles published in China (PRC) from 1998 to 2001. It contains document records from the Xinhua News Service files in LDC2006E106 and LDC2006E108 (NTCIR Opinion Annotation Pilot Task Evaluation Corpus). For research purposes, it is available from the Linguistic Data Consortium (LDC) to NTCIR Workshop participants free of charge, for accomplishing the tasks set out in the NTCIR Workshop and for research related to those tasks.
Mainichi Daily News
English articles published in Japan from 1998 to 2001. It contains document records from the Mainichi Daily News. It is also available from NII to non-participants for research purposes.
Korea Times
English news articles published in Korea from 1998 to 2001. It contains document records from the Korea Times. It is also available from NII to non-participants for research purposes.
Hong Kong Standard
English news articles published in Hong Kong, China, from 1998 to 1999. It contains document records from the Hong Kong Standard. It is also available from NII to non-participants for research purposes.
Straits Times
English news articles published in Singapore from 1998 to 2001. It contains document records from the Straits Times. It is also available from NII to non-participants for research purposes.


The task data consists of the topics (about 20 topics targeting newspaper data from 1998-2001, in English, Simplified Chinese, Traditional Chinese, and Japanese), pre-segmented files of the annotated relevant documents, and the opinion annotation data. NII distributes this data as the Topic Data. The topics differ slightly from those used in the search task, so please take care when using them. See the README files for details.



Application Process --- The application process for the test collection is as follows. Documents distributed by NII are free of charge.

Required Forms ---

Reference
Overview of Multilingual Opinion Analysis Task at NTCIR-7

Address to which to send the forms ---

NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
101-8430, JAPAN

PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751

Important Points --- The document data contained in the Test Collection is offered by NII, either free of charge or under a compensatory licensing agreement. In either case, the data providers retain their copyrights and the data itself has value as a commercial good, but because the providers recognize the importance of having large data sets available for information retrieval research, we have obtained their consent to use the materials. To be able to continue using this kind of data, it is important for us as researchers to retain the trust and confidence of the data creators, organizers, and providers. For that reason, please make sure that you have completely read, understood, and agreed to these consent forms and memorandums. You must not infringe on the rights of the data providers in any way, and you must use this data only for research (non-commercial) purposes.


Updated on: 2009-05-28
ntc-admin
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@