The NTCIR-7 MOAT (Multilingual Opinion Analysis Task) test collection can be used for multilingual opinion analysis experiments in Japanese, English, and Chinese (simplified and traditional) (CsCtJE), such as opinionated sentence judgment, relevance judgment, polarity judgment, and opinion holder and target identification.
The document sets provided for MOAT are the relevant documents for about 20 search topics in CsCtJE. The documents are news articles in the CsCtJE languages, published in Asia from 1998 to 2001. The test collection also includes the search topics in CsCtJE, opinion information judged by three assessors, and an evaluation script.
NTCIR-7 MOAT (Task: Opinion Analysis; Genre: Newspaper Articles)

| Document Data: File Name | Lang. | Years | # of Docs | Size | Task Data: Topic Lang. | # of Topics | Documents | Sentences | Opinion Expressions |
|---|---|---|---|---|---|---|---|---|---|
| Mainichi Newspaper | JA | 1998-2001 | 419,759 | 544 MB | JA | 22 | 287 | 7,163 | 7,569 |
| CIRB020 | Ct | 1998-1999 | 249,508 | 320 MB | Ct | 17 | 246 | 6,174 | 6,176 |
| CIRB040 | Ct | 2000-2001 | 901,446 | 581.7 MB | | | | | |
| Xinhua Chinese (from LDC) | Cs | 1998-2001 | 295,875 | 511 MB | Cs | 16 | 271 | 5,301 | 7,523 |
| Lianhe Zaobao | Cs | 1998-2001 | 249,287 | 230 MB | | | | | |
| Mainichi Daily | EN | 1998-2001 | 24,878 | 22.8 MB | EN | 17 | 167 | 4,711 | 4,733 |
| Korea Times | EN | 1998-2001 | 50,129 | 45.7 MB | | | | | |
| Hong Kong Standard | EN | 1998-1999 | 96,683 | 252 MB | | | | | |
| Xinhua English (from LDC) | EN | 1998-2001 | 406,791 | 229 MB | | | | | |
| Straits Times (A) | EN | 1998-2001 | - | 250 MB | | | | | |

Task data figures are given once per topic language and apply to all document collections listed for that language.

Opinion information annotated in the task data (all languages):

| Opinion Information | Annotated for | Value |
|---|---|---|
| Opinionated | All sentences | Y/N |
| Polarity | Opinion expressions | One of POS/NEG/NEU |
| Holder | Opinion expressions | The opinion holder, as a string |
| Target | Opinion expressions | The opinion target, as a string |
| Relevant | Opinionated sentences | Y/N |
J (JA): Japanese, E (EN): English, C: Chinese (Ct: traditional Chinese, Cs: simplified Chinese)
- Document collections available from NII for research purposes.
- NII can offer only the relevant documents for the search topics. The full document collection published in 1998-2001 is available from a third party for research use other than NTCIR participation.
- Document collections available free of charge to task participants, and available from a third party for a fee for research use other than NTCIR participation.
The NTCIR-7 MOAT Test Collection can be used only with the annotated relevant
documents that were selected from the Mainichi Newspaper articles and are included
in the Task Data available from NII. If you use the MOAT test collection without the full document collection, you are restricted to the relevant documents selected by the organizers. In other words, you skip the retrieval (IR) preprocessing step that a practical system needs in order to extract topic-relevant opinions from a large volume of news data. Therefore, if you want to run experiments on the more realistic task of opinion retrieval from a large document collection, you should also request the full document collection. People who are not participating in the NTCIR Workshop must apply for and purchase the Research Use Mainichi News Data set from Nichigai Associates or Mainichi Newspaper. People who live overseas can also purchase it from Nichigai Associates, provided they can handle the Japanese-language paperwork and transfer payment in Japanese yen. To use the purchased data with the NTCIR Test Collection, please download the script below and run it to convert the data into the NTCIR format.
The Xinhua data is available from the LDC under a research license. Instructions
on how to download the LDC's agreement will be provided once NII has approved your
NTCIR application form. For more information about this application, please visit http://research.nii.ac.jp/ntcir/permission/ntcir-7/ntcir7xinhua-research.html
The task data consists of the topics (about 20 topics targeting newspaper
data from 1998-2001 in English, Simplified Chinese, Traditional Chinese,
and Japanese), pre-segmented files for the annotated relevant documents,
and the opinion annotation data. This data is distributed by NII as the
Topic Data. The topics differ slightly from those used in the search task,
so please be careful when using them.
Please see the README files for details.
Application Process
The application process for the test collection is as follows. Documents
distributed by NII are free of charge.
- First, email the "Test Collection Application Form" for the document sets you require to ntc-secretariat.
- The User Agreement (memorandum on Permission to Use Test Collection) is required.
- The User Agreement form must be filled out and sent by postal mail or courier to the address below.
- Please download the form and print two copies, double-sided.
- Both copies of the agreement must be signed.
- After NII countersigns the forms, one copy will be returned to you and the other will be kept by NII.
Required Forms ---
Reference
Overview of Multilingual Opinion Analysis Task at NTCIR-7
Address to which to send the forms ---
NTCIR Project (Rm.1309)
National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku, Tokyo
101-8430, JAPAN
PHONE: +81-3-4212-2750
FAX: +81-3-4212-2751
Important Points --- The document data contained in the Test Collection is offered through NII either free of charge or for a fee under a licensing agreement. In either case, the data providers retain their copyrights and the data itself has commercial value; however, because they recognize the importance of making large data sets available for information retrieval research, we have obtained their consent to use the materials. To be able to continue using this kind of data, it is important for us as researchers to retain the trust and confidence of the data creators, organizers, and providers. For that reason, please make sure that you have completely read, understood, and agreed to these consent forms and memorandums. You must not infringe on the rights of the data providers in any way, and you may use this data only for research (non-commercial) purposes.