# This page is available in English only.
1. INTRODUCTION
The NTCIR-8 MOAT Task Participant Test Collection consists of
A. Document Data and
B. Task Data.
A. Document Data
a.1 Chinese (simplified) Dataset
"Xinhua Chinese Text (2002-2005)" (Simplified Chinese Dataset for Formal Run), or "LDC2009E75 NTCIR-8 Xinhua Chinese Data 2002-2005", is available to participants of the NTCIR-8 MOAT Task only for the
purposes of the NTCIR Workshop.
Xinhua Chinese Text (2002-2005) is also included in the following LDC corpus:
LDC2007T38: Chinese Gigaword Third Edition, which was released on Aug 17, 2007.
If you already have this corpus, you do not need to obtain it again.
a.2 English Dataset
"New York Times Text (2002-2005)" (English Dataset for Formal Run), or "LDC2009E74 NTCIR-8 New York Times Data 2002-2005", is available to participants of the NTCIR-8 MOAT Task only for the
purposes of the NTCIR Workshop.
New York Times Text (2002-2005) is also included in the following LDC corpus:
LDC2007T07: English Gigaword Third Edition, which was released on May 17, 2007.
If you already have this corpus, you do not need to obtain it again.
The LDC will send the document data to you on DVD-ROM. If
you obtain these documents from the LDC, you will be asked to pay US$50 to
cover part of the cost of preparing and shipping the data.
B. Task Data
For system training, the following annotated Xinhua text collections are
available from the LDC to participants of the NTCIR-8 MOAT Task:
- "LDC2009E76 Xinhua English Tagged Data 1998-2001"
- "LDC2009E77 Xinhua Chinese Tagged Data 1998-2001"
- "LDC2006E108 NTCIR Opinion Annotation Pilot Task (Xinhua English Annotated Data 1998-2001)"
This is only a portion of the data for the NTCIR-8 MOAT Task. The rest
of the data can be obtained directly from NTCIR after filling out and sending
the two forms below.
2. HOW TO OBTAIN THE DATA
A. Sign the License Agreement:
(1) Register to participate in the MOAT task at NTCIR-8
The LDC will grant the license to the registered participants.
(2) Download the LDC's "NTCIR-8 MOAT Evaluation Agreement"
(3) Complete and sign the agreement.
(4) Fax the signed agreement, or scan and email it, to the Linguistic Data Consortium
(LDC).
- Fax:+1(215)573-2175
- Email: ldc
- ATTN: Ms Ilya Ahtaridis, Membership Coordinator
B. Making payment
Payment of Corpora Fees can be made in one of three ways:
1. with a check from a bank with branches in the United States
For credit to The Trustees of the University of Pennsylvania.
2. with a wire to:
- Wachovia Bank NA
123 South Broad Street
Philadelphia, PA 19109
ABA NO. 031201467
Account No. 2000018692644
SWIFT CODE: PNBPUS33PHL
- For credit to The Trustees of the University of Pennsylvania
- Attn:Ms. Ilya Ahtaridis +1(215) 573 1275
3. with Visa or MasterCard
Please provide the following:
- 1. Type of credit card
- 2. Credit card number
- 3. Expiration date
- 4. Credit card billing address
Please mail checks to the LDC, but note that they should be made out to 'The Trustees
of the University of Pennsylvania'.
For security purposes, please do not provide credit card details by email.
It is recommended that you call the LDC at +1(215) 573-1275 or use the LDC's
VISA/MasterCard Information Form.
C. The document data will be provided to you by the LDC.
- Contacting LDC:
- Linguistic Data Consortium
- 3600 Market Street
- Suite 810
- Philadelphia, PA, 19104-2653, USA
- General Office Telephone:+1(215)898-0464
- Membership Office Telephone:+1(215) 573-1275
- Fax:+1(215)573-2175
- Email: ldc
- ATTN: Ms Ilya Ahtaridis, Membership Coordinator
3. SCOPE OF THE LICENSE
After User's participation
in the NTCIR-8 Multilingual Opinion Analysis Task has ended, User agrees to delete the Data from any
computer or media onto which it has been copied and to return all discs to the
LDC, except that User may use LDC2006E108 NTCIR Opinion
Annotation Pilot Task (Xinhua English Annotated Data 1998-2001) after the NTCIR-8
Multilingual Opinion Analysis Task has ended for Opinion Analysis research.
User may keep the data by agreeing to pay the LDC the non-member fee and signing the generic LDC nonmember user agreement.
For the detailed conditions, please consult the agreement.
4. CONVERSION OF LDC DOCUMENT DATA INTO NTCIR FORMAT
The documents in the obtained Corpora shall be converted into the NTCIR
standard document format using the scripts nyt2ntc.pl (for New York Times text) and xin2ntc-new.pl (for Xinhua Chinese text).