# This page is available in English only.
1. INTRODUCTION
"Xinhua Chinese Text (2002-2005)" (for the Formal Run) and "Xinhua Chinese Text (1998-2001)" (for system training purposes) are distributed by the LDC as LDC2009E75 NTCIR-8 Xinhua Chinese Data 2002-2005 and LDC2008E01 NTCIR-7 Advanced Cross-Lingual Information Access Task (Xinhua Chinese Text, 1998-2001), respectively. They are available to participants of the NTCIR-8 ACLIA Task, solely for
the purposes of the NTCIR Workshop.
The LDC will provide the document data for download from an Internet server.
Xinhua Chinese Text (1998-2005) is also included in the following LDC corpus:
LDC2007T38:
Chinese Gigaword Third Edition, which
was released on Aug 17, 2007.
If you already have this corpus, you do not need to obtain the data again.
This is only a portion of the data for the NTCIR-7 MOAT Task. The rest of the
data can be obtained directly from NTCIR after filling out and sending the two
forms below.
2. HOW TO OBTAIN THE DATA
- (1) Register to participate in the ACLIA task at NTCIR-8
- The LDC will grant the license to the registered participants.
- (2) Download the LDC's "NTCIR-8 ACLIA Evaluation Agreement"
- (3) Complete and sign the agreement.
- (4) Fax the signed agreement to the Linguistic Data Consortium (LDC).
- The LDC requires only one signed form, sent by fax.
- (5) The document data will be provided to you by the LDC.
- Contacting LDC:
- Linguistic Data Consortium
- 3600 Market Street
- Suite 810
- Philadelphia, PA, 19104-2653, USA
- Phone: +1 (215) 898-0464
- Fax: +1 (215) 573-2175
- Email: ldc
- ATTN: Ms Ilya Ahtaridis, Membership Coordinator
3. SCOPE OF THE LICENSE
This license is valid only until April 30, 2011, at which time User agrees
to delete the Data and any files and software derived from it from any
computer or media onto which it has been copied, and to return all media
to the LDC. User may keep the Data by agreeing to pay the LDC the non-member fee and
signing the generic LDC non-member user agreement.
For the detailed conditions, please consult the agreement.
4. CONVERSION OF LDC DOCUMENT DATA INTO NTCIR FORMAT
The documents in the obtained corpus must be converted into the NTCIR
standard document format with the script xin2ntc-new.pl.
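
For orientation only, the kind of conversion described above can be sketched as follows. This is not xin2ntc-new.pl and does not reproduce its behavior. The input layout (a `<DOC id="...">` element with `<HEADLINE>`, `<TEXT>` and `<P>`) is assumed to follow the LDC Gigaword style. The output tags (`<DOCNO>`, `<DATE>`, and so on), the function names `parse_docs` and `to_ntcir_like`, and the sample record are all hypothetical, chosen only for illustration. Please use the provided script for the actual task.

```python
import re

# Hypothetical sample record in an assumed Gigaword-style layout.
SAMPLE = """<DOC id="XIN_CMN_20020101.0001" type="story">
<HEADLINE>
Sample headline
</HEADLINE>
<TEXT>
<P>
First paragraph.
</P>
<P>
Second paragraph.
</P>
</TEXT>
</DOC>
"""

DOC_RE = re.compile(r'<DOC id="([^"]+)"[^>]*>(.*?)</DOC>', re.S)
HEADLINE_RE = re.compile(r"<HEADLINE>(.*?)</HEADLINE>", re.S)
PARA_RE = re.compile(r"<P>(.*?)</P>", re.S)


def parse_docs(text):
    """Yield one dict per <DOC> element found in the input text."""
    for match in DOC_RE.finditer(text):
        doc_id, body = match.group(1), match.group(2)
        headline = HEADLINE_RE.search(body)
        # Assumption: the document ID embeds the date as eight digits (YYYYMMDD).
        date = re.search(r"\d{8}", doc_id)
        yield {
            "id": doc_id,
            "date": date.group(0) if date else "",
            "headline": headline.group(1).strip() if headline else "",
            "paragraphs": [p.strip() for p in PARA_RE.findall(body)],
        }


def to_ntcir_like(doc):
    """Render one parsed document in a hypothetical NTCIR-like layout."""
    lines = [
        "<DOC>",
        f"<DOCNO>{doc['id']}</DOCNO>",
        f"<HEADLINE>{doc['headline']}</HEADLINE>",
        f"<DATE>{doc['date']}</DATE>",
        "<TEXT>",
    ]
    lines += [f"<P>{p}</P>" for p in doc["paragraphs"]]
    lines += ["</TEXT>", "</DOC>"]
    return "\n".join(lines)


if __name__ == "__main__":
    for doc in parse_docs(SAMPLE):
        print(to_ntcir_like(doc))
```

The sketch handles one record at a time, so it could be applied file by file to the downloaded data. The tag set and date handling would need to be checked against the actual NTCIR format specification.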