The 5th NTCIR Workshop
User Agreement Forms - Xinhua News Service file

[NTCIR-5 User Agreement Forms]
# This page is available in English only.

1. INTRODUCTION

"The Xinhua News Agency English Service (1998-2001) from LDC 2003T05 English Gigawords collection" is available to participants in the CLIR task at the 5th NTCIR Workshop, solely for the purposes of the NTCIR Workshop. It is free of charge.

For the detailed conditions and scope of use, please consult the user agreement form at the URL below:

"NTCIR Workshop Evaluation Agreement": http://www.ldc.upenn.edu/Membership/Agreements/eval/NTCIR.html


2. HOW TO OBTAIN THE DATA

(1) Register to participate in the CLIR task at NTCIR-5
The LDC will grant the license to registered participants.
(2) Download the LDC's "NTCIR Workshop Evaluation Agreement" from the above URL.
(3) Complete and sign the agreement.
(4) Fax the signed agreement to the Linguistic Data Consortium (LDC).
A single signed form sent by fax is sufficient for the LDC.
(5) The document data will be provided to you by the LDC via their intranet server for download.
Contacting LDC:
Linguistic Data Consortium
3600 Market Street
Suite 810
Philadelphia, PA, 19104-2653, USA
Phone: +1 (215) 898-0464
Fax: +1 (215) 573-2175
Email: ldc@ldc.upenn.edu
ATTN: Ms Ilya Ahtaridis, Membership Coordinator



3. SCOPE OF THE LICENSE

The evaluation license granted to NTCIR Workshop participants is valid until September 30, 2006, at which time the user agrees to delete the corpora from any computer or media onto which they have been copied. If you would like to keep the data after the license expires, you may do so by agreeing to pay the LDC the non-member fee and signing the generic LDC non-member user agreement.

For the detailed conditions, please consult the agreement at the URL above.


4. IF YOU ALREADY HAVE THE "LDC 2003T05 English Gigawords collection" ...

If you already have the "LDC 2003T05 English Gigawords collection", please use the documents in that corpus. The files named "xieYYYYMM.gz" under the "xie" directory are used for the CLIR task at the 5th NTCIR Workshop. YYYY is one of 1998, 1999, 2000 and 2001, and MM is the two-digit month within that year. For example, one such file is "xie199801.gz". In total, the 47 files from xie199801.gz to xie200111.gz will be used for NTCIR-5. (There is no xie200112.gz on the DVD.)
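
As a quick check on the naming scheme above, the following Python sketch lists the expected file names. It is only an illustration: the function name xinhua_file_names is hypothetical and is not part of any NTCIR or LDC tooling.

```python
def xinhua_file_names():
    """Return the expected xieYYYYMM.gz names for NTCIR-5, in date order."""
    names = []
    for year in range(1998, 2002):        # 1998, 1999, 2000, 2001
        for month in range(1, 13):        # 01 through 12
            if (year, month) == (2001, 12):
                continue                  # xie200112.gz is not on the DVD
            names.append(f"xie{year}{month:02d}.gz")
    return names


names = xinhua_file_names()
print(len(names), names[0], names[-1])    # 47 xie199801.gz xie200111.gz
```

Comparing this list with the contents of your "xie" directory is one way to confirm that no file is missing before conversion.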

5. CONVERSION OF LDC DOCUMENT DATA INTO NTCIR FORMAT

Whether you use the document data from the LDC 2003T05 DVD or the document data provided under the NTCIR evaluation license, please gunzip the document files and then apply the script xie2ntc2.pl_txt [readme] to convert the record format into NTCIR's standard format.
For the document files used for training and testing, please consult the URL: http://homepage3.nifty.com/kz_401/Xinhua_Info.htm
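
If the gunzip command is not available on your system, the decompression step can be sketched in Python as below. This covers only the gunzip step, not the format conversion; the function name gunzip_all and the directory names in the usage note are assumptions made for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(src_dir, dst_dir):
    """Decompress every xie*.gz file in src_dir into dst_dir.

    Returns the list of decompressed file paths, sorted by name.
    The original .gz files are left untouched.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    outputs = []
    for gz_path in sorted(src.glob("xie*.gz")):
        out_path = dst / gz_path.stem     # xie199801.gz -> xie199801
        with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream, so large files fit in memory
        outputs.append(out_path)
    return outputs
```

For example, gunzip_all("xie", "xie_unzipped") would write the decompressed files into a hypothetical "xie_unzipped" directory, after which the conversion script can be applied as described in its readme.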


Contact: ntc-admin
2003-03-15
Last modified: 2005-07-13