README of Conversion Scripts for Mainichi Newspaper 2002 to 2005
-------------------------------------------------------------------------------

Data conversion script "mai2ntc-r-utf.pl":

  "mai2ntc-r-utf.pl" is a script that converts document data from the
  "Mainichi Newspaper" format to the NTCIR format.

HISTORY:

  "mai2ntc-r-utf.pl" adapts the previous version, "mai2ntc-r.pl", to the
  UTF-8 encoding. It newly uses the Lingua::JA::Regular::Unicode module.

PREPARATION:

  To use mai2ntc-r-utf.pl, you must install the Lingua::JA::Regular::Unicode
  module. Install it as follows (after becoming the root user):

    # perl -MCPAN -e shell
    cpan> install Lingua::JA::Regular::Unicode

  (If CPAN has not been configured yet, please refer to the CPAN web site.)

USAGE:

  The script takes three arguments, in the order shown in the example below.

  The first argument is a file of "Mainichi Newspaper" documents, or a
  directory that contains such documents.
  The second argument is the converted output file (ntc8-mai2002a.txt in
  the example).
  The third argument is a file for removal of no-content documents.
  The -X option disables all warnings.

  Example:

    % perl -X mai2ntc-r-utf.pl mai2002a.txt ntc8-mai2002a.txt ntc8-mai2002a.err

-------------------------------------------------------------------------------
mai2ntc-r-utf.pl: Yohei Seki (Toyohashi University of Technology)
README: Daisuke Ishikawa (NTCIR Project Researcher)