[ntcir:139] NII Open Forum: Chinese temporal extraction & identification of newly coined words


Here is an announcement for the NII Open Forum on Informatics
on the topics of Term or Information Extraction.

The first talk is on "Chinese Temporal Information Extraction",
to be delivered by Prof Kam-Fai Wong of the Chinese University
of Hong Kong. The second talk is on "Identification of
Newly-coined Words", by Dr Keita Tsuji of NII.

Both look at the terms in the text and use the temporal
information in different ways.

You are most welcome.  Please come and join the discussion!

The first NII Open Forum on Informatics (in FY2004 Series)

Date: 9th April 2004     10:30-12:30
Place: National Center of Sciences 20F (Rm. 2004+2005)
    Prof Kam-Fai WONG (The Chinese University of Hong Kong)
    Dr Keita Tsuji (National Institute of Informatics)



Towards Chinese Temporal Information Extraction and Its Applications

Professor Kam-Fai WONG
Department of Systems Engineering & Engineering Management
The Chinese University of Hong Kong

     Temporal information carries information about changes and the time
of the changes. It is regarded as equally important as, if not more
important than, other information in applications such as extracting and
tracking information over time or planning and evaluating activities.
Conventional information systems may maintain and manipulate the
occurrence times of events, but they may not be able to handle users’
queries concerning how one event relates to another in time. Moreover,
they are not capable of inferring new information that is not presented
in the texts but can be derived from existing facts. Systems that cannot
handle temporal information effectively are thus rather restricted.
Therefore, it is useful to capture and maintain the temporal knowledge
associated with each action, and to introduce an effective inference
mechanism into an information system.

      In a broader sense, when we speak of the temporal information
contained in languages, we do not simply mean explicit specifications,
such as time clauses or phrases associated with the actions addressed.
It also includes information that is implicitly embedded in verbs. Thus,
modeling the temporal aspects of a language is more complicated than
modeling a physical time-dependent system. Over the past years, temporal
information processing and reasoning have received increasing attention.
Nevertheless, only a few researchers have investigated these areas in
Chinese. In this seminar, I will briefly introduce our past and ongoing
research activities on temporal information extraction, processing and
inference.
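
As a rough illustration of the kind of inference the abstract mentions
(deriving relations that are not stated in the text from facts that
are), the following minimal sketch computes the transitive closure of
"before" relations between events. It is not the speaker's method; the
function name infer_before and the event names are hypothetical and
chosen only for illustration.

```python
from itertools import product


def infer_before(facts):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Snapshot the current pairs so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            # "a before b" and "b before d" together imply "a before d".
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Hypothetical relations stated explicitly in a text.
stated = {
    ("sign_contract", "start_project"),
    ("start_project", "deliver_report"),
}

# Relations that were never stated but follow from the stated ones.
inferred = infer_before(stated) - stated
print(inferred)
```

With the two stated facts above, the only derived relation is that
sign_contract precedes deliver_report. A real system would need a richer
set of relations (overlap, inclusion, and so on) than this sketch covers.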

K.F. Wong obtained his PhD from Edinburgh University, Scotland, in 1987.
After his PhD, he conducted research at Heriot-Watt University
(Scotland), UniSys (Scotland) and ECRC (Germany).  At present he is a
professor in the Department of Systems Engineering and Engineering
Management, the Chinese University of Hong Kong (CUHK) and in parallel
serves as the director of the Centre for Innovation and Technology
(CINTEC), CUHK.  His research interests center on Internet programming
and applications, Chinese computing, and parallel database and
information retrieval.  He has published over 100 technical papers in
these areas in various international journals, conferences and books.
He is a member of the ACM, CLCS, IEEE-CS and IEE (UK).  He is the
founding Editor-In-Chief of ACM Transactions on Asian Language
Information Processing (TALIP) and a member of the editorial boards of
the Journal on Distributed and Parallel Databases, International Journal
on Computer Processing of Oriental Languages and International Journal
on Computational Linguistics and Chinese Language Processing.  He is the
general co-chair of AIRS04, panel co-chair of VLDB2003, PC co-chair of
IRAL03, ICCPOL01 and ICCPOL99 and General Chair of IRAL00 and
AIRS2004; he is also a PC member of many international conferences,
recent ones including SIGMOD04 and DASFAA04. He is an active member of
ACM and serves as the China program coordinator of the Membership
Activities Board (MAB).



Towards Identification of Newly-coined Words Which Are To Be Important
       in Specialized Domain

Assistant Professor
Human and Social Information Science Division
National Institute of Informatics

If we could predict which newly-coined words will become important in a
specialized domain, it would be useful for compiling and revising
terminology and for detecting emerging trends in that domain. Toward
this goal, I investigated the features of newly-coined words that would
enable such prediction. By "important" words I mean two types: (1) words
which are frequently used and (2) words which are connected to the
special concepts in the domain. The basic points I would like to
emphasize are (a) the importance of introducing temporal information
into term extraction and (b) the importance of distinguishing
newly-coined terms from others in term extraction. As for (a), many
studies of term extraction have used texts without considering when they
were produced. It has not been confirmed how old a text can be and still
serve as a resource for extracting "present" terms. In addition,
information about the temporal validity of extracted terms is useful for
lexicographers who would like to delete, or avoid adding, terms which
will die out quickly. As for (b), many studies of term extraction have
not distinguished newly-coined terms from the others. The frequency of
newly-coined terms in texts is inherently low, and it is difficult to
extract them with the frequency-based methods proposed so far. Against
this background, I used papers from a journal (Journal of the American
Society for Information Science and Technology) and proceedings (SIGIR)
spanning 17 years, and investigated how useful existing term extraction
methods are for identifying important words as defined in (1) and (2).
The methods I adopted are based on TFIDF, the term representativeness of
Hisamitsu, and the measure of Nakagawa.
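
To make points (a) and (b) concrete, here is a minimal sketch that
combines a plain TF-IDF score with the year in which each term first
appears, so that newly-coined terms can be ranked separately from older
ones. It uses only standard TF-IDF, not the measures of Hisamitsu or
Nakagawa, and it is not the speaker's implementation. The function
names, the toy corpus and the cut-off year are all assumptions made for
illustration; each document is assumed to be a non-empty token list.

```python
import math
from collections import Counter


def tfidf_by_term(docs):
    """docs: list of (year, tokens). Return {term: TF-IDF summed over docs}."""
    n = len(docs)
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))  # document frequency counts each doc once
    scores = Counter()
    for _, tokens in docs:
        tf = Counter(tokens)
        for term, count in tf.items():
            scores[term] += (count / len(tokens)) * math.log(n / df[term])
    return dict(scores)


def first_year(docs):
    """Return {term: earliest year in which the term occurs}."""
    first = {}
    for year, tokens in docs:
        for term in tokens:
            first[term] = min(year, first.get(term, year))
    return first


def new_term_ranking(docs, since):
    """Rank terms first seen in or after `since` by descending TF-IDF."""
    scores = tfidf_by_term(docs)
    first = first_year(docs)
    new_terms = [t for t in scores if first[t] >= since]
    return sorted(new_terms, key=lambda t: (-scores[t], t))


# Toy corpus (hypothetical): (publication year, tokens of one paper).
docs = [
    (1990, ["retrieval", "index", "query"]),
    (1995, ["retrieval", "query", "ranking"]),
    (2002, ["retrieval", "folksonomy", "folksonomy", "query"]),
]
ranking = new_term_ranking(docs, since=2000)
print(ranking)
```

In this toy corpus only "folksonomy" first appears in or after 2000, so
it is the single term returned. Filtering by first-appearance year before
ranking is one simple way to keep low-frequency new terms from being
buried under long-established ones.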

Keita TSUJI obtained his PhD from the University of Tokyo, Japan, in
2003. Since 2001, he has conducted research at the National Institute of
Informatics (Japan) as a research associate. His research interests
center on computational linguistics, natural language processing and
library science.

This forum is open to the public.

Onsite registration is available, but pre-registration helps us prepare
the handouts. If you plan to participate, please send your name, postal
address, telephone number and e-mail address to:

Admission : Free