For the Eighth NTCIR Workshop (NTCIR-8), we selected and organized seven research areas as "tasks" to investigate, test and benchmark on newly constructed test collections. They are: Complex Cross-Lingual Question Answering (CCLQA), Information Retrieval for Question Answering (IR4QA), Geographic and Temporal Information Retrieval Track (GeoTime), Multilingual Opinion Analysis (MOAT), Patent Machine Translation (PATMT), Patent Mining (PATMN) and a pilot task for Community Question Answering (CQA). Among them, GeoTime and CQA are new to NTCIR, and the others have some new elements. Each task has its own challenges to tackle, and both task organizers and participating research groups worked hard to address them. This paper outlines the Eighth NTCIR Workshop, the latest in a series of community-oriented efforts to enhance research by providing an infrastructure for evaluation and testing and a forum for researchers. It briefly describes the background, tasks, participants, test collections and other resources available through NTCIR-8. The purpose of this paper is to serve as an introduction to the research described in detail in the rest of the proceedings of the Eighth NTCIR Workshop.
On April 27, 2009, IBM unveiled the details of a project to build an advanced computing system able to compete with humans at the game of Jeopardy! Computer systems that can directly and accurately answer people's questions over a broad domain of human knowledge have been envisioned by scientists and writers since the advent of computers themselves. Consider, for example, the computer on Star Trek: it understands questions, quickly provides accurate, customized answers, and can engage in a fluent information-seeking dialog with the user. We call this technology open-domain question answering, and it has tremendous promise for impacting society and business. Applications in business intelligence, health care, customer support, enterprise knowledge management, social computing, science and government would all benefit from such technology. Project Watson is addressing a grand challenge in Computer Science aimed at illustrating how the integration and advancement of Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), massively parallel computation and Knowledge Representation and Reasoning (KR&R) can advance open-domain automatic Question Answering to a point where it clearly and consistently rivals the best human performance. An exciting proof point in this challenge is to develop a computer system that can successfully compete against top human players at the well-known Jeopardy! quiz show. Attaining champion-level performance at the game of Jeopardy! requires a computer system to rapidly and accurately answer challenging open-domain questions, and to predict its own performance on any given category or question. The system must deliver high degrees of precision and confidence over a very broad domain with a 3-second response time. It is highly unlikely that any system will be able to clearly justify every answer with perfect certainty over such a broad range of natural language questions and content.
Computing accurate confidences is an important requirement for determining when to "buzz in" against your competitors and how much to bet. While critical for winning the game, high precision and accurate confidence computations are just as critical for a QA system to provide real value in business settings. The need for speed and for very high precision demands a massively parallel compute platform capable of generating and evaluating thousands of hypotheses and their associated evidence. In this talk we will introduce the audience to the Jeopardy! Challenge and describe our technical approach and progress on this grand-challenge problem.
After ten years of increasingly successful evaluation campaigns, the Cross-Language Evaluation Forum (CLEF) has reached an appropriate moment to assess what has been achieved in this decade and to consider future directions and how to renew and complement it. This paper provides a brief summary of the most significant results achieved by CLEF in the past ten years, describes the new format and organization for CLEF, which is being tried for the first time in CLEF 2010, and discusses some future perspectives for CLEF beyond 2010.
ClueWeb09 and TREC Diversity
The TREC Web Track explores and evaluates Web retrieval
technologies. The TREC 2009 Web Track included both a traditional ad hoc retrieval task
and a new diversity task. The goal of
this diversity task is to return a ranked list of pages that together provide
complete coverage for a query, while avoiding excessive redundancy in the
result list. Both tasks will continue at TREC 2010, which will also include a new Web
spam task. The track uses the
ClueWeb09 dataset as its document collection.
This collection consists of roughly 1 billion web pages in multiple
languages, comprising approximately 25TB of uncompressed data crawled from the
general Web during January and February 2009.
For TREC 2009, topics for the track were created from the logs of a commercial search engine, with the aid of tools developed at Microsoft Research. Given a target query, these tools extracted and analyzed groups of related queries, using co-clicks and other information, to identify clusters of queries that highlight different aspects and interpretations of the target query. These clusters were employed by NIST for topic development. For use by the diversity task, each resulting topic is structured as a representative set of subtopics, each related to a different user need. Documents were judged with respect to the subtopics, as well as with respect to the topic as a whole.
In 2009, a total of 18 groups submitted runs to the diversity task. To evaluate these runs, the task used two primary effectiveness measures: α-nDCG as defined by Clarke et al. (SIGIR 2008) and an "intent-aware" version of precision, based on the work of Agrawal et al. (WSDM 2009). Developing and validating metrics for diversity tasks continues to be a goal of the track. For TREC 2010, we will report a number of additional evaluation measures that have been proposed over the past year, including an intent-aware version of the ERR measure described by Chapelle et al. (CIKM 2009).
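As a rough illustration of what an intent-aware measure looks like, the sketch below computes an intent-aware precision at rank k as the probability-weighted average of the precision obtained for each subtopic. The function names, the subtopic probabilities and the toy judgments are assumptions made for this example; this is not the track's official evaluation code.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def intent_aware_precision(
    ranking: List[str],
    subtopic_qrels: Dict[str, Set[str]],   # subtopic id -> relevant doc ids
    subtopic_probs: Dict[str, float],      # subtopic id -> assumed P(subtopic | query)
    k: int,
) -> float:
    """Weighted average of per-subtopic precision@k (weights are assumed given)."""
    return sum(
        subtopic_probs[s] * precision_at_k(ranking, relevant, k)
        for s, relevant in subtopic_qrels.items()
    )


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    qrels = {"s1": {"d1", "d2"}, "s2": {"d4"}}
    probs = {"s1": 0.5, "s2": 0.5}  # hypothetical uniform weights
    print(intent_aware_precision(ranking, qrels, probs, k=2))
```

A ranking that covers only one subtopic scores well on that subtopic alone, so the weighted average rewards result lists that serve several user needs within the top k.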
Nick Craswell from Microsoft serves as the track co-coordinator. Ian Soboroff is the NIST contact. The ClueWeb09 collection was created through the efforts of Jamie Callan and Mark Hoy at the Language Technologies Institute, Carnegie Mellon University. More information may be found on the track Web page: http://plg.uwaterloo.ca/~trecweb/2010.html.
Rakuten, the largest shopping site in Japan, will distribute its data to
academia for research purposes.
The data include the following:
1) market item data and item homepage data
2) Hotel data and its review data
3) Golf course data and its review data
We are planning to hold the Rakuten R&D Symposium in January 2011, where one of the sessions will be dedicated to R&D activities using the data.
The data is planned to be distributed through ALAGIN and NII-IDR in July, 2010.
This paper presents an overview of the ACLIA (Advanced Cross-Lingual Information Access) task cluster at NTCIR-8. The task overview includes: a definition of and motivation for the evaluation; a description of the complex and factoid question types evaluated; the document sources and exchange formats selected and/or defined; the official metrics used in evaluating participant runs; the tools and process used to develop the official evaluation topics; summary data regarding the runs submitted; and the results of evaluating the submitted runs with the official metrics.
In this paper, we describe our CCLQA system and its evaluation results for the C-C task at NTCIR-8 ACLIA. The system consists of a Question Analysis module, an IR module and an Answer Extraction module. The Question Analysis module was developed for NTCIR-7 CCLQA and is based on a question pattern library and HowNet. The IR module was developed for the NTCIR-8 IR4QA task, and the results of the KECIR-CS-CS-01-T run were used as the question-related documents. For answer extraction, a Surface-Based Multi-Strategy approach was used, which processes the retrieval results with different strategies. The evaluation results show that our system achieves an average F-score of 0.3450 (beta = 3).
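For readers unfamiliar with the beta = 3 setting, the following minimal sketch shows the generic F-beta formula, which weights recall more heavily than precision as beta grows. It is the textbook definition only; the official ACLIA nugget-based scoring defines its precision and recall terms in its own way, and the function name and sample values here are assumptions for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Generic F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # With beta = 3, high recall compensates for low precision.
    print(f_beta(precision=0.2, recall=0.8))
    print(f_beta(precision=0.8, recall=0.2))
```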
In this paper, we describe our system implemented for the NTCIR-8 CCLQA task. The system consists of a question translation model and a general question answering system for both factoid and complex questions. The translation model combines a translation engine and an online dictionary, which can provide more accurate translations of named entities in the questions. For the question answering system, a PLSA-based approach is introduced for answer sentence acquisition. For answer ranking, our system expands the question set by summarizing relevant sentences from a web knowledge base and leverages both semantic and statistical information about the questions. In the official evaluation results, our system achieves an F-score of 18.41% in the English-to-Chinese subtask and 25.66% in the monolingual Chinese subtask.
We describe Javelin, a Cross-lingual Question Answering system which participated in the NTCIR-8 ACLIA evaluation and which is designed to work on any type of question, including factoid and complex questions. The key technical contribution of this paper is a minimally supervised bootstrapping approach to generating lexicosyntactic patterns used for answer extraction. The preliminary evaluation result (measured by nugget F3 score) shows that the proposed pattern learning approach outperformed two baselines, a supervised learning approach used in NTCIR-7 ACLIA and a simple key-term based approach, for both monolingual and crosslingual tracks. The proposed approach is general and thus it has potential applicability to a wide variety of information access applications which require deeper semantic processing.
A question answering system can be divided into four components: Question Analysis, Document Retrieval, Answer Extraction and Answer Generation. Given the requirements of the Complex Cross-Lingual Question Answering (CCLQA) task, the first three parts are necessary. In Question Analysis, we use templates and rules to classify the questions and acquire the keywords. In Document Retrieval, Lucene was used as the search engine. In Answer Extraction, the retrieved documents are divided into sentences, and keyword matching is used to find the candidate sentences. For different types of questions, particular strategies are applied to extract the answers. Some questions receive good answers, while the results for others are not satisfactory. We analyze the results to find the reasons for the successes and the mistakes.
This paper describes our complex QA system, the NTOU XQA System, which participated in the NTCIR-8 CCLQA Task. This QA system can answer several types of factoid questions and the 5 types of complex questions defined in the NTCIR-8 CCLQA Task. Different strategies are designed for finding answers to different types of questions. Named entity recognition, distance scores of question keywords, answer information patterns, and search results from the Web are the techniques integrated in these strategies. The best F-measure score achieved by our system is 19.88% in the monolingual task and 13.62% in the cross-lingual task. Unfortunately, the official evaluation is incorrect.
This paper describes the Chinese question answering system DLUT for the CS-CS Subtask evaluation of NTCIR-8 CCLQA. The system uses the Situation Unit (SU), which combines syntactic and semantic information, as the basic processing unit to obtain candidate answers: the SUs in the question sentences are matched with those in the texts of the corpus. In this evaluation, answers are presented in the form of whole sentences instead of their simplified versions. The average F3 Score reaches 0.1954, and the average Recall is ? 0.75 for the six question types BIOGRAPHY, DATE, DEFINITION, LOCATION, ORGANIZATION and PERSON. As the current system employs only part of the information in SUs, the evaluation result indicates only that the SU-based question answering system can achieve promising Recall.
This paper presents an overview of NTCIR-8 ACLIA (Advanced Cross-lingual Information Access) IR4QA (Information Retrieval for Question Answering). Following the task definitions of the first IR4QA at NTCIR-7 [13, 15], IR4QA at NTCIR-8 evaluates cross-language IR using English topics and targeting documents in Simplified Chinese, Traditional Chinese or Japanese. The corresponding monolingual IR subtasks are also within its scope. The only difference between traditional "ad hoc" IR tasks and IR4QA is that the latter can optionally be seen as a component of a question answering system. This paper describes the task, how the organizers collaborated with 12 participating teams (who submitted a total of 84 runs) to obtain relevance assessments for our three IR4QA test collections, the formal evaluation results, and the "run ranking forecasts" that were provided to the participants right after the submission deadline. For the relationship between IR4QA and the entire ACLIA, we refer the reader to the overview papers of ACLIA [7, 8]. For details of the individual IR4QA systems, we refer the reader to the participants' reports.
This paper describes our work in the IR4QA subtask. Our IR system designed for this task consists of two modules: (1) query processing; (2) indexing, retrieval and re-ranking. We first study the method of question classification and the weighting strategies based on its result. Baidu and Wanfang resources are exploited to help query expansion. After studying the characteristics of each index format and each index unit, we create three indexes of different types: KeyFile-Unigram-Index, KeyFile-Word-Index and Indri-Word-Index. We then use an interpolation method to re-rank the documents returned from these three indexes. Our system achieved 0.4266 mean AP, 0.4628 mean Q and 0.6761 mean nDCG in the final evaluation, which supports the effectiveness of our approach.
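The interpolation step can be pictured with the sketch below, which linearly combines scores from several retrieval runs after min-max normalization. The abstract does not state the normalization or the weights, so both are assumptions here, as are the function names and the toy scores; treat it as one plausible form of score interpolation rather than the authors' method.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1]; a constant run maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def interpolate(runs: List[Dict[str, float]], weights: List[float]) -> List[str]:
    """Rank documents by the weighted sum of their normalized scores.

    A document missing from a run contributes 0 for that run.
    """
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    combined: Dict[str, float] = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    # Sort by descending combined score, breaking ties by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


if __name__ == "__main__":
    unigram_run = {"a": 2.0, "b": 1.0}   # hypothetical scores per index
    word_run = {"b": 10.0, "c": 5.0}
    indri_run = {"a": 0.3, "c": 0.1}
    print(interpolate([unigram_run, word_run, indri_run], [0.5, 0.3, 0.2]))
```

Normalizing before combining matters because the raw scores of different retrieval engines are generally not on a comparable scale.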
In this paper, we report various strategies for query expansion (QE) in the NTCIR-8 IR4QA subtask. We submitted the results of twelve runs in the formal run, which include cross-language information retrieval from English to Traditional Chinese, from English to Simplified Chinese, and from English to Japanese in the official T-run, D-run and DN-run. Our approach uses Google translation and Okapi BM25 with pseudo-relevance feedback as the basic retrieval system. We add further QE from Wikipedia and from the result of QA analysis. In the additional runs, we use a topic web crawler to obtain more related web pages and to extract more keywords as candidates for QE.
This paper describes our work on the subtask of Simplified Chinese monolingual information retrieval for question answering at NTCIR-8. We use the Lemur toolkit to build an index whose unit is the Chinese word. Okapi BM25 was used as the retrieval model, and a density-proportional pseudo-relevance feedback method was used for query expansion. To rank the documents, statistical language modeling and a Minimal Mean Distance (MMD) calculation method were employed. Evaluation at NTCIR-8 shows that the best T-run from our team scores 0.5981 in Mean nDCG, 0.3411 in Mean AP and 0.3749 in Mean Q.
This paper presents the technical details and experimental results of the information retrieval system with which we participated in the NTCIR-8 ACLIA (Advanced Cross-language Information Access) IR4QA (Information Retrieval for Question Answering) task. Document corpora in Simplified Chinese (CS) and Traditional Chinese (CT), with topics in English, CS and CT, were used in our experiments. We combined query expansion with re-ranking of the initial retrieval results as our main retrieval approach. The experimental results confirmed that query expansion based on the Bose-Einstein distribution and a re-ranking method based on Latent Dirichlet Allocation (LDA) consistently bring significant improvements over various baseline systems. In particular, the approach is capable of processing the mixed multilingual text obtained from a machine translator for cross-language information retrieval (CLIR). The results obtained may provide more insight into cross-language query expansion and document re-ranking.
In this paper we describe our approaches to retrieving cross-lingual documents for question answering in the NTCIR ACLIA-IR4QA task. A few Chinese indexing techniques were used in our experiments. We mainly focused on using external resources, namely web documents and Wikipedia, for key phrase identification, translation and query expansion. The evaluation shows encouraging results for our system.
For the NTCIR-8 Workshop, UC Berkeley participated in IR4QA (Information Retrieval for Question Answering) as well as the GeoTime track. For IR4QA we only did Japanese monolingual search and English-to-Japanese bilingual search. Our focus was thus primarily on Japanese topic search against the Japanese news document collection, as in past NTCIR participations. We preprocessed the text using the ChaSen morphological analyzer for term segmentation. We used a time-tested logistic regression algorithm for document ranking, coupled with blind feedback. The results were satisfactory, ranking second among the overall IR4QA submissions for Japanese.
We describe DCU's participation in the NTCIR-8 IR4QA task. This task is a cross-language information retrieval (CLIR) task from English to Simplified Chinese which seeks to provide relevant documents for later cross-language question answering (CLQA) tasks. For the IR4QA task, we submitted 5 official runs, including two monolingual runs and three CLIR runs. For monolingual retrieval we tested two information retrieval models. The results show that the KL-Divergence language model method performs better than the Okapi BM25 model for the Simplified Chinese retrieval task. This agrees with our previous CLIR experimental results at NTCIR-5. For the CLIR task, we compare query translation and document translation methods. In the query translation based runs, we tested a method for query expansion from external resources (QEE) before query translation. Our result for this run is slightly lower than that of the run without QEE. Our results show that the document translation method achieves 68.24% of the MAP of our best query translation run. For the document translation method, we found that the main issue is the lack of named entity translation in the documents, since we do not have a suitable parallel corpus to train the statistical machine translation system. Our best CLIR run comes from the combination of query translation using Google Translate and the KL-Divergence language model retrieval method. It achieves 79.94% of the MAP of our best monolingual run.
This paper describes the work of our WUST group in NTCIR-8 on the subtasks of English to Simplified Chinese and Simplified Chinese to Simplified Chinese information retrieval for question answering (EN-CS and CS-CS IR4QA). To enhance precision and efficiency in question analysis, we employ a special question analysis method that extracts more appropriate key terms, and we apply a query expansion technique that gains more relevant key terms from the content of Wikipedia articles related to the query.
This paper describes our work on an IR4QA system for NTCIR-8, which aims to evaluate which IR techniques are more useful for QA. We examine IR techniques that can find documents containing answers to the questions. In our system, we exploit different external resources according to the type of question. In particular, we exploit Wikipedia, Google and Baidu Baike to identify named entity translations, and also employ them to expand the query to improve retrieval precision. We use passage retrieval to improve average precision. Our experiments show that these techniques can significantly increase retrieval precision.
In this paper, we describe our approach to the information retrieval for question answering (IR4QA) task of NTCIR-8. To improve information retrieval performance, we focus mostly on the document re-ranking technique, which sits between the initial retrieval and query expansion. We employ two approaches to document re-ranking. One is based on entropy clustering, a kind of unsupervised learning technique. Relevant documents from the top of the initial retrieval results can be automatically clustered into the same class according to their information entropy values. This is a continuation of our previous work. The other is the One Class Co-Clustering (OCCC) approach. It aims to detect topical terms and compute a document's topicality score. The method is simple and performs well. The experimental results show that using the two approaches in document re-ranking, clustering and OCCC, can improve information retrieval performance.
For the NTCIR Workshop 8 we organized a Geographic and Temporal Information Retrieval Task called "NTCIR GeoTime". The focus of this task is on search with geographic and temporal constraints. This overview describes the data collections (Japanese and English news stories), topic development, assessment results and lessons learned from the NTCIR GeoTime task, which combines GIR with time-based search to find specific events in a multilingual collection. Eight teams submitted Japanese runs (including three unofficial teams who provided runs to expand the pools) and six teams submitted English runs. One team participated in both Japanese and English.
The NTCIR-GeoTime task is a task of searching for documents under geographic and temporal constraints, and almost all of its topics can be regarded as question answering (QA) for particular named entities. To build a good information retrieval (IR) system for QA about particular named entities, it is better to use a Boolean IR model with an appropriate Boolean query that includes named entity information. In this paper, we propose to use ABRIR (Appropriate Boolean query Reformulation for Information Retrieval) for this problem. In this system, an appropriate list of synonyms and of variations of the Japanese katakana spelling of the given query is used for constructing the Boolean query. Evaluation results show that ABRIR works effectively for the task of IR for QA.
We describe an evaluation experiment on GeoTemporal Document Retrieval created for the GeoTime evaluation task of NTCIR 2010. GeoTemporal Retrieval aims to improve retrieval results using the geographic and temporal dimensions of relevance. To accomplish that task, systems need to extract geographic and temporal information from the documents and then explore semantic relations among those dimensions within the documents. Since this is the first time the task has taken place, our aim is to evaluate some basic techniques in order to set research directions for our work. We aim to understand the relevance of temporal and geographic expressions for filtering purposes. The geographic expressions were extracted with Yahoo PlaceMaker, and for temporal expressions we used the TIMEXTAG system. We experimented with techniques at both the whole-document and sentence resolutions, as well as one mixed approach. We also used a query expansion mechanism for topics with no filters defined. We used BM25 as the retrieval model and preprocessed the topics with a semi-automatic methodology to create structures from which we build our filters and expansions. We learned that the sentence level is not a very good approach (although we found indications that a paragraph-level context could improve the results) and that filters based on geographic and temporal expressions showed good performance.
In this paper, we report the evaluation results of our GeoTime information retrieval system at NTCIR-8 GeoTime. We participated in the Japanese monolingual task (JA-JA). Our proposed method for GeoTime information retrieval is based on question decomposition and question answering. We demonstrate that the proposed method is able to accept GeoTime questions and retrieve relevant documents to some extent. However, there is still room to improve retrieval effectiveness. The per-topic evaluation results show that some topics cannot be handled appropriately by our method, and therefore the method lacks robustness with respect to the variety of GeoTime questions.
We present our participation in the NTCIR GeoTime evaluation task with a semantically-flavored geographic information retrieval system. Our approach relies on a thorough interpretation of the user intent by recognising and grounding entities and relationships from query terms, extracting additional information using external knowledge resources and geographic ontologies, and reformulating the query with reasoned answers. Our experiments aimed to observe the impact of semantic-based reformulated queries on retrieval performance.
This paper reports on experiments in the NTCIR-8 GeoTime task performed by the research group at the School of Library and Information Science, Keio University (KOLIS), which explored techniques for searching a Japanese document collection for requests on geographic and temporal information. A special re-ranking component for enhancing the performance of geographic and temporal searches was added to the KOLIS system, in which standard BM25 and probabilistic pseudo-relevance feedback (PRF) were implemented. That is, at the first stage, a list of documents relevant to a given topic was obtained by standard IR techniques, and at the second stage, the list was re-ranked after the scores of documents that included geographic and temporal terms were increased. More specifically, the number of different geographic and temporal terms appearing in each document was counted using a special dictionary containing only such terms, and the document score was modified based on that number. In this experiment on Japanese monolingual (JA-JA) retrieval and English-to-Japanese bilingual (EN-JA) retrieval, the search runs using re-ranking and PRF jointly showed the highest performance, followed by the re-ranking-only runs and the PRF-only runs, in this order. This result indicates that the simple re-ranking technique helps to enhance geographic and temporal searches. Comparing JA-JA and EN-JA searches, the performance of the bilingual searches was only slightly inferior to that of the monolingual searches. The KOLIS experiment adopted a simple query translation approach using the public machine translation services of two Internet search engines, and the translations obtained from the search engines turned out to work well. The translations were segmented into a set of terms using the same indexing method that was applied to the Japanese document set.
This method was a hybrid approach that concatenates two results: longest matching against a Japanese dictionary, and decomposition of sentences into overlapped character bi-grams.
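The second-stage re-ranking and the character bi-gram part of the indexing described above can be sketched as follows. The abstract says only that the score was modified based on the number of different dictionary terms, so the multiplicative boost, its `boost` parameter, the function names and the toy data are all assumptions for illustration; the dictionary-based longest matching is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def overlapped_bigrams(text: str) -> List[str]:
    """Decompose a string into overlapping character bi-grams."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def rerank_by_geo_time_terms(
    results: List[Tuple[str, float]],   # (doc id, first-stage score)
    doc_terms: Dict[str, List[str]],    # doc id -> index terms of the document
    geo_time_dict: Set[str],            # dictionary of geographic/temporal terms
    boost: float = 0.1,                 # hypothetical boost per distinct term
) -> List[Tuple[str, float]]:
    """Raise each score by the number of distinct dictionary terms in the document."""
    rescored = []
    for doc_id, score in results:
        distinct = len(set(doc_terms.get(doc_id, [])) & geo_time_dict)
        rescored.append((doc_id, score * (1.0 + boost * distinct)))
    # Highest score first; ties broken by document id for determinism.
    rescored.sort(key=lambda pair: (-pair[1], pair[0]))
    return rescored


if __name__ == "__main__":
    first_stage = [("d1", 10.0), ("d2", 9.5)]
    terms = {"d1": ["economy"], "d2": ["tokyo", "1995", "tokyo"]}
    dictionary = {"tokyo", "osaka", "1995"}
    print(rerank_by_geo_time_terms(first_stage, terms, dictionary))
    print(overlapped_bigrams("abcd"))
```

Counting distinct terms rather than raw occurrences keeps a document that repeats one place name many times from outranking a document that mentions both a place and a date.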
Searches for information on the web involving both geographic ("where") and temporal ("when") components comprise a nontrivial percentage of overall searches. In this paper, we describe an approach to identifying specific documents within a collection that satisfy a set of geo-temporal queries. To test our approach, we submitted five runs to NTCIR-8 GeoTime, using the Indri search engine, on a three-year collection of English-language newspaper articles. Our five submitted runs achieved nDCG scores ranging from 0.5758 to 0.6233 and MAP ranging from 0.3517 to 0.3951 across twenty-five separate geo-temporal queries.
For the NTCIR-8 Workshop, UC Berkeley participated in the GeoTime track and in IR4QA. For the GeoTime track we did both English and Japanese, with both cross-language combinations. For the Japanese and translated English texts, we preprocessed the text using the ChaSen morphological analyzer for term segmentation. For GeoTime we used a time-tested logistic regression algorithm for document ranking, coupled with blind feedback for most runs. For these submitted runs we did not do any special-purpose geographic or temporal processing. This brief paper describes the submitted runs and the methods used for them.
News is information about recent and important events that have happened at a geographic location at some point in time. Since location and time are important components of almost all news, it is important to be able to collate and search news based on time and location and not just keywords. In this paper we present an approach that deals not only with the geography or location in the query but also considers the geo-relationship, introducing a geo-hierarchy to help pick and rank the most relevant results. The results for the NTCIR GeoTime queries show a lot of promise for this approach.
We performed retrieval for topics that contained geographic and temporal information in the NTCIR-8 GeoTime task. Using morphological analysis, temporal and geographic information is extracted from the GeoTime collection. An index that represents a geographic hierarchy is built from the geographic information. In the experiment, we confirmed the effect of the geographic hierarchical index when topics included terms denoting wide-area regions.
In this report, we describe the experiments carried out by Dublin City University for NTCIR GeoTime 2009-10. In all we submitted five runs, which evaluated the benefit of including clustered location information compared with a standard word-term text IR baseline. The baseline technique was the Lucene default, and we developed three different algorithms to re-rank the results based on the location list, which was clustered in two instances. In addition, we evaluated the potential benefit of employing a query expansion technique based on WordNet. In conclusion, we found that including location information to re-rank documents offered an improvement over the baseline. Our analysis leads us to believe that larger gains can be made by including location information at the indexing and initial querying stage, and then refining the final ranked list with standard IR techniques.
In this paper, we discuss the goal, task description, evaluation results, and the participants' approaches for the Third Multilingual Opinion Analysis Task (MOAT) in the NTCIR-8 workshop. We designed the task based on our past experience, with a view to cross-lingual opinion analysis applications. To solve this challenging problem, we believe that two solutions are required: (1) language-transfer approaches with semi-supervised techniques and (2) cross-lingual opinion question answering capabilities. To get closer to this goal, we created opinion-annotated corpora based on opinion Q&A in a common format across languages. Many teams participated in the subtasks for more than two language sides, and some teams also participated in the cross-lingual subtask. There were 56 result runs submitted by 16 participants, and half of the participants submitted results in more than two language-related tasks. We hope that MOAT at NTCIR-8 will be a milestone for cross-lingual opinion analysis research.
This paper presents our work in the Multilingual Opinion Analysis Task (MOAT) during the NTCIR-8 evaluation campaign. We propose a probabilistic model derived from Muller's method that allows us to determine and weight terms (isolated words, word bigrams, noun phrases, etc.) belonging to a given category (or subset of the corpus) compared to the rest of the corpus. Based on these terms and their weights, we adopt the logistic regression method to determine the most probable category for each input sentence. Our participation was strongly motivated by the objective of proposing an approach to the polarity subtask of MOAT that has a minimal linguistic component and whose performance can be improved by language-specific tools. Thus, for English, we adopted a combination of a machine learning approach (Z score and logistic regression) and a polarity dictionary (the linguistic component). For Traditional Chinese and Japanese, however, our current system is limited to a machine learning scheme.
This paper describes our work in the Simplified Chinese opinion analysis tasks in NTCIR-8. In the task of detecting opinionated sentences, various sentiment lexicons are used, including opinion indicators, opinion operators, degree adverbs and opinion words. A linear SVM model is selected as the main classifier, and four groups of features are extracted according to punctuation, words and sentiment lexicons. We also try a two-step classification to improve the SVM result. For extracting the opinion holder and target, we use a synthesis of CRF and heuristic rules. The evaluation results on the NTCIR-8 MOAT Simplified Chinese side show that our system achieves the best F-measure in two tasks. This demonstrates that the proposed framework is promising.
In this paper, we describe our participating system, which is based on supervised approaches and dependency parsing, for opinion analysis on Traditional Chinese texts at NTCIR-8. For opinionated sentence recognition, the supervised lexicon-based approach, SVM and Maximum Entropy are combined. For polarity classification, we use only the supervised lexicon-based approach. For opinion holder and target identification, we rely on dependency parsing: we identify opinion holders by means of reporting verbs and identify opinion targets by considering both opinion holders and opinion-bearing words. The results show that among all the teams participating in the Traditional Chinese task, our system achieves: 1) the highest F-measure on the opinionated sentence recognition task, 2) the second highest F-measure on the identification of both opinion holders and targets, 3) a middle ranking for opinion polarity classification.
The present is marked by the availability of large volumes of heterogeneous data, whose management is extremely complex. While the treatment of factual data has been widely studied, the processing of subjective information still poses important challenges. This is especially true in tasks that combine Opinion Analysis with other challenges, such as the ones related to Question Answering. In this paper, we describe the different approaches we employed in the NTCIR-8 MOAT monolingual English (opinionatedness, relevance, answerness and polarity) and cross-lingual English-Chinese tasks, implemented in our OpAL system. The results obtained when using different settings of the system, as well as the error analysis performed after the competition, offered us some clear insights into the best combination of techniques, one that balances precision and recall. Contrary to our initial intuitions, we have also seen that the inclusion of specialized Natural Language Processing tools dealing with Temporality or Anaphora Resolution lowers the system performance, while the use of topic detection techniques using faceted search with Wikipedia and Latent Semantic Analysis leads to satisfactory system performance, both in the monolingual setting and in the multilingual one.
In this paper, we briefly describe the machine-learning-based method we used in the NTCIR-8 MOAT task, particularly in the opinionated sentence judgment subtask on both the English side and the Chinese side. We view this subtask as a binary classification problem and build a supervised learning framework. To extract meaningful sentiment features, we propose several n-gram patterns that combine basic words and part-of-speech tags. Our basic classifiers are trained only on the previous NTCIR annotated corpus, in which samples are inadequate and unbalanced. Thus, we adopt a few self-learning strategies that use the NTCIR-8 test corpus to adjust our basic classifiers. Using the same learning framework on both language sides, we obtain similar performance.
Identifying an opinion target, a primary object of the opinion expression (e.g., a real-world object, event, or abstract entity), is helpful for extracting target-related opinions and detecting user interests. This paper presents a novel framework for target-based opinion analysis, which extracts opinionated sentences and identifies their opinion targets from news articles. To determine whether a sentence includes opinions, we utilize opinion lexicons (i.e., predefined clue words) and linguistic patterns. In identifying the opinion target, candidates are generated and examined for the existence of four different features. We attempt to capture the relationship between an object target and opinion clues and utilize a document theme. For evaluation, we used English news articles from the New York Times, provided by NTCIR-8 MOAT, and annotated opinionated sentences and their opinion targets. Experimental results show that our proposed method is promising, although many additional issues remain to be studied in the future.
This paper reports the approach of the CYUT system in the NTCIR-8 MOAT subtasks. We submitted results for opinion judgment and polarity judgment in Traditional Chinese. Our study focused on automatically generated templates as the only features of the classifier. The templates, which combine words with part-of-speech or named-entity (POS/NE) tags, are acquired from the training set. Experimental results show that the template generation technology can achieve the same result without human-edited knowledge.
We describe an opinion analysis system developed for the Multilingual Opinion Analysis Task at NTCIR-8. Given a topic and relevant newspaper articles, our system determines whether a sentence in the articles has an opinion. If so, we then extract the holder of the opinion. In the opinion judgment task, we constructed a phrase-level opinion expression extractor from a sentence-level annotated corpus. In the opinion holder extraction task, we used the probability that a word appears in the opinion holder and the dependency relationship between the word and the verb of the sentence.
The starting points of SICS are the following: given a semantic word space trained on general-purpose text, where distance and nearness are measures of semantic similarity, we can represent a sentence by the centroid of the words that occur in it; constructional features contribute to the organisation of this semantic space; and attitude is a semantic dimension of variation, in that sentences with similar attitudinal qualities can be expected to occupy space in the vicinity of each other. This year's simplistic experiment did not yield useful results. Parameter tuning is a necessary step in any categorization exercise; this year we failed to devote the necessary effort to achieve results worth noting.
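The centroid representation described in the abstract above can be sketched as follows. The three-dimensional `WORD_SPACE`, its values and the function names are hypothetical and chosen only for illustration; a real word space would be trained on large general-purpose text, and the constructional features mentioned in the abstract are not modelled here.

```python
import math

# Hypothetical toy word space; real word vectors would be learned from text.
WORD_SPACE = {
    "excellent": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.1],
    "terrible": [-0.9, 0.1, 0.0],
    "policy": [0.0, 0.9, 0.3],
    "decision": [0.1, 0.8, 0.4],
}

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sentence_vector(tokens):
    """Represent a sentence by the centroid of its known words, or None."""
    known = [WORD_SPACE[t] for t in tokens if t in WORD_SPACE]
    return centroid(known) if known else None

def cosine(u, v):
    """Cosine similarity, used here as the nearness measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

a = sentence_vector(["excellent", "policy"])
b = sentence_vector(["good", "decision"])
c = sentence_vector(["terrible", "policy"])
```

In this toy space the two positive sentences `a` and `b` lie closer to each other than `a` lies to the negative sentence `c`, which is the kind of neighbourhood structure the hypothesis expects.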
In this paper, we present BUPT's work on the Simplified Chinese monolingual opinion analysis task at NTCIR-8. We participated in four tasks, that is, all tasks except opinion target detection and answerness judgment, and submitted two runs for each task. For opinion sentence detection, we propose semantic-level and grammar-level features and also summarize some syntactic structure templates to achieve more satisfactory classification results based on TSVM. For opinion holder detection, we first use a CRF with six corresponding features; we then propose two syntactic rules based on opinionated trigger words from syntax trees, which are taken as additional features for training the CRF model. For relevance judgment, we train a model by introducing statistical language models with expansion of topic words. To judge polarity, we compute a value for the text with our algorithm, using a large-scale emotional dictionary, and set a threshold to classify the sentiment polarity of the sentences in each text.
This paper presents our work in the Multilingual Opinion Analysis Task (MOAT) of the NTCIR-8 workshop. We describe a feature-based system that is designed to detect whether or not a sentence is opinionated. The system utilizes various features: headlines in newspapers, Japanese sentence patterns, dependency pairs, numeral features and opinionated words related to newspapers. The experiments show that our feature-based system is feasible and effective.
This paper presents the design and implementation of CTL-OM, an opinion mining system developed by the NLPCity group for the NTCIR-8 MOAT evaluation. CTL-OM incorporates two opinion mining approaches, namely a feature-based approach and a similarity-based approach. The feature-based approach incorporates computational features at the punctuation, word, collocation, phrase, sentence, paragraph and document levels in a coarse-fine multi-pass classification framework. The opinion holders and opinion targets in the opinionated sentences are then recognized. The similarity-based approach works in a different way. It estimates the similarity between the example sentences and the testing sentence and identifies the most similar pair of example sentence and testing sentence. The opinion components annotated in the example sentence are utilized to recognize the corresponding components in the testing sentence. The analysis outputs of these two approaches are integrated to obtain the final opinion mining results. CTL-OM achieved promising results in both the Traditional Chinese and the Simplified Chinese evaluations in MOAT-8. This result shows that combining feature-based and similarity-based opinion mining approaches is effective.
In this paper, we briefly summarize our experience in participating in the Multilingual Opinion Analysis (MOAT) tasks in NTCIR-8 and present our preliminary experimental analysis of the effects of the opinion lexicons employed in Chinese opinion mining.
This paper presents the WIA-Opinmine system developed by the CUHK_Tsinghua Web Information Analysis (WIA) Virtual Research Center for the NTCIR-8 MOAT task. The system is distinctive in three respects. Firstly, the system is able to handle Simplified Chinese and Traditional Chinese at the same time. A tool is developed to convert Traditional Chinese into Simplified Chinese before opinion analysis. Secondly, a topic-model-based algorithm is found effective in relevance judgment. A co-clustering algorithm is incorporated in topic modeling. Thirdly, a ranking method is adopted to rank all holder (A0's) and target (A1's) candidates recognized by a semantic role labeling tool, during which the topic models for each topic are fully used for judging the importance of all candidates. The NTCIR-8 evaluation results as well as the post-NTCIR-8 results show that our system can effectively recognize relevant sentences, opinionated sentences and polarities.
This paper introduces the Patent Mining Task at the Eighth NTCIR Workshop and the test collections produced in this task. The purpose of the Patent Mining Task is to create technical trend maps from a set of research papers and patents. We performed two subtasks: (1) the subtask of research papers classification and (2) the subtask of technical trend map creation. For the subtask of research papers classification, six participant groups submitted 101 runs. For the subtask of technical trend map creation, nine participant groups submitted 40 runs. In this paper, we also report on the evaluation results of the task.
This paper describes our system for the NTCIR-8 Patent Mining Task, which maps research papers into the IPC taxonomy. Our focus was on the Japanese patent collection, and we applied three kinds of methods. The first is based on the K-NN algorithm, for which we extended the similarity and ranking policy. The second is a hierarchical SVM tree in which every node is an SVM classifier. Finally, we constructed a general framework called M3 for handling huge training data sets, based on the idea of divide-and-conquer. The evaluation results indicated that the extended K-NN performs better in both accuracy and running time, and that a re-ranking combination strategy could improve the result slightly.
Accurate classification of patent documents according to the IPC system is vital for the interoperability between different patent offices and for the prior art search task involved in a patent application procedure. It is essential for companies and governments to track changes in technology in order to assess their investments and create new branches of novel solutions. In this paper, we present our experiments from the NTCIR-8 challenge to automate paper abstract classification into the IPC taxonomy and to create a technical trend map from it. We apply the k-NN algorithm in the classification process and manipulate the rank of the nearest neighbours to enhance our results. The technical trend map is created by detecting passages that describe technologies and their effects in paper and patent abstracts. A CRF-based system enriched with handcrafted rules is used to detect technology, effect, attribute and value phrases in the abstracts. Our experiments use multiple patent databases for training the system and paper abstracts as well as patent applications for testing purposes, thus characterising a cross-database and cross-genre task. In the subtask of Research Papers Classification, we achieve a MAP of 0.68, 0.50 and 0.30 for the English and 0.71, 0.50 and 0.30 for the J2E subclass, main group and subgroup classifiers respectively. In the Technical Trend Map Creation subtask, we achieve an F-score of 0.138 when detecting technology/effect elements in patent abstracts and 0.141 in paper abstracts. Our methodology provides results that are competitive with the state of the art, with the majority of our official runs being ranked within the top two for both trend map (papers) and IPC coding. That said, we see room for improvement, especially in the detection of technology and attribute elements in abstracts. Finally, we believe that the subtask of Technical Trend Map Creation needs to be adjusted in order to better produce a patent map.
The classification system is available online at http://pingu.unige.ch:8080/IPCCat.
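The MAP figures quoted in the abstract above are mean average precision scores over ranked lists of IPC codes. A minimal sketch of how such a score is commonly computed is given below; the exact definition used in the official evaluation may differ in detail, and the IPC codes and rankings are made up for illustration.

```python
def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant item occurs."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of the average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)

# Hypothetical classifier output for two paper abstracts (illustrative codes).
runs = [
    (["H04L", "G06F", "H04N"], {"H04L"}),          # correct code ranked first
    (["G06F", "G06N", "H04L"], {"G06N", "H04L"}),  # correct codes at ranks 2 and 3
]
map_score = mean_average_precision(runs)
```

Because each correct code contributes the precision at its rank, moving a correct IPC code up the list, as a rank-manipulating k-NN variant aims to do, raises the score directly.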
The authors used a word sequence labeling method for technical effects and base-technology extraction in the Technical Trend Map Creation Subtask of the NTCIR-8 Patent Mining Task. The method labels each word based on CRF (Conditional Random Field) trained with labeled data. The word features employed in the labeling are obtained by using explicit/implicit document structures, technology fields assigned to the document, effect context phrases, phrase dependency structures and a domain adaptation technique. Results of the formal run showed that the explicit document structure feature and the phrase dependency structure feature are effective in annotating patent data. The implicit document structure feature and the domain adaptation feature are also effective for annotating paper data.
This paper describes our approach to the task of Technical Trend Map Creation as posed in NTCIR-8. The basic method is Conditional Random Fields, which is considered the most advanced method in Named Entity Recognition. In order to improve the performance, we further resort to a tag modification approach and a pattern-based method. Our system performed competitively, achieving the top F-measure among participants in the formal run.
In this paper, we present a novel query expansion approach based on splitting the user query into a set of N-grams and expanding them separately using a set of research articles. Our approach retrieves a set of relevant research articles and processes their abstracts to expand the query or the searched term or phrase. We aim to expand terms that regular relevance feedback might ignore. Our work shows an improvement at several classification levels compared to several other expansion methods.
Patents cover almost all of the latest and most active innovative technical information in technical fields; therefore patent classification has great application value in the patent research domain. This paper presents a KNN text categorization method based on shared nearest neighbors, which effectively combines the BM25 similarity calculation method with the neighborhood information of samples. The effectiveness of this method has been verified in the NTCIR-8 Patent Classification evaluation.
We participated in the subtask "technical trend map creation" of the Patent Mining Task at NTCIR-8. In this paper, we define this task as a knowledge extraction task for patent abstracts, and introduce a CRF method and a rule-based method in our approach. From the evaluation results, we find that integrating the CRF model and the rule model works better than using the CRF model alone. However, extracting <value> tags is more difficult than extracting <technology> tags.
We report the results of our experiments on the automatic assignment of patent classifications to research paper abstracts in NTCIR-8. In the mandatory runs, we applied an augmentation of the K-nearest neighbors method and "Learning to Rank" to improve the classification accuracy. The results show that these methods slightly improve the classification accuracy. We also compared the accuracy across technical fields, and the results show that the accuracy differs by field.
We developed a method of information extraction for multiple data sources or for various kinds of datasets such as Internet web pages. Generally, because many different writing styles and vocabularies exist among different kinds of data, the accuracy of information extraction using various kinds of datasets is not better than that using a single kind of data. Our method divides the data by clustering and learns extraction rules so as to increase accuracy even when various kinds of datasets are used. In our experiment, we applied our method to the NTCIR-8 Technical Trend Map Creation subtask, which uses two kinds of data, patents and technical papers, and obtained better precision than a normal information extraction method.
Our group participated in the subtask of technical trend map creation for the NTCIR-8 Patent Mining Task. We prepared five types of cue phrase lists using statistical methods, and used them in the analysis of research papers and patents with Support Vector Machines. In the experiments, we obtained a recall of 0.110 and a precision of 0.424 for research papers, and a recall of 0.430 and a precision of 0.563 for patents.
This paper reports on an experiment to evaluate the extraction of effect expressions from patents and papers (in Japanese) in the subtask of Technical Trend Map Creation in the NTCIR-8 Patent Mining Task. To obtain a more detailed structure for the expressions, we defined effect expressions as consisting of TARGET, SCALE and IMPACT elements. We created training data based on these elements and assigned tags by supervised learning. Then, on the basis of conversion rules and dependency relationships, we converted these independently defined tags to the ATTRIBUTE, VALUE and EFFECT tags.
Our group took part in the Patent Mining Task of NTCIR-8. We proposed methods for extracting EFFECT and TECHNOLOGY expressions from a patent. In order to extract TECHNOLOGY expressions, we developed a method that uses a Support Vector Machine and delimiters collected by using an entropy-based score. Our method for annotating EFFECT tags, on the other hand, is based only on delimiters collected by using the entropy-based score. We achieved a precision of 0.55, a recall of 0.27 and an F-measure of 0.36.
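The three figures reported in the abstract above are related by the F-measure formula. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall, beta = 1), which is consistent with the reported values; the function name is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure of precision and recall; beta = 1 gives the harmonic mean."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision 0.55 and recall 0.27 as reported above.
f1 = f_measure(0.55, 0.27)
print(round(f1, 2))  # 0.36
```

Since the harmonic mean is pulled towards the smaller of its two inputs, the low recall dominates the combined score.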
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation and performed the Patent Translation Task at the Eighth NTCIR Workshop. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 3 200 000 sentence pairs in Japanese and English, which were extracted automatically from our parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages. In addition, our test collection includes machine translation results and their evaluation scores determined by human experts, which can be used to propose automatic evaluation methods for machine translation. This paper describes our test collection, methods for evaluating machine translation, and evaluation results for the research groups that participated in our task.
This paper gives the system description of the Dublin City University Machine Translation system MaTrEx for our participation in the translation subtask in the NTCIR-8 Patent Translation Task under the team ID of DCUMT. Four techniques are deployed in our systems: supertagged PB-SMT, context-informed PB-SMT, noise reduction, and system combination. For EN-JP, our system stood second in terms of BLEU reference score among six participants.
In this article, we describe the system architecture, training data preparation and experimental results of the EIWA group in the NTCIR-8 Patent Translation Task. Our system combines a rule-based machine translation technique with a statistical post-editing technique. Experimental results show a BLEU score of 0.344 for the Japanese-to-English intrinsic evaluation in the Patent Translation Task.
In this paper we describe the patent translation system that we submitted for the NTCIR-8 Patent Translation Task. Our phrase-based Statistical Machine Translation (SMT) system is trained on a bilingual corpus (3 million sentence pairs) and large monolingual corpora (460 million sentences for Japanese and 350 million sentences for English). In addition to the normal SMT, we use an SVM-based reranker. According to the experimental results, our baseline system gives a high BLEU score. However, the reranker has a negative effect.
We have developed a two-stage machine translation (MT) system. The first stage is a rule-based machine translation system. The second stage is a normal statistical machine translation system. For Japanese-English machine translation, we first used a Japanese-English rule-based MT system and obtained "ENGLISH" sentences from Japanese sentences. Second, we used a standard statistical machine translation system. This means that we translated "ENGLISH" into English. We believe this method has two advantages. One is that there are fewer unknown words. The other is that it produces structured or grammatically correct sentences. In our experiments, we obtained a BLEU score of 0.2565 in the Intrinsic-JE task using our proposed method. In contrast, we obtained a BLEU score of 0.2165 in the Intrinsic-JE task using a standard method (Moses). We also obtained a BLEU score of 0.2602 in the Intrinsic-EJ task using our proposed method. In contrast, we obtained a BLEU score of 0.2501 in the Intrinsic-EJ task using a standard method (Moses). This means that our proposed method was effective for the Intrinsic-JE and Intrinsic-EJ tasks. In future work, we will try to improve the performance by optimizing parameters.
This paper presents POSTECH's statistical machine translation (SMT) systems for the NTCIR-8 Patent Translation Task. We entered the translation subtasks and submitted a formal run for Japanese-to-English (KLE-je) and English-to-Japanese (KLE-ej) translation. The baseline system is derived from a common phrase-based SMT framework. For KLE-je, we adopted a cluster-based model using syntactic information as well as the word similarity of Japanese sentences. For KLE-ej, we adopted a reordering method to improve the fluency of the translation result. We did not submit a formal run for the Patent Mining Task.
This paper describes the "KYOTO" EBMT system that took part in the Patent Translation Task at NTCIR-8. When translating between very different languages, it is very important to handle sentences as tree structures in order to overcome the differences. Most recent translation methods consider a sentence as just a sequence of words. Some works incorporate tree structures in some parts of the translation process, but not all the way from model training (parallel sentence alignment) to decoding. "KYOTO" is a fully tree-based translation system in which we use statistical tree-based phrase alignment and example-based translation.
The factored translation model was proposed as an extension of the phrase-based statistical machine translation model. Its effectiveness has been shown for many languages, but not for Japanese. We investigated it for English-to-Japanese (EtoJ) and Japanese-to-English (JtoE) translation.
The evaluation of computer-produced texts is an important research problem for automatic text summarization and machine translation. Traditionally, computer-produced texts were evaluated automatically by n-gram overlap with human-produced texts. However, these methods cannot evaluate texts correctly, if the n-grams do not overlap between computer-produced and human-produced texts, even though the two texts convey the same meaning. We explore the use of paraphrases for the refinement of traditional automatic methods for text evaluation. In our previous work, we devised an evaluation method for text summarization using multiple paraphrase methods. Our goal in NTCIR-8 is to confirm the effectiveness of our method for machine translation. We evaluated 1200 computer-produced translations by six proposed methods and two baseline methods, and confirmed the effectiveness of our methods.
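The limitation of n-gram overlap described in the abstract above, and the way paraphrases can mitigate it, can be illustrated with a small sketch. The word-level substitution table `PARAPHRASES` and the function names are hypothetical simplifications introduced here; they are not the paraphrase methods evaluated in the paper.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference (clipped)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

# Hypothetical paraphrase table mapping variants to one canonical form.
PARAPHRASES = {"automobile": "car", "purchased": "bought"}

def normalize(tokens):
    """Replace each token by its canonical paraphrase, if one is listed."""
    return [PARAPHRASES.get(t, t) for t in tokens]

reference = "he bought the car".split()
candidate = "he purchased the automobile".split()

plain = ngram_precision(candidate, reference)
with_paraphrases = ngram_precision(normalize(candidate), normalize(reference))
print(plain, with_paraphrases)  # 0.5 1.0
```

The candidate conveys the same meaning as the reference but shares only half of its unigrams; after paraphrase normalization the overlap becomes complete.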
Identifying high-quality content in community-type Q&A (CQA) sites is important. We propose a task in which a computer identifies good answers from such sites. We describe the design of our best-answer estimation task using Yahoo! Chiebukuro. We also describe a method of constructing the test collection used for our CQA pilot task, the manual assessment method, and assessment results.
This paper describes the methods we used for evaluating the runs submitted to the NTCIR-8 Community QA Pilot Task, and reports on the official results. Moreover, we also describe a set of more systematic variants of the official evaluation methods, and re-evaluate the runs. For details on the NTCIR-8 Community QA test collection and the task specifications, we refer the reader to Overview Part I. For details on the task participants' approaches, we refer the reader to their papers [4, 6, 13].
In this paper, we describe the approaches we used for the NTCIR-8 Community QA Pilot Task and report on their results. In the pilot task, we mainly focused on discovering effective features for evaluating the quality of answers, for example, features on the relevance of an answer to a question, the authority of an answerer, or the informativeness of an answer. We also examined two different statistical learning approaches for finding the best-quality answer. The official evaluation results of our runs showed that our proposed features and learning approaches are effective in finding the best-quality answers.
This paper describes a best-answer estimation system called ASURA. The features of ASURA were decided on the basis of the results of experiments that were conducted to determine how people estimated which of a given set of answers was the best. There are two ASURA models: ASURA-1, which has five features, and ASURA-2, which has thirteen features. We outline ASURA-1 and ASURA-2 in this paper, and we also report the results from the NTCIR-8 CQA pilot task.
In this paper, I report the evaluation results of my best-answer selection system at the NTCIR-8 CQA Pilot Task. My goal was to assess which features were useful for selecting best answers using a machine learning method. I submitted three runs in the main task. They were lists of best-answer candidates that were selected by a machine learning tool using different sets of features and sorted in order of score. I used the readability of questions and answerers' attributes as features in machine learning, as well as basic features such as the length of questions.