NTCIR-11 Abstract

Overview of NTCIR-11
Hideo Joho and Kazuaki Kishida
[Pdf] [Table of Content]

This is an overview of NTCIR-11, the eleventh sesquiannual workshop for the evaluation of Information Access technologies. NTCIR-11 presents the most diverse set of evaluation tasks in the history of NTCIR, led by over 40 cutting-edge researchers worldwide. This paper presents a brief history of NTCIR and overall statistics of NTCIR-11, followed by an introduction to the nine evaluation tasks. We conclude the paper by discussing the future directions of NTCIR. Readers should refer to the individual task overview papers for their activities and findings.
Personalized Search: Potential and Pitfalls
Susan Dumais
[Pdf] [Table of Content]

Traditionally, search engines have returned the same results to everyone who asks the same question. However, using a single ranking for everyone in every context limits how well a search engine can do in providing relevant information. In this talk I outline a framework to quantify the "potential for personalization", which we use to characterize the extent to which different people have different intents for a query. I will describe several examples of how we represent and use different kinds of contextual features to improve search quality for individuals. Finally, I will conclude by highlighting important challenges in developing personalized systems at Web scale, including system optimization, transparency, serendipity and evaluation.
Overview of the NTCIR-11 IMine Task
Yiqun Liu, Ruihua Song, Min Zhang, Zhicheng Dou, Takehiro Yamamoto, Makoto Kato, Hiroaki Ohshima and Ke Zhou
[Pdf] [Table of Content]

In this paper, we provide an overview of the NTCIR IMine task, which is a core task of NTCIR-11 and a successor to the INTENT@NTCIR-9 and INTENT2@NTCIR-10 tasks. IMine is composed of a subtopic mining (SM) task, a document ranking (DR) task and a TaskMine (TM) pilot task. 21 groups from Canada, China, Germany, France, Japan, Korea, Spain, the UK and the United States registered for the task, which makes it one of the largest tasks in NTCIR-11. In the end, we received 45 runs from 10 teams for the SM task, 25 runs from 6 groups for the DR task and 3 runs from 2 groups for the TM task. We describe the task details, the annotation of results, the evaluation strategies and then the official evaluation results for each subtask.
University of Hyogo at NTCIR-11 TaskMine by Dependency Parsing
Takayuki Yumoto
[Pdf] [Table of Content]

Our method for TaskMine consists of three steps. First, we search for seed Web pages by issuing the query string together with the word "houhou", which means "method". We collect more pages by taking the anchor texts in the seed pages into consideration. Then, we find pairs of chunks satisfying predefined patterns by applying dependency parsing to the sentences. We extract a target and a postpositional particle from a depending chunk, and extract an operation and a negation (if it exists) from a depended chunk. We regard the resulting quadruplet as a subtask. Finally, we rank the extracted subtasks by a frequency-based score.
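The final ranking step can be illustrated with a minimal Python sketch. It assumes that the score is simply the raw count of each quadruplet; the abstract only says the score is frequency based, so the exact formula, the `Subtask` container and the romanized sample values are illustrative assumptions, not the author's implementation.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Subtask(NamedTuple):
    """One extracted quadruplet (hypothetical container, not from the paper)."""
    target: str
    particle: str
    operation: str
    negation: Optional[str]  # None when the depended chunk has no negation


def rank_subtasks(extracted):
    """Return (subtask, count) pairs, most frequent first.

    Assumption: the score of a subtask is its raw occurrence count.
    """
    return Counter(extracted).most_common()


# Made-up extractions; real input would come from the dependency parser.
extracted = [
    Subtask("kippu", "wo", "kau", None),
    Subtask("hoteru", "wo", "yoyaku suru", None),
    Subtask("kippu", "wo", "kau", None),
]
ranked = rank_subtasks(extracted)
```

Because the quadruplet is a tuple, identical extractions from different pages collapse into one entry whose count serves as the score.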
Université de Montréal at the NTCIR-11 IMine Task
Arbi Bouchoucha, Jian-Yun Nie and Xiaohua Liu
[Pdf] [Table of Content]

In this paper, we describe our participation in the NTCIR-11 IMine task, for both the subtopic mining and document ranking subtasks. We experimented with a new approach for aspect embedding which learns query aspects by selecting (good) expansion terms from a set of resources. In our participation, we used five representative resources: ConceptNet, Wikipedia, query logs, feedback documents and query suggestions provided by Bing, Google and Yahoo!. Our method is trained in a supervised manner according to the principle that related terms should correspond to the same aspects. We tested our approach when using a single resource, and when using different resources. Experimental results show that our best document ranking run is ranked No. 2 of all 15 runs in terms of coarse-grain and fine-grain results.
FRDC at the NTCIR-11 IMine Task
Zhongguang Zheng, Shuangyong Song, Yao Meng and Jun Sun
[Pdf] [Table of Content]

The FRDC team participated in the IMine task of NTCIR-11, including the subtopic mining and document ranking subtasks for the Chinese language. In the subtopic mining subtask, we propose two methods to build the two-level subtopic hierarchy. Our methods achieve a high F-score and a high H-score, respectively. In the document ranking subtask, we adopt various features for relevant webpage retrieval and document ranking.
TUTA1 at the NTCIR-11 IMine Task
Hai-Tao Yu and Fuji Ren
[Pdf] [Table of Content]

In this paper, we detail our participation in two subtasks of the NTCIR-11 IMine task: subtopic mining and document ranking. In the subtopic mining subtask, to discover the latent hierarchy among query-like strings, our key idea is to structurally parse query-like strings by characterizing pairwise dependency from the bag-of-units perspective. Then a clustering algorithm (i.e., affinity propagation) and the Sainte-Laguë algorithm are used to obtain the target list that represents a two-level hierarchy of subtopics. In the document ranking subtask, we deploy the newly proposed 0-1 MSKP model for diversified document ranking against unclear topics. A subset of documents is optimally chosen, like filling up multiple subtopic knapsacks.
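For readers unfamiliar with the Sainte-Laguë method, the following Python sketch shows the standard highest-quotient allocation. Treating clusters as "parties", cluster weights as "votes" and positions in the output list as "seats" is an assumption about how the method could be applied here; the abstract does not give those details, and the names `sainte_lague`, `weights` and `slots` are hypothetical.

```python
def sainte_lague(weights, slots):
    """Allocate `slots` positions among clusters in proportion to `weights`.

    Each round, the cluster with the highest quotient
    weight / (2 * already_allocated + 1) receives the next position.
    Returns the per-cluster counts and the order of allocation.
    """
    allocated = {name: 0 for name in weights}
    order = []
    for _ in range(slots):
        best = max(weights, key=lambda name: weights[name] / (2 * allocated[name] + 1))
        allocated[best] += 1
        order.append(best)
    return allocated, order


# Made-up cluster weights (e.g. cluster sizes or importance scores).
counts, order = sainte_lague({"a": 6.0, "b": 3.0, "c": 1.5}, slots=5)
```

With these weights the allocation order interleaves the clusters, so the heaviest cluster does not occupy all of the top positions, which is the property that makes the method attractive for building a diversified list.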
HULTECH at the NTCIR-11 IMine Task: Mining Intents with Continuous Vector Space Models
Jose G. Moreno and Gael Dias
[Pdf] [Table of Content]

In this paper, we present our participation in the Subtopic Mining subtask of the NTCIR-11 IMine task, for the English language. Our participation presents a novel strategy for intent mining given a list of candidates for a specific query topic. This strategy is based on a topic exploration through the use of continuous vector space models for each of the candidates based on classical vectorial operations. Our best run outperforms the other participants' submissions in terms of F-score and achieves a high position in the general ranking.
KUIDL at the NTCIR-11 IMine Task
Takehiro Yamamoto, Makoto P. Kato, Hiroaki Ohshima and Katsumi Tanaka
[Pdf] [Table of Content]

The KUIDL team participated in the Subtopic Mining subtask of the NTCIR-11 IMine task. This paper describes our approach to generating two-level hierarchical subtopics by using Web document structures. The formal run result shows that our approach achieved the best performance in terms of H-measure in the English Subtopic Mining subtask.
THUSAM at NTCIR-11 IMine Task
Cheng Luo, Xin Li, Alisher Khodzhaev, Fei Chen, Keyang Xu, Yujie Cao, Yiqun Liu, Min Zhang and Shaoping Ma
[Pdf] [Table of Content]

This paper describes our approaches and results in the NTCIR-11 IMine task. We participate in the subtasks for Chinese/English Subtopic Mining and Chinese Document Ranking. In the Subtopic Mining subtask, we mine subtopic candidates from various resources and construct the subtopic hierarchy with several different strategies. In the Document Ranking subtask, we rerank the result lists with the HITS algorithm and then adopt a pruning exhaustive search algorithm to generate diversified result lists.
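As background for the reranking step, here is a minimal sketch of the HITS power iteration in plain Python. How the link graph over the result list is built is not stated in the abstract, so the toy `links` dictionary and the choice to rerank by authority score are assumptions for illustration only.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op for an all-zero vector)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(links, iterations=50):
    """Compute hub and authority scores for a directed graph.

    `links` maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    incoming = {node: [] for node in nodes}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = _normalize({n: sum(hub[s] for s in incoming[n]) for n in nodes})
        # Hub: sum of authority scores of the pages linked to.
        hub = _normalize({n: sum(auth[t] for t in links.get(n, ())) for n in nodes})
    return hub, auth


# Toy graph: p1 and p2 both link to p3, and p3 links to p4.
hub, auth = hits({"p1": ["p3"], "p2": ["p3"], "p3": ["p4"]})
reranked = sorted(auth, key=auth.get, reverse=True)
```

In this toy graph `p3` ends up with the highest authority score because two pages point to it.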
SEM13 at the NTCIR-11 IMINE Task: Subtopic Mining and Document Ranking Subtasks
Md Zia Ullah and Masaki Aono
[Pdf] [Table of Content]

In this paper, we describe our participation in the Subtopic Mining subtask and the Document Ranking subtask of the NTCIR-11 IMINE Task. In the Subtopic Mining subtask, we mine subtopics from query suggestions, query dimensions and Freebase entities of a given query, rank them based on their importance for the given query, and finally construct a two-level hierarchy. In the Document Ranking subtask, we diversify top search results by estimating the coverage of the mined subtopics. The best performance of our system is an Hscore of 0.1762, an Fscore of 0.3043, an Sscore of 0.3689, and an H-measure of 0.0634 for the English subtopic mining run. For the document ranking run, the best performance of our system is a D#-nDCG@10 of 0.6022 (coarse-grain) and 0.5291 (fine-grain), which is comparable to that of other participants.
CNU System in NTCIR-11 IMine Task
Wei Song, Wenbin Xu, Lizheng Liu and Hanshi Wang
[Pdf] [Table of Content]

Understanding user intent is important for interactive and personalized information retrieval. For ambiguous queries, the user intent space actually forms a hierarchical top-down architecture, from senses to subtopics, rather than a flat structure. This paper presents the CNU system in the NTCIR-11 IMine task. Our method constructs the hierarchical structure by exploiting global semantic representation and expansion. The highlights include: 1) We utilize word semantic vectors which are learned from a large external corpus and propose a query-dependent semantic composition for representing query aspect phrases. Our aim is to alleviate the term-mismatch and data sparseness problems, which shallow lexical matching and co-occurrence-based local semantics cannot overcome effectively. 2) We expand query subtopics by introducing new words according to global semantic relatedness and cluster these words for query sense induction. The evaluation results from NTCIR-11 and from post-NTCIR-11 experiments show that our method can mine query subtopics and senses effectively.
The KLE's Subtopic Mining System for the NTCIR-11 IMine Task
Se-Jong Kim, Jaehun Shin and Jong-Hyeok Lee
[Pdf] [Table of Content]

This paper describes our subtopic mining system for the NTCIR-11 IMine task. We propose a method that mines second-level subtopics using simple patterns and a hierarchical structure of subtopic candidates based on sets of relevant documents, and combine the provided resources by reflecting their characteristics. Our system generates first-level subtopics using keywords in second-level subtopics, and groups the results by word correlation.
Udel @ NTCIR-11 IMine Track
Ashraf Bah, Ben Carterette and Praveen Chandar
[Pdf] [Table of Content]

This paper describes our participation in the Intent Mining track of NTCIR-11. We present our methods and results for both document ranking and subtopic mining. Our ranking methods are based on several data fusion techniques with some variations. Our subtopic mining method is a very simple technique that uses query dimensions' items to form a subtopic.
InteractiveMediaMINE at the NTCIR-11 IMine Search Task
Shohei Mine, Takuma Matsumoto, Tomofumi Yoshida, Takuya Shinohara and Daisuke Kitayama
[Pdf] [Table of Content]

The InteractiveMediaMINE team participated in the Task Mining subtask of the NTCIR-11 IMine Search Task. Our framework consists of three steps. First, we extend the query entered by the user in order to optimize it for the search engine. Second, we collect task candidates from Yahoo! Chiebukuro with the extended search term, using the top 10 pages of the search results. Finally, we score the extracted tasks by the frequency of the words in each sentence, and our system outputs the tasks in descending order of score. This report describes our approach to solving the TaskMine problem and discusses the results.
NTCIR-11 Math-2 Task Overview
Akiko Aizawa, Michael Kohlhase, Iadh Ounis and Moritz Schubotz
[Pdf] [Table of Content]

This paper presents an overview of the NTCIR-11 Math-2 Task, which is specifically dedicated to information access to mathematical content. In particular, the paper summarizes the task design, analysis of the submitted runs, and the main approaches deployed by the participating groups. It also contains an introduction to the optional free Wikipedia subtask, a newly introduced mathematical retrieval task using Wikipedia articles.
ICST Math Retrieval System for NTCIR-11 Math-2 Task
Liangcai Gao, Yuehan Wang, Leipeng Hao and Zhi Tang
[Pdf] [Table of Content]

In NTCIR-11, the NTCIR-Math-2 Task is organized for mathematical information retrieval. This paper proposes an innovative system for efficient formula index and retrieval. We build a novel indexing and matching model, taking both textual and spatial similarities into consideration. Besides, a hierarchical technique is introduced to generate sub-trees from the semi-operator trees of formulae. The experimental results demonstrate that the method of our system is effective and promising in practical application.
QUALIBETA at the NTCIR-11 Math 2 Task: An Attempt to Query Math Collections
José Maria González Pinto, Simon Barthel and Wolf-Tilo Balke
[Pdf] [Table of Content]

This project introduces our first attempt at mathematical retrieval of formulae from a large collection for the NTCIR-11 Math 2 task. To model the collection, our approach combined a feature-extracted sequence mechanism for the formulae with a sentence-level representation of the text describing the formulae. The feature-extracted sequences used were the category of the formulae and the sets of identifiers, constants, and operators. This representation, together with the text surrounding the formulae, was indexed in Elasticsearch for query processing. Even though the results of our information extraction model are below the participants' average and our expectations, the experience will help us to improve our work in several directions.
Evaluation of Similarity-Measure Factors for Formulae Based on the NTCIR-11 Math Task
Moritz Schubotz, Abdou Youssef, Volker Markl, Howard S. Cohl and Jimmy J. Li
[Pdf] [Table of Content]

In this paper we evaluate the similarity-measure factors proposed by Zhang and Youssef based on the NTCIR-11 gold standard. In contrast to Zhang and Youssef we evaluate them individually. The evaluation indicates that four of five factors are relevant. The fifth factor alone is of lower relevance than the other four factors. However, we do not prove that the fifth factor is irrelevant.
MathWebSearch at NTCIR-11
Radu Hambasan, Michael Kohlhase and Corneliu Prodescu
[Pdf] [Table of Content]

We present and analyze the results of the MATHWEBSEARCH system in the NTCIR-11 Math task, a challenge in mathematical information retrieval. MATHWEBSEARCH is a content-based search engine that focuses on fast query answering for interactive applications. It combines powerful exact formula matching based on substitution indexing with the full-text search capabilities of ElasticSearch to achieve simultaneous text and formula search. In this paper, we describe our system, evaluate our submission and results for the NTCIR-11 Math task and conclude with future work suggested by the task results.
The MCAT Math Retrieval System for NTCIR-11 Math Track
Giovanni Yoko Kristianto, Goran Topić, Florence Ho and Akiko Aizawa
[Pdf] [Table of Content]

This paper describes the participation of our MCAT search system in the NTCIR-11 Math-2 Task. The purpose of this task is to search mathematical expressions using hybrid queries containing both formulae and keywords. We introduce an encoding technique to capture the structure and content of the mathematical expressions. Each expression is accompanied by two types of automatically extracted textual information, namely words in a context window and descriptions. In addition, we examine the improvement in ranking obtained by utilizing a dependency graph of mathematical expressions and a post-retrieval reranking method. The results show that the use of the description and the dependency graph together delivers better ranking performance than the use of the context window. Furthermore, using both the description and the context window together delivers even better results. The evaluation results also indicate that our reranking method is effective for improving the ranking performance.
Math Indexer and Searcher under the Hood: History and Development of a Winning Strategy
Michal Růžička, Petr Sojka and Martin Líška
[Pdf] [Table of Content]

This paper describes and summarizes the experiences of the Masaryk University team MIRMU with the mathematical search performed for the NTCIR pilot Math Task. Our approach is similarity search based on MathML Canonicalization and the second generation of the scalable full-text search engine Math Indexer and Searcher (MIaS), with attested state-of-the-art information retrieval techniques. The capabilities of the MIaS system in terms of math query notation, normalization, and combining math with textual query tokens were exercised by submitting multiple runs with the four query notations provided, and with results merged from multiple queries. The analysis of the evaluation results shows that the system performs best using TeX queries that are translated and canonicalized to Content MathML.
Combining TF-IDF Text Retrieval with an Inverted Index over Symbol Pairs in Math Expressions: The Tangent Math Search Engine at NTCIR 2014
Nidhin Pattaniyil and Richard Zanibbi
[Pdf] [Table of Content]

We report on the system design and NTCIR-Math-2 task results for the Tangent math-aware search engine. Tangent uses a federated search over two indices: 1) a TF-IDF textual search engine (Lucene), and 2) a query-by-expression engine. Query-by-expression is performed using a bag-of-words approach where expressions are represented by pairs of symbols computed from symbol layout trees (e.g. as expressed in LaTeX or Presentation MathML). Extensions to support matrices and prefix subscripts and superscripts are described. Our system produced the highest highly + partially relevant Precision@5 result for the main text/math query task (92%), and the highest Top-1 specific-item recall for the Wikipedia query-by-expression subtask (68%). The current implementation is slow and produces large indices for large corpora, but we believe this can be ameliorated. Source code for our system is publicly available.
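To make the symbol-pair idea concrete, the sketch below extracts ancestor/descendant symbol pairs from a small symbol layout tree and compares two expressions by the overlap of their pair bags. The tree encoding, the relation names (`"next"`, `"above"`) and the Dice-style score are simplifying assumptions chosen for illustration; they are not taken from the Tangent source code.

```python
from collections import Counter

# A node is (symbol, [(relation, child_node), ...]).
# Assumed relations: "next" (to the right), "above" (superscript).


def symbol_pairs(tree):
    """Return a bag of (ancestor, descendant, relation_path) tuples."""
    pairs = Counter()

    def descend(ancestor, node, path):
        for relation, child in node[1]:
            new_path = path + (relation,)
            pairs[(ancestor, child[0], new_path)] += 1
            descend(ancestor, child, new_path)

    def visit(node):
        descend(node[0], node, ())
        for _, child in node[1]:
            visit(child)

    visit(tree)
    return pairs


def dice(query_pairs, candidate_pairs):
    """Dice coefficient over the two bags of pairs (assumed scoring function)."""
    overlap = sum((query_pairs & candidate_pairs).values())
    total = sum(query_pairs.values()) + sum(candidate_pairs.values())
    return 2 * overlap / total if total else 0.0


# Candidate: x^2 + y    Query: x^2
candidate = ("x", [("above", ("2", [])), ("next", ("+", [("next", ("y", []))]))])
query = ("x", [("above", ("2", []))])
score = dice(symbol_pairs(query), symbol_pairs(candidate))
```

Because each pair can be stored as a key in an inverted index, candidate expressions sharing pairs with the query can be found without scanning the whole collection.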
TUW-IMP at the NTCIR-11 Math-2
Aldo Lipani, Linda Andersson, Florina Piroi, Mihai Lupu and Allan Hanbury
[Pdf] [Table of Content]

The TUW-IMP team participated in the NTCIR-11 Math-2 task for retrieving mathematical formulae in scientific documents. This report describes our approach to solving the given math retrieval problem.
Overview of the NTCIR-11 MedNLP-2 Task
Eiji Aramaki, Mizuki Morita, Yoshinobu Kano and Tomoko Ohkuma
[Pdf] [Table of Content]

Electronic medical records are now often replacing paper documents, and thus the importance of information processing in medical fields has increased. We have already organized the NTCIR-10 MedNLP pilot task. It was the very first shared task to evaluate technologies for retrieving important information from medical reports written in Japanese, whereas the NTCIR-11 MedNLP-2 task has been designed for more advanced and practical use in the medical fields. The task consisted of three subtasks: (Task 1) the task to extract disease names and dates, (Task 2) the task to add ICD-10 codes to disease names, and (Task 3) a free task. Ten groups (24 systems) participated in Task 1, 9 groups (19 systems) participated in Task 2, and 2 groups (2 systems) participated in Task 3. This report presents the results of these groups, with discussions that clarify the issues to be resolved in the field of medical natural language processing.
kyoto: Kyoto University Baseline at the NTCIR-11 MedNLP-2 Task
Tetsuaki Nakamura, Kay Kubo, Shuko Shikata, Mai Miyabe and Eiji Aramaki
[Pdf] [Table of Content]

Since more electronic records are now used in clinical settings, the importance of developing technology for analyzing such electronically provided information has been increasing significantly. The NTCIR-11 MedNLP-2 Task is designed to meet this need. It is a shared task that evaluates natural language processing technologies, especially on Japanese medical texts. The task has three subtasks: (1) the Extraction task, which is to recognize complaints and diagnoses in medical texts; (2) the Normalization task, which is the ICD-coding task for complaints and diagnoses in the texts; (3) a free task. This paper reports our results. For the Extraction task, we used a standard named entity recognition technique that is based on conditional random fields. For the Normalization task, we used the string similarity between the input term and the MEDIS ICD-10 dictionary. For the free task, we proposed to design a glossary of medical terms for patients. The experimental results in the Extraction task showed reasonably high performance (precision: 77.10%, recall: 47.74%, F-measure: 58.97). However, the results in the Normalization task showed low performance (precision: 33.69%, recall: 33.69%, F-measure: 33.69). Finally, we show an example of the glossary described above as the result of the free task.
HokuMed in NTCIR-11 MedNLP-2: Automatic Extraction of Medical Complaints from Japanese Health Records Using Machine Learning and Rule-based Methods
Magnus Ahltorp, Hideyuki Tanushi, Shiho Kitajima, Maria Skeppstedt, Rafal Rzepka and Kenji Araki
[Pdf] [Table of Content]

A conditional random fields model was trained to detect medical complaints in constructed Japanese health record text. Tokenisation was performed using the dependency parser CaboCha, and the conditional random fields model was trained on tokens in a window of two preceding and three following tokens, as well as on part-of-speech, vocabulary mapping, header name, frequent suffix, orthography and presence of a modality cue. Modality detection relied on dictionaries of cues for negation, suspicion and family, and a complaint was classified as negated if it was within the scope of a negation cue, as a suspicion if it was within the scope of a suspicion cue and as related to a family member if it was within the scope of a family cue. The scope of negation and suspicion cues was determined by rules relying on the output of CaboCha. For negation and family, cues were gathered by scanning the development corpus for cues, while suspicion cues were obtained by translating English cues. The best result achieved for recognising complaints was a precision of 87% and a recall of 77%. For modality detection on the development set, positive was detected with a precision of 87% and a recall of 77%, negation with a precision of 76% and a recall of 69%, suspicion with a precision of 49% and a recall of 51%, and family with a precision of 78% and a recall of 81%.
BARY at the NTCIR-11 MedNLP-2 Task for Complaints and Diagnosis Recognition
Yusuke Matsubara, Mizuki Morita and Koiti Hasida
[Pdf] [Table of Content]

This paper describes a machine-learning-based approach to recognizing diagnosed disease names and corresponding temporal expressions. Using CRFs (conditional random fields) to learn and predict tags, the systems described in this paper are characterized by a character-level formulation and heuristic features extracted from medical terminologies. Experimental results on the NTCIR-11 MedNLP-2 datasets suggest that the approach effectively exploits terminological resources and combines them with other NLP (natural language processing) resources, including morphological analyzers.
SCT-D3 at the NTCIR-11 MedNLP-2 Task
Akinori Fujino, Jun Suzuki, Tsutomu Hirao, Hisashi Kurasawa and Katsuyoshi Hayashi
[Pdf] [Table of Content]

The SCT-D3 team participated in the Extraction of Complaint and Diagnosis subtask and the Normalization of Complaint and Diagnosis subtask of the NTCIR-11 MedNLP-2 Task. We tackled the two subtasks by using machine learning techniques and additional medical term dictionaries. This report outlines the methods we used to obtain our experimental results, and describes our practical evaluation.
NCU IISR System for NTCIR-11 MedNLP-2 Task
Sheng-Wei Chen, Po-Ting Lai, Yi-Lin Tsai, Jay Kuan-Chieh Chung, Sherry Shih-Huan Hsiao and Richard Tzong-Han Tsai
[Pdf] [Table of Content]

This paper describes NCU IISR's Japanese ICD-10 Code Linking system for NTCIR-11 MedNLP. Our system uses Conditional Random Fields (CRFs) to label ICD-10 mentions and temporal expressions. We also use CRFs to detect the modalities of the ICD-10 mentions. To resolve the problem of ICD-10 mention normalization, we use the Lucene engine to link mentions to the corresponding ICD-10 database entries. Evaluated on the MedNLP test set, our system achieved F-scores of 79.96% for ICD-10 term recognition, 67.64% for time expression recognition and 69.4% for ICD-10 mention normalization.
Incorporating Unsupervised Features into CRF based Named Entity Recognition
Yuki Tawara, Mai Omura and Mirai Miura
[Pdf] [Table of Content]

We participated in the Extraction of Complaint and Diagnosis task and the Normalization of Complaint and Diagnosis task of MedNLP-2 at NTCIR-11. In the extraction task, we use a CRF-based named entity recognition method. Moreover, we incorporate unsupervised features learned from a raw corpus into the CRF. We show that such unsupervised features improve system performance.
HCRL at NTCIR-11 MedNLP-2 Task
Osamu Imaichi, Masakazu Fujio, Toshihiko Yanase and Yoshiki Niwa
[Pdf] [Table of Content]

This year's MedNLP-2 [1] has two tasks: Extraction task (Task 1) and Normalization task (Task 2). We tested both machine learning based methods and an ad-hoc rule-based method for the two tasks. For the Extraction Task, a two-stage approach (first, the machine learning based method is applied to identify c tags, and second, the rule-based method is applied to modality features) obtained higher results. For the Normalization Task, the machine learning based method obtained higher results for training data, but the simple pattern-matching method obtained higher results for test data.
Preliminary Report of III&CYUT for NTCIR-11 MedNLP-2
Liang-Pu Chen, Hsiang Lun Lin, Yan Shen Lai and Ping-Che Yang
[Pdf] [Table of Content]

We constructed a supervised learning system to participate in the MedNLP-2 task at NTCIR-11. The system finds keywords at the correct positions and normalizes them to unique IDs in ICD-10 [4]. In our system, we use part-of-speech (POS) tags [1] as features to train machine learning models based on Conditional Random Fields (CRF) [3] for named entity extraction, and then construct a hierarchical classifier to determine the ICD codes of the terms.
Technical Report of Uni2014 in NTCIR-11 MedNLP-2
Kenta Fukuda
[Pdf] [Table of Content]

This paper describes our approach and its evaluation, using CRFs and dictionary matching in Task 1 (Extraction of Complaint and Diagnosis task) and dictionary matching in Task 2 (Normalization of Complaint and Diagnosis task).
The OKPU System in NTCIR11 MedNLP2: An IR Approach to ICD-10 Code Identification
Genichiro Kikui and Yasuhiro Tajima
[Pdf] [Table of Content]

This paper describes an IR (Information Retrieval) approach to identifying the ICD-10 code of a medical term (such as a disease name or a description of a symptom or a complaint) in a medical text. In this approach, we prepare a dictionary of disease names, each paired with its corresponding ICD-10 code(s). The system searches for the disease name most relevant to the input, and returns the ICD-10 code paired with that disease name in the dictionary. In IR terms, each disease name in the dictionary can be regarded as a document and an input medical term as a query. In order to handle an input which does not exactly match any disease name in the database, we introduce two kinds of partial matching and a context search, where a query includes context words of the input term. Preliminary evaluation on the MedNLP2 test set shows that with this simple approach our system correctly identified 54% of the input medical terms.
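The dictionary-as-documents idea can be sketched in a few lines of Python. The abstract does not specify which two kinds of partial matching are used, so character-bigram Jaccard overlap stands in here as one plausible form of partial matching; the function names and the three-entry toy dictionary are illustrative assumptions, not the authors' resources.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams; falls back to the whole string if it is too short."""
    return {text[i:i + n] for i in range(len(text) - n + 1)} or {text}


def lookup(term, dictionary, n=2):
    """Return (best_name, codes, score) for the dictionary entry closest to `term`.

    Assumption: closeness is the Jaccard overlap of character n-grams.
    """
    query = char_ngrams(term, n)

    def score(name):
        entry = char_ngrams(name, n)
        return len(query & entry) / len(query | entry)

    best = max(dictionary, key=score)
    return best, dictionary[best], score(best)


# Toy dictionary: disease name -> ICD-10 code(s).
toy_dictionary = {
    "糖尿病": ["E14"],
    "高血圧症": ["I10"],
    "気管支喘息": ["J45"],
}
name, codes, similarity = lookup("高血圧", toy_dictionary)
```

Here the input "高血圧" has no exact match, but it shares two of three bigrams with "高血圧症", so that entry and its code are returned.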
A NLP System of DCUMT in NTCIR-11 MedNLP-2: RNN for ICD/Time Entity Recognition and ICD Classification Tasks
Tsuyoshi Okita and Qun Liu
[Pdf] [Table of Content]

This paper describes the medical NLP system developed at Dublin City University for participation in the Second Medical NLP Shared Task (MedNLP 2) in NTCIR-11 [1]. This shared task targets Japanese text. Our system detects International Classification of Diseases (ICD) and time entities and classifies ICD entities. We participated in Task 1, which detects the ICD and time entities, and Task 2, which classifies the detected ICD entities among the ICD codes. Our system uses deep learning to learn and classify those entities. Our results were F1 scores of 67.8 for the ICD entity recognition task (Task 1), 77.4 for the time entity recognition task (Task 1), and 54.0 for the ICD classification task (Task 2 for gold standard).
Overview of the NTCIR-11 MobileClick Task
Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto and Mayu Iwata
[Pdf] [Table of Content]

This is an overview of the NTCIR-11 MobileClick task (a sequel to 1CLICK in NTCIR-9 and NTCIR-10). In the MobileClick task, systems are expected to output a concise summary of information relevant to a given query and to provide immediate and direct information access for mobile users. We designed two types of MobileClick subtasks, namely, iUnit retrieval and summarization subtasks, in which four research teams participated and submitted 14 runs. We describe the subtasks, test collection, and evaluation methods and then report official results for NTCIR-11 MobileClick.
Improving iUnit Retrieval with Query Classification and Multi-Aspect iUnit Scoring: The IISR System at NTCIR-11 MobileClick Task
Chia-Tien Chang, Yu-Hsuan Wu, Yi-Lin Tsai and Richard Tzong-Han Tsai
[Pdf] [Table of Content]

This paper describes our approach to the NTCIR-11 MobileClick task. Based on the assumption that different user intentions should be handled by different extraction/retrieval strategies, we first classify each query into one of our eight defined query types and set the weights of the extraction methods accordingly. Next, we extract the relevant parts of the search results and rank the extracted sentences. Finally, we apply a rule-based approach to iUnit extraction. Our system achieves an nDCG@10 score of 0.2134 and a Q@10 score of 0.1573, outperforming the baseline by 23.9% and 42.4%, respectively. This difference demonstrates the effectiveness of our query classification and multi-aspect iUnit scoring.
KPNM at the NTCIR-11 MobileClick Task
Dong Zhou, Zhao Wang, Ziyi Zeng and Tuo Peng
[Pdf] [Table of Content]

This paper describes KPNM’s (Knowledge Processing and Networked Manufacturing lab at Hunan University of Science and Technology) participation in the Mobile Information Access ("MobileClick") task at NTCIR-11. We chained simple techniques based on statistical models and heuristic rules to extract significant text units from the web pages retrieved by the given query. Then we ranked the text units by using the vector space model. The ranked text units are returned as iUnits in the final results. This is our first attempt at this task. Owing to limited human resources and time, further measures remain to be considered in the future for generating iUnits, particularly for use on mobile devices.
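A vector space ranking of text units against a query can be sketched as follows. The abstract does not say which term weighting was used, so TF-IDF with a smoothed IDF and cosine similarity is assumed here, and whitespace tokenization, the function name `rank_units` and the sample text units are all illustrative.

```python
import math
from collections import Counter


def rank_units(query, units):
    """Rank text units by cosine similarity to the query in TF-IDF space."""
    docs = [unit.lower().split() for unit in units]
    n = len(docs)
    # Document frequency of each term over the candidate text units.
    df = Counter(term for doc in docs for term in set(doc))

    def tfidf(tokens):
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) keeps weights positive for every term.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = tfidf(query.lower().split())
    scored = [(unit, cosine(q, tfidf(doc))) for unit, doc in zip(units, docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


units = [
    "Tokyo Tower is 333 meters tall",
    "Opening hours are 9 to 23",
    "The height of Tokyo Tower",
]
ranked = rank_units("tokyo tower height", units)
```

Units that share no term with the query receive a score of zero and fall to the bottom of the list.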
Udel @ NTCIR-11 MobileClick Track
Ashraf Bah Rabiou and Ben Carterette
[Pdf] [Table of Content]

This paper describes our participation in the MobileClick track of NTCIR-11. We present our methods and results for both iUnit retrieval and summarization. Our ranking methods for the retrieval task are essentially two-step methods in which we first create a model pseudo-nugget, and then promote iUnits that are the most similar to that model pseudo-nugget. Our summarization methods consist in simply concatenating the iUnits (parts of sentences) that we already ranked in our retrieval sub-task.
Description of the NTOU MobileClick System at NTCIR-11
Chi-Ting Liu and Chuan-Jie Lin
[Pdf] [Table of Content]

This paper describes the design of NTOU's first MobileClick system, which participated in two NTCIR-11 MobileClick English subtasks, iUnit Retrieval and iUnit Summarization. Our iUnit retrieval module first uses inverted query frequency (iqf) to extract topic-related keywords, and then identifies important nuggets by measuring and sorting nf.iqf scores, where nf is nugget frequency. The summarization module is a greedy clustering system based on the lengths and sizes of common leading substrings among iUnits. Our iUnit Retrieval formal run did not perform well, with an nDCG@10 score of 0.1426 and a Q@10 score of 0.0803. However, our iUnit Summarization formal run was ranked first, with an M-measure score of 4.43 at the patience parameter L=280.
Overview of the NTCIR-11 Recognizing Inference in TExt and Validation (RITE-VAL) Task
Suguru Matsuyoshi, Yusuke Miyao, Tomohide Shibata, Chuan-Jie Lin, Cheng-Wei Shih, Yotaro Watanabe and Teruko Mitamura
[Pdf] [Table of Content]

This paper gives an overview of the Recognizing Inference in TExt and Validation (RITE-VAL) task in NTCIR-11. We evaluated systems that automatically recognize semantic relations between sentences such as entailment, contradiction and independence in Japanese (JA), English (EN), Simplified Chinese (CS) and Traditional Chinese (CT). The RITE-VAL task has two subtasks: the Fact Validation subtask (FV) and the System Validation subtask (SV). SV consists of a binary classification subtask (SVBC) and a multi-classification subtask (SVMC). We had 23 active participating teams, and received 170 formal runs (59 Japanese runs, 9 English runs, 53 Simplified Chinese runs and 49 Traditional Chinese runs). This paper also describes how the datasets for RITE-VAL were developed and how the systems were evaluated, and reports the RITE-VAL formal run results.
III&CYUT Chinese Textual Entailment Recognition System for NTCIR-11 RITE-VAL
Shih-Hung Wu, Li-Jen Hsu, Hua-Wei Lin, Pei-Kai Liao, Liang-Pu Chen and Tsun Ku
[Pdf] [Table of Content]

Textual Entailment (TE) is a critical issue in natural language processing (NLP). In this paper, we report how our hybrid system works in the NTCIR-11 RITE-VAL task [15]. We participated in both the Fact Validation (FV) and System Validation (SV) subtasks for Chinese. In the SV subtask, we participated in both binary classification (BC) and multi-classification (MC). For the SV BC subtask, our system detects eleven special cases for the input pairs and uses twelve SVM classifiers for classification. The results are then integrated into the system output. For the SV MC subtask, we also trained four SVM classifiers for the Bidirectional, Forward, Independence, and Contradiction classes. The results are integrated by rules. For the FV subtask, our system searches Wikipedia to find the top-ranked T1 and decides its entailment relation to T2 by rules.
Recognizing Textual Entailment Using Multiple Features and Filters
Yongmei Tan, Minda Wang, Xiaohui Wang and Xiaojie Wang
[Pdf] [Table of Content]

Textual entailment among sentences is an important part of applied semantic inference. In this paper we propose a novel technique to address the recognizing textual entailment challenge, which is based on the distributional hypothesis that words that tend to occur in the same contexts tend to have similar meanings. Using the IDF of the overlapping words between the two propositions, we calculate the similarity between the two given propositions to infer the likelihood of entailment and then filter the inferred results. We evaluate our model on the NTCIR-11 RITE dataset and then show how a combination of multiple features and filters can significantly improve the performance of recognizing textual entailment over the best performers in those years. Our approach advances the state of the art on Simplified Chinese NTCIR-11 RITE.
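The IDF-weighted overlap mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothing formula, the function names `idf_table` and `idf_overlap`, and the choice to normalise by the hypothesis's IDF mass are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF for every token in a list of tokenized sentences (assumed formula)."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for sent in corpus for tok in set(sent))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def idf_overlap(text, hypothesis, idf, default_idf=1.0):
    """Fraction of the hypothesis's IDF mass that is covered by tokens of the text."""
    hyp = set(hypothesis)
    if not hyp:
        return 0.0
    total = sum(idf.get(tok, default_idf) for tok in hyp)
    shared = sum(idf.get(tok, default_idf) for tok in hyp & set(text))
    return shared / total
```

A score near 1 means the rare (high-IDF) words of the hypothesis also appear in the text; a downstream filter or threshold would still be needed to turn the score into an entailment decision.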
WUST at NTCIR-11 RITE-VAL System Validation Task
Maofu Liu, Yue Wang, Limin Wang and Huijun Hu
[Pdf] [Table of Content]

This paper describes our work in NTCIR-11 on RITE-VAL System Validation task in Simplified Chinese including Binary-class (BC) subtask and Multi-class (MC) subtask. We construct the classification model based on support vector machine to recognize semantic inference in Chinese text pair. In our system, we use multiple features including statistical feature, lexical feature and syntactic feature. For contradiction recognition, we put forward the Chinese textual contradiction approach using linguistic phenomena.
NUL System at NTCIR RITE-VAL Tasks
Ai Ishii, Hiroshi Miyashita, Mio Kobayashi and Chikara Hoshino
[Pdf] [Table of Content]

This paper describes the submitted strategy and methods of the NUL team for the NTCIR-11 RITE-VAL fact validation (FV) and system validation (SV) tasks. We started by following the shallow approach of Tian et al. [2]. We then improved the named entity recognition accuracy and transformed some variables according to the cross-validation score on the training sets. In the FV tasks in particular, we used Apache Solr as the base search system. We compared several chunk units for the text index and several weightings of the ranking score for search results. After several modifications, we achieved the highest cross-validation score on the RITE-10 Exam bc and Exam Search tasks. Our final submitted system achieved Macro-F1 scores of 61.47 in FV and 69.59 in SV.
KSU Team's System and Experience at the NTCIR-11 RITE-VAL Task
Tasuku Kimura and Hisashi Miyamori
[Pdf] [Table of Content]

This paper describes the systems and results of the team KSU for RITE-VAL task in NTCIR-11. Three different systems were implemented for each of the two subtasks: Fact Validation and System Validation. In Fact Validation subtask, systems were designed respectively based on character overlap, existence of entailment result 'Y', and voting of entailment results. In System Validation subtask, systems were designed respectively using SVM, Random Forest, and Bagging, with features such as surface features, numerical expressions, location expressions, and named entities. Scores of the formal runs were 52.78% in macro F1 and 66.96% in accuracy with KSU-FV-02 in Fact Validation, and 66.96% in macro F1 and 79.84% in accuracy with KSU-SV-01 in System Validation.
A Surface-Similarity Based Two-Step Classifier for RITE-VAL
Shohei Hattori and Satoshi Sato
[Pdf] [Table of Content]

This paper describes the system of the team SKL in the NTCIR-11 RITE-VAL workshop. The system is a modified version of our system in the previous workshop, which takes the two-step classification strategy. The first step classifies a given text pair into positive or negative entailment class based on an overlap measure. If the pair is classified into positive class, the second step examines whether the assigned class should be flipped or not by using heuristic rules that detect the mismatch of named entities and numbers. For the Fact Validation subtask that is newly introduced in this workshop, we have attached a new module called text-search and re-examined the best overlap measure.
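The two-step strategy described in this abstract (an overlap-based decision, then a rule-based flip on mismatched named entities and numbers) can be sketched as follows. The character-overlap measure, the threshold of 0.7, the regular expression for numbers, and the caller-supplied `entities` set are illustrative assumptions, not the SKL system's actual components.

```python
import re


def char_overlap(t1, t2):
    """Share of t2's distinct characters that also occur in t1."""
    chars = set(t2)
    return len(chars & set(t1)) / len(chars) if chars else 0.0


def numbers(text):
    """All integer or decimal literals found in the text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def classify(t1, t2, threshold=0.7, entities=frozenset()):
    """Return "Y" if t1 is judged to entail t2, otherwise "N"."""
    # Step 1: overlap-based decision.
    if char_overlap(t1, t2) < threshold:
        return "N"
    # Step 2: flip a positive decision when t2 contains a number
    # or a known entity that t1 does not mention.
    if numbers(t2) - numbers(t1):
        return "N"
    if any(e in t2 and e not in t1 for e in entities):
        return "N"
    return "Y"
```

For example, `classify("Tokyo hosted the games in 1964", "the games in 1968")` passes the overlap step but is flipped to `"N"` because `1968` does not occur in the text.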
KitAi-VAL: Textual Entailment Recognition System for NTCIR-11 RITE-VAL
Ayaka Morimoto, Kenta Kurashima, Yo Tokunaga and Kazutaka Shimada
[Pdf] [Table of Content]

This paper describes Japanese textual entailment recognition systems for NTCIR-11 RITE-VAL. The tasks that we participated in are the system validation subtask and the fact validation subtask for Japanese. Our methods for the system validation are based on our previous method KitAi for RITE2. We add new features to the previous method. In addition, we construct a combined classifier for the unit-test, which is a sentence pair, t1 and t2, about a single linguistic phenomenon. For the fact validation task, we propose two approaches; search log based and summarization based methods. The search log based method generates a classifier using logs from Apache Solr. It does not contain any linguistic features for the classifier. The summarization based method generates t1 from outputs of Apache Solr. It is a kind of multi-document summarization. We apply the generated t1 to KitAi, namely a classifier for the binary class problem of textual entailment recognition. In formal runs, the best accuracy rates in the methods for the system validation and the fact validation tasks were 68.02 points and 57.98, respectively.
KNDTE: A System for Textual Entailment and Fact Validation Tasks at the NTCIR-11 RITE-VAL
Tao-Hsing Chang, Shih-Kuei Kao, Yen-Wen Lin, Li-Jia Wei and Chung-Yin Tsai
[Pdf] [Table of Content]

Using a decision tree as the predictive model, two algorithms are proposed in this paper: one for detecting textual entailment relationships, and the other for confirming textual fact. Features proposed by previous studies are improved upon, and new features are introduced. This enhances the effectiveness of the proposed method, which was tested using the RITE-VAL task of NTCIR 11. The proposed method predicts entailment relationships between Chinese textual pairs in the data set of the system validation (SV) subtask. The predictions are then verified for accuracy using sentences from the data set of the fact validation (FV) subtask. When applied to the binary-class (BC) and multi-class (MC) tasks of the SV subtasks, the average macro-F1 rates of the proposed method are 54.1% and 39.08%, respectively. For the FV subtask, the average macro-F1 rate of the proposed method is 33.97%.
Recognizing Textual Entailment Using Lexical, Syntactical, and Semantic Information
Yann-Huei Lee, Shafqat Virk and Lun-Wei Ku
[Pdf] [Table of Content]

This paper describes our system for participating in the system validation subtask of NTCIR-11 RITE-VAL. We trained an SVM model with LibSVM on features extracted from labeled sentence pairs. Besides features based on lexical, syntactic and semantic analysis, we introduce a novel approach that extracts "concepts" from a sentence and generates features based on them. Features of unlabeled test sentence pairs are extracted through the same process, and the trained SVM model predicts their labels.
Discriminating between Relevant and Irrelevant Text for Fact Validation
Mirai Miura, Hiroki Ouchi, Mai Omura, Mayo Yamasaki and Akifumi Yoshimoto
[Pdf] [Table of Content]

The CL team participated in the Fact Validation (FV) and System Validation (SV) subtasks in Japanese. This paper describes our systems with experimental results. In the Fact Validation subtask, a system is required to search the given documents for texts (t1) and judge the fact validity of the given statement (t2) based on the judgement of whether t1 entails t2 or not. However, if the t1 selected by the system is irrelevant to t2, existing RTE approaches do not work well for the validity judgement. Thus, how to search for and select a relevant t1 is key to accurate judgement of fact validity. Our approach first discriminates between relevant and irrelevant t1 based on the score computed by a search engine, TSUBAKI, and then adopts different methods of judging the fact validity for each t2. If the system regards t1 as relevant, a simple binary classification method is adopted to judge the validity. On the other hand, if the system regards t1 as irrelevant, a full-text search engine, Solr, is used to compute retrieval scores different from the ones computed by TSUBAKI. These retrieval scores are used as features for the binary classification. The experiments show that our approach is effective for fact validation.
NWNU Minimum Information Recognizing Entailment System for NTCIR-11 RITE-3 Task
Zhichang Zhang, Dongren Yao, Longlong Mao and Songyi Chen
[Pdf] [Table of Content]

This paper describes our work in NTCIR-11 on RITE-3 Binary-class (BC) subtask and Multi-class (MC) subtask in Simplified Chinese. We proposed a textual entailment system using a hybrid approach that integrates many features. The performance of the proposed method in the formal run achieved Macro-F1's of 59.71% in BC subtask and only 23.19% in MC subtask.
Experiments for NTCIR-11 RITE-VAL Task at Shibaura Institute of Technology
Toru Sugimoto, Toshiki Mizukoshi and Ryosuke Masuda
[Pdf] [Table of Content]

This paper reports the evaluation results of our textual entailment system at NTCIR-11 RITE-VAL task. We participated in the Japanese System Validation (SV) and Fact Validation (FV) subtasks. In our system, the meaning of a text is represented as a set of dependency triples consisting of two words and their relation. Comparing two sets of dependency triples with respect to conceptual similarity, a subsumption score is calculated and used to identify textual entailment. This paper provides a description of our algorithm, the evaluation results and discussion on the results.
Answering Yes-No Questions by Keyword Distribution: KJP System at NTCIR-11 RITEVal Task
Yoshinobu Kano
[Pdf] [Table of Content]

Textual entailment is normally regarded as a deeper analysis among other NLP techniques. In contrast to such approaches, we used a simple, but fundamentally important, keyword-based technique. This system architecture was built on our observations that many textual entailment issues are knowledge search issues, and that the distribution of extracted keywords is an unavoidable fundamental problem to solve regardless of the methods employed. Our team obtained the second best rank among participating teams in the Japanese Fact Validation subtask (JA-FV) in the NTCIR-11 RITEVal task.
Yamraj: Binary-class and Multi-class Based Textual Entailment System for Japanese (JA) and Chinese Simplified (CS)
Partha Pakray
[Pdf] [Table of Content]

The article presents the experiments carried out as part of our participation in Recognizing Inference in TExt and Validation (RITE-VAL) at NTCIR-11. RITE-VAL has two subtasks, Fact Validation and System Validation, for Chinese-Simplified (CS), Chinese-Traditional (CT), English (EN), and Japanese (JA), and addresses semantic relations between two texts such as entailment, contradiction, and independence. We submitted runs for Japanese (JA) System Validation (one run for BC and one for MC) and Chinese Simplified (CS) System Validation (one run). The textual entailment system used the web-based Google translator system for machine translation. The system is based on a Support Vector Machine that uses features from lexical similarity, lexical distance, and syntactic similarity.
Description of the NTOU RITE-VAL System at NTCIR-11
Chuan-Jie Lin, Chi-Ting Liu and Yu-Cheng Tu
[Pdf] [Table of Content]

System validation subtask in NTCIR aims at developing techniques to deal with many kinds of language phenomena about textual entailment. This paper introduces our system participating in NTCIR-11 RITE-VAL SV Subtask. By adopting different combination of features related to WordNet, Tongyici Cilin, and syntactic information, 5 SV-BC and 5 SV-MC formal runs were submitted. The best BC run achieved 42.89% in macro F-measure and 52.33% in accuracy. The best MC run achieved 31.03% in macro F-measure and 39.17% in accuracy.
The WHUTE System in NTCIR-11 RITE-VAL Task
Han Ren, Hongmiao Wu, Xiwen Tan, Pengyuan Wang, Donghong Ji and Jing Wan
[Pdf] [Table of Content]

This paper describes our system for recognizing textual entailment in the RITEVAL System Validation and Fact Validation subtasks at NTCIR-11. For the System Validation subtask, we employ a transformation model and acquire entailment rules by extracting synonyms and inferable expressions from resources such as lexicons and knowledge bases. A cascaded entailment recognition model is also employed to recognize four types of entailment relations. For the Fact Validation subtask, we build a pipeline approach to find texts that entail the given texts. First, a retrieval model is used to search for related sentences in the Wikipedia documents provided; then we use the recognition model from the System Validation subtask to find the sentences that entail the given texts. Official results show that our system achieves a 53.48% MacroF1 score in the Chinese SVBC subtask, a 25.74% MacroF1 score in the Chinese SVMC subtask, a 45.51% MacroF1 score in the English FV subtask and a 38.08% MacroF1 score in the Chinese FV subtask.
BnO at the NTCIR-11 English Fact Validation Task
Pascual Martínez-Gómez, Ran Tian and Yusuke Miyao
[Pdf] [Table of Content]

This paper describes the submission of BnO team to the RITE-VAL Fact Validation task [Matsuyoshi et al. 2014] for English in NTCIR-11. In this submission, BnO team made use of search results retrieved by the search engine TSUBAKI as text T side of textual entailment pairs. Then, we used a logical algebraic inference system developed in [Tian et al. 2014] to test whether or not an entailment relation exists between the sentences retrieved by TSUBAKI and the hypotheses. We also tested a classifier based on Random Forests that used the output from the inference engine and other features related to TSUBAKI search results.
MCU at NTCIR: Chinese Fact Validation via SVM Cotext Ranking
Yu-Chieh Wu, Tzu-Yu Liu, Yue-Shi Lee and Jie-Chi Yang
[Pdf] [Table of Content]

Validating factoid descriptions in text is the subtask of finding the textual entailment relation between a given hypothesis and an unlabeled raw corpus. By integrating multiple natural language processing units, higher performance could reasonably be achieved. In this paper, we propose a trainable framework based on a context ranking model, under the condition that part-of-speech tagging information is available. We first revise our in-house word segmentation method by automatically deriving a thesaurus from Wiki. Then a language-model-based passage retriever is used to find the initial retrieval result. The context ranking model then extracts features and re-ranks the result. The official results indicate the effectiveness of our method. In terms of accuracy, our method achieves 39.27% for the Traditional Chinese FV task (second place).
NAK Team's System for Recognizing Textual Entailment at the NTCIR-11 RITE-VAL Task
Genki Teranaka, Masahiko Sunohara and Hiroaki Saito
[Pdf] [Table of Content]

The NAK team participated in the NTCIR-11 RITE-VAL task. This paper describes our textual entailment system and discusses the official results. Our system adopts statistical method: classification of the support vector machine (SVM). For Japanese SV subtask, our best result was 63.19 for macro-F1 score and 74.55 for accuracy. For Japanese FV subtask, our best result was 53.07 for macro-F1 score and 60.82 for accuracy.
MIG at NTCIR-11: Using Lexical, Syntactic, and Semantic Features for the RITE-VAL Tasks
Po-Cheng Lin, Shu Yu Lin, Chih Kai Haung and Chao-Lin Liu
[Pdf] [Table of Content]

In this paper, we describe our methods for the English and Chinese RITE-VAL tasks. We extracted relevant sentences from Wikipedia to verify the correctness of the query statements. Computational models that considered various linguistic features were built to select Wikipedia articles that contained these relevant sentences. We adopt Linearly Weighted Functions (LWFs) to balance the importance of every feature and judge the answer to each query statement from the outputs of the LWFs.
IMTKU Textual Entailment System for Recognizing Inference in Text at NTCIR-11 RITE-VAL
Min-Yuh Day, Ya-Jung Wang, Che-Wei Hsu, En-Chun Tu, Shang-Yu Wu, Huai-Wen Hsu, Yu-An Lin, Yu-Hsuan Tai and Cheng-Chia Tsai
[Pdf] [Table of Content]

In this paper, we describe the IMTKU (Information Management at TamKang University) textual entailment system for recognizing inference in text at NTCIR-11 RITE-VAL (Recognizing Inference in Text). We propose a textual entailment system using a statistical approach that integrates semantic features and machine learning techniques. We submitted 3 official runs for the BC and MC subtasks. In the NTCIR-11 RITE-VAL task, the IMTKU team achieved 0.2911 in the CT-MC subtask, 0.5275 in the CT-BC subtask, 0.2917 in the CS-MC subtask, and 0.5325 in the CS-BC subtask.
KTU System for NTCIR-11 RITE-VAL Task
Tomohide Shibata
[Pdf] [Table of Content]

This paper describes KTU system for NTCIR-11 RITE-VAL Japanese Tasks. The proposed method regards predicate-argument structure as a basic unit of handling the meaning of text/hypothesis, and performs the matching between text and hypothesis. The system first performs predicate-argument structure analysis to both a text and a hypothesis. Then, we perform the matching between text and hypothesis. In matching text and hypothesis, wide-coverage relations between words/phrases such as synonym and is-a are utilized, which are automatically acquired from a dictionary, Web corpus and Wikipedia.
Overview of the NTCIR-11 SpokenQuery&Doc Task
Tomoyosi Akiba, Hiromitsu Nishizaki, Hiroaki Nanjo and Gareth J. F. Jones
[Pdf] [Table of Content]

This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc) task at the NTCIR-11 Workshop. This task included spoken query driven spoken content retrieval (SQ-SCR) as the main sub-task, with a spoken query driven spoken term detection task (SQ-STD) as an additional sub-task. The paper describes details of each sub-task, the data used, the creation of the speech recognition systems used to create the transcripts, the design of the retrieval test collections, the metrics used to evaluate the sub-tasks and a summary of the results of submissions by the task participants.
Spoken Document Retrieval Experiments for SpokenQuery&Doc at Ryukoku University (RYSDT)
Hiroaki Nanjo, Takehiko Yoshimi, Sho Maeda and Tomohiro Nishio
[Pdf] [Table of Content]

In this paper, we describe the spoken document retrieval (SDR) systems of Ryukoku University, which participated in the NTCIR-11 "SpokenQuery&Doc" task. The NTCIR-11 SpokenQuery&Doc task has two subtasks: the "spoken content retrieval (SCR) subtask" and the "spoken term detection (STD) subtask". We participated in the SCR and STD subtasks as team RYSDT. In this paper, our SDR and STD systems are described.
Spoken Term Detection and Spoken Content Retrieval: Evaluations on NTCIR 11 SpokenQuery&Doc Task
Sz-Rung Shiang, Po-Wei Chou and Lang-Chi Yu
[Pdf] [Table of Content]

In this paper, we report our experiments on the NTCIR-11 SpokenQuery&Doc task for spoken term detection (STD) and spoken content retrieval (SCR). In STD, we consider acoustic feature similarity between utterances over both word and sub-word lattices to deal with the general problem of open vocabulary retrieval with queries of variable length. In SCR, we replace term frequency with expected term frequency in the vector space model (VSM) to deal with errors in speech recognition. In addition, we utilize three techniques to improve the relevance of the first-pass retrieval: pseudo relevance feedback using the Rocchio algorithm, query expansion using a recurrent neural network language model (RNNLM), and lecture slide similarity feedback using random walk. Experimental results are shown for each task to indicate the improvement from the techniques we apply.
DCU at the NTCIR-11 SpokenQuery&Doc Task
David N. Racca and Gareth J.F. Jones
[Pdf] [Table of Content]

We describe DCU’s participation in the NTCIR-11 SpokenQuery&Document task. We participated in the spoken-query spoken content retrieval (SQ-SCR) subtask by using the slide group segments as basic indexing and retrieval units. Our approach integrates normalised prosodic features into a standard BM25 weighting function to increase weights for terms that are prominent in speech. Text queries and relevance assessment data from the NTCIR-10 SpokenDoc-2 passage retrieval task were used to train the prosodic-based models. Evaluation results indicate that our prosodic-based retrieval models do not provide significant improvements over a text-based BM25 model, but suggest that they can be useful for certain queries.
An IWAPU STD System for OOV Query Terms and Spoken Queries
Jinki Takahashi, Takumi Hashimoto, Ryota Kon'no, Shota Sugawara, Kazuki Ouchi, Satoshi Oshima, Takahiro Akyu and Yoshiaki Itoh
[Pdf] [Table of Content]

We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models. In this paper, we describe two methods for text OOV query terms and spoken queries. For text OOV query terms, we introduce four unique methods. First, we integrate multiple retrieval results obtained from multiple subword recognition. Second, we use Deep Neural Network (DNN) for computing output probabilities of Hidden Markov Models (HMM). Third, we apply a re-ranking method utilizing highly ranked candidates. Fourth, DNN is also used for re-ranking for the retrieval results containing organizer's results. For spoken queries, we use speech recognition results of several speech recognizers including our word-based HMM and DNN-HMM recognizer, our syllable-based HMM and DNN-HMM recognizer and google voice. A few retrieval results obtained by each recognizer are combined. In STD tasks (SDPWS) of IR for Spoken Documents in NTCIR-11, we submit 15 types of retrieval results. For text query terms, we use transcriptions of our various speech recognizers, only an organizer's transcription, both of our transcriptions and an organizer's transcription, and so on. For spoken queries, we also use the same transcriptions of spoken documents, as mentioned above. We also submit a run using organizer's transcriptions for spoken documents, followed by our re-ranking methods.
STD Method Based on Hash Function for NTCIR11 SpokenQuery&Doc Task
Satoru Tsuge, Norihide Kitaoka, Kazuya Takeda and Kenji Kita
[Pdf] [Table of Content]

In this paper, we describe a spoken term detection (STD) method used in the Spoken Query and Documents task of the NTCIR-11 meeting. Our STD method extracts sub-sequences from the target documents and converts them into bit sequences using a hash function. The query is also converted into a bit sequence in the same way. Candidates are detected by calculating the Hamming distance between the bit sequence of the query and those of the target documents. Then, our method calculates the distances between the query and the candidates using DP (Dynamic Programming) matching. To evaluate the proposed methods, we conducted spoken document retrieval experiments using the SpokenDoc task from the NTCIR-9 meeting. Using these experimental results to set our parameters, we submitted the results for the SQ-STD (Spoken Query Spoken Term Detection) task at NTCIR-11.
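The pipeline in this abstract (hash sub-sequences to bit sequences, shortlist by Hamming distance, verify by DP matching) can be sketched roughly as below. The abstract does not specify the hash function, so the n-gram signature built with `zlib.crc32`, the 64-bit width, the thresholds, and the use of plain edit distance as the DP matching step are all assumptions made for illustration.

```python
import zlib


def signature(symbols, n=2, bits=64):
    """Hash every symbol n-gram to one bit position of a fixed-width bit sequence."""
    sig = 0
    for i in range(len(symbols) - n + 1):
        gram = " ".join(symbols[i:i + n]).encode("utf-8")
        sig |= 1 << (zlib.crc32(gram) % bits)
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def edit_distance(a, b):
    """DP matching: Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def detect(query, document, max_hamming=2, max_edit=1):
    """Return (start, distance) for windows that pass both the hash filter and DP check."""
    width = len(query)
    q_sig = signature(query)
    hits = []
    for start in range(len(document) - width + 1):
        window = document[start:start + width]
        # Cheap filter first; the costlier DP matching runs only on candidates.
        if hamming(q_sig, signature(window)) <= max_hamming:
            dist = edit_distance(query, window)
            if dist <= max_edit:
                hits.append((start, dist))
    return hits
```

The point of the design is that the bitwise comparison is much cheaper than DP matching, so most windows are rejected before the quadratic step runs. In practice the symbols would be phonemes or syllables from a recognizer rather than characters.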
Segmented Spoken Document Retrieval Using Word Co-occurrence Information
Kensuke Hara, Hiroaki Taguchi, Koudai Nakajima, Masanori Takehara, Satoshi Tamura and Satoru Hayamizu
[Pdf] [Table of Content]

This paper presents several approaches for NTCIR-11 SpokenQuery&Doc, proposing several schemes that use word co-occurrence information for spoken document retrieval. Automatic transcriptions of spoken documents usually contain mis-recognized words, which significantly decrease the performance of spoken document retrieval. The cosine similarity used to measure document similarity must therefore be re-examined for spoken documents. It is also difficult to retrieve a segmented document that has few terms. To cope with these problems, we utilize Pointwise Mutual Information (PMI). We compute a recognition confidence for each term appearing in a transcription in order to drop mis-recognized words. We also investigate a PMI-based document comparison approach. Furthermore, a segmented-document retrieval method is proposed. Experiments were conducted to evaluate these methods using the NTCIR-11 test sets.
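As a rough illustration of using PMI to score how well a recognized term fits its context, the sketch below computes document-level PMI and averages it over a term's neighbours. Only the PMI formula itself is standard; the averaging rule, the treatment of unseen pairs as 0, and the names `pmi_table` and `term_confidence` are assumptions and not the method of the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """PMI for every term pair co-occurring in at least one document.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated from document frequencies.
    """
    n_docs = len(documents)
    term_df = Counter()
    pair_df = Counter()
    for doc in documents:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # keys are alphabetically ordered pairs
    return {
        (a, b): math.log2(joint * n_docs / (term_df[a] * term_df[b]))
        for (a, b), joint in pair_df.items()
    }


def term_confidence(term, context, pmi):
    """Mean PMI between a term and the other terms in its context (unseen pairs count as 0)."""
    others = [t for t in set(context) if t != term]
    if not others:
        return 0.0
    return sum(pmi.get(tuple(sorted((term, t))), 0.0) for t in others) / len(others)
```

Under this sketch, a term whose confidence falls below some threshold would be treated as a likely mis-recognition and dropped before retrieval.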
Combination of DTW-based and CRF-based Spoken Term Detection on the NTCIR-11 SpokenQuery&Doc SQ-STD Subtask
Hiromitsu Nishizaki, Naoki Sawada, Satoshi Natori, Kentaro Domoto and Takehito Utsuro
[Pdf] [Table of Content]

Conventional spoken term detection (STD) techniques, which use a text-based matching approach based on automatic speech recognition (ASR) systems, are not robust to speech recognition errors. This paper proposes a conditional random fields (CRF)-based combination (re-ranking) approach, which recomputes detection scores produced by a phoneme-based dynamic time warping (DTW) STD approach. In the re-ranking approach, we tackle STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. They learn recognition error patterns such as phoneme-to-phoneme confusions within the CRF framework. The models can therefore detect a triphone, which is one of the triphones composing a query term, with a detection probability. In the experimental evaluation on the NTCIR-11 SpokenQuery&Doc SQ-STD test collection, the CRF-based approach and the combination of the two STD systems could not outperform the conventional DTW-based approach we had already proposed.
Utilizing Confusion Network in the STD with Suffix Array and Its Evaluation on the NTCIR-11 SpokenQuery & Doc SQ-STD Task
Kouichi Katsurada, Genki Ishihara, Kheang Seng, Yurie Iribe and Tsuneo Nitta
[Pdf] [Table of Content]

The authors have proposed a fast spoken term detection method that uses a suffix array as its data structure. This method enables very quick and memory-saving search by using techniques such as keyword division, dynamic time warping, and an articulatory-feature-based local distance definition. In this paper, we investigate a new approach that utilizes a confusion network in the suffix array. The experimental results show that this approach has both good and bad effects on the search. Although it increases the search time, it can reduce the size of the search index. The search accuracy is almost the same as that of the original method.
Combining Subword and State-level Dissimilarity Measures for Improved Spoken Term Detection in NTCIR-11 SpokenQuery&Doc Task
Mitsuaki Makino and Atsuhiko Kai
[Pdf] [Table of Content]

In recent years, demand for distributing and searching multimedia content has been increasing rapidly, and more effective methods for multimedia information retrieval are desirable. In studies on spoken document retrieval systems, much research has focused on the task of spoken term detection (STD), which locates a given search term in a large set of spoken documents. Recently, there has been increasing interest in using spoken queries in such spoken document retrieval tasks, for reasons such as input efficiency and languages that are difficult to write down. In this paper, we propose a spoken term detection method using multiple scoring and dissimilarity measures for spoken queries. Our proposed method converts the spoken query into a syllable sequence by LVCSR and performs a search that takes into account the acoustic dissimilarity on the spoken documents' LVCSR transcripts. The experimental results showed that our proposed system improves performance compared to the baseline system.
Spoken Term Detection Based on a Syllable N-gram Index at the NTCIR-11 SpokenQuery&Doc Task
Nagisa Sakamoto, Kazumasa Yamamoto and Seiichi Nakagawa
[Pdf] [Table of Content]

For spoken term detection, it is crucial to consider out-of-vocabulary (OOV) words and the mis-recognition of spoken words. Therefore, various sub-word unit based recognition and retrieval methods have been proposed. We also proposed a distant n-gram indexing/retrieval method for spoken queries, which is based on a syllable n-gram and incorporates a distance metric in a syllable lattice. The distance represents a confidence score for the syllable n-gram that accounts for recognition errors such as substitution, insertion and deletion errors. To address spoken queries, we propose a combination of candidates obtained through several ASR systems that are based on syllable or word units. We ran experiments on the NTCIR-11 SpokenQuery&Doc Task and report the evaluation results.
STD Score Combination with Acoustic Likelihood and Robust SCR Models for False Positives: Experiments at NTCIR-11 SpokenQuery&Doc
Yusuke Takada, Sho Kawasaki, Hiroshi Oshima, Hiroshi Kawatani and Tomoyoshi Akiba
[Pdf] [Table of Content]

In this paper, we report our experiments at the NTCIR-11 SpokenQuery&Doc task. We participated in both the STD and SCR subtasks of SpokenDoc. For the STD subtask, we try to improve detection accuracy by combining the DTW distance between syllable sequences and the acoustic likelihood of the detected speech segment. The final combined score, obtained by applying logistic regression, was used for rescoring the detection results. For the SCR subtask, we propose retrieval models that are robust to false positive errors by using word co-occurrences. A false positive error is a term that does not actually occur in a document but is mistakenly recognized as occurring. To deal with such errors, we introduce word co-occurrence information into the retrieval models.
Overview of NTCIR-11 Temporal Information Access (Temporalia) Task
Hideo Joho, Adam Jatowt, Roi Blanco, Hajime Naka and Shuhei Yamamoto
[Pdf] [Table of Content]

This paper gives an overview of the NTCIR-11 Temporal Information Access (Temporalia) task. This pilot task aims to foster research in temporal aspects of information retrieval and search. Temporalia is composed of two subtasks: Temporal Query Intent Classification (TQIC) and Temporal Information Retrieval (TIR). TQIC attracted 6 teams, which submitted a total of 17 runs, while 6 teams took part in TIR, submitting 18 runs. In this paper we describe both subtasks, the datasets, the evaluation methods and the results of meta-analyses.
Using Machine Learning to Predict Temporal Orientation of Search Engines' Queries in the Temporalia Challenge
Michele Filannino and Goran Nenadic
[Pdf] [Table of Content]

We present our approach to the NTCIR-11 Temporalia challenge, Temporal Query Intent Classification: predicting the temporal orientation (present, past, future, atemporal) of search engine user queries. We tackled the task as a machine learning classification problem. Due to the relatively small size of the training set provided, we used temporally oriented attributes specifically designed to minimise feature sparsity. The best submitted run achieved an accuracy of 66.33%, correctly predicting the temporal orientation of 199 test instances out of 300. We also present the results of the manual error analysis performed on the predicted classes, which sheds light on the main sources of error. Finally, we present some a posteriori improvements to the best submitted run, which lead to an improvement of 6 percentage points in accuracy (72.33%).
MPI-INF at the NTCIR-11 Temporal Query Classification Task
Robin Burghartz and Klaus Berberich
[Pdf] [Table of Content]

MPI-INF participated in the Temporal Query Intent Classification Task (TQIC) of the Temporalia track at NTCIR-11. This paper describes our approach to address this specific task. Our overall strategy has been to rely on established off-the-shelf components (e.g., standard classifiers from Weka and natural language processing methods from Stanford CoreNLP) and focus on feature engineering. Devised features include surface (e.g., n-grams), linguistic (e.g., capturing whether the query is a question), and temporal (e.g., statistics about publication dates and temporal expressions). We provide details on their precise definition and report on their effectiveness.
A Logistic Regression Approach for NTCIR-11 Temporalia
Ray Larson
[Pdf] [Table of Content]

Berkeley's approach to the Temporalia TIR retrieval task for NTCIR-11 has been, as is our custom with new tasks, to use our probabilistic text retrieval methods to establish an in-house baseline for future experiments. For our initial experiments we used only the Logistic Regression ranking both with and without pseudo relevance feedback. We have previously used these algorithms in the NTCIR-8 and NTCIR-9 GeoTime tasks, as well as in many other evaluations at CLEF and INEX. This brief paper describes the submitted runs and the methods used for them.
Andd7 @ NTCIR-11 Temporal Information Access Task
Abhishek Shah, Dharak Shah and Prasenjit Majumder
[Pdf] [Table of Content]

The Andd7 team of Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT) participated in both subtasks, namely Temporal Query Intent Classification (TQIC) and Temporal Information Retrieval (TIR), of the NTCIR-11 Temporal Information Access (Temporalia) pilot task [1]. This report describes the different classification methods and feature sets used for classifying queries for TQIC and our approach to building an information retrieval system for the TIR subtask. Experimental results show that one of our systems achieves the second best accuracy of all the systems submitted by the participants. For the TIR subtask, we achieved a comparable nDCG@20, the measure we used to evaluate our system.
TUTA1 at the NTCIR-11 Temporalia Task
Hai-Tao Yu, Xin Kang and Fuji Ren
[Pdf] [Table of Content]

This paper details our participation in the NTCIR-11 Temporalia task including Temporal Query Intent Classification (TQIC) and Temporal Information Retrieval (TIR). In the TQIC subtask, we explore the rich temporal information in the labeled and unlabeled search queries. Semi-supervised and supervised linear classifiers are learned to predict the temporal classes for each search query. In the TIR subtask, we perform temporal ranking based on the technique of learning-to-rank. Two classes of features are investigated for estimating the document relevance.
HITSZ-ICRC at NTCIR-11 Temporalia Task
Yongshuai Hou, Cong Tan, Jun Xu, Youcheng Pan, Qingcai Chen and Xiaolong Wang
[Pdf] [Table of Content]

The Temporal Information Access (Temporalia) task is a pilot task held for the first time at NTCIR-11. The HITSZ-ICRC group participated in both the Temporal Query Intent Classification (TQIC) subtask and the Temporal Information Retrieval (TIR) subtask. In the TQIC subtask, we first extracted features at different linguistic levels from the user query and extracted expansion features for the query by downloading search results from the Bing search engine; we then designed a rule-based method and a multi-classifier voting method to classify user query intent separately; in the formal run, we combined the classification results produced by the two methods as the final classification result to submit. In the TIR subtask, we first built an index for the documents in the target corpus using the Lucene toolkit; second, we calculated a content relevance score using the BM25 model and a temporal relevance score based on the date distance between the query date and the time expressions tagged in the document content; third, we developed two ranking methods, a weighted sum of relevance scores and learning to rank, to calculate the final relevance score for each document and rank relevant documents by that score. The subtopic classification method we used in the TIR subtask is the same as in the TQIC subtask.
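The weighted-sum ranking mentioned in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not give the weight, the form of the date-distance function, or any score normalisation, so `alpha`, the exponential decay, `scale_days`, and the idea that the content score is a BM25 score already normalised to [0, 1] are all hypothetical.

```python
import math
from datetime import date
from typing import List, Sequence, Tuple


def temporal_score(query_date: date, doc_dates: Sequence[date],
                   scale_days: float = 365.0) -> float:
    """Map the smallest date distance to (0, 1]; closer dates score higher.

    The exponential decay and scale_days are assumptions, not the paper's formula.
    """
    if not doc_dates:
        return 0.0  # no time expression tagged in the document
    nearest = min(abs((d - query_date).days) for d in doc_dates)
    return math.exp(-nearest / scale_days)


def final_score(content: float, temporal: float, alpha: float = 0.7) -> float:
    """Weighted sum of content and temporal relevance (alpha is hypothetical)."""
    return alpha * content + (1.0 - alpha) * temporal


def rank(docs: Sequence[Tuple[str, float, Sequence[date]]],
         query_date: date, alpha: float = 0.7) -> List[str]:
    """Rank (doc_id, content_score, tagged_dates) triples by final score."""
    scored = [
        (final_score(content, temporal_score(query_date, dates), alpha), doc_id)
        for doc_id, content, dates in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

With equal content scores, a document whose tagged dates lie closer to the query date ranks first; how much the temporal component matters is controlled entirely by the assumed `alpha`.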
OKSAT at NTCIR-11 Temporalia: Plural Sets of Search Terms for a Topic
Takashi Sato and Shingo Aoki
[Pdf] [Table of Content]

Our group submitted runs for Temporal Information Retrieval (TIR) subtask of NTCIR-11 Temporalia. For one of our runs, we prepared plural sets of search terms for a subtopic. Analyzing experimental results, we observe the effectiveness of using plural sets of search terms for a subtopic.
HULTECH at the NTCIR-11 Temporalia Task: Ensemble Learning for Temporal Query Intent Classification
Mohammed Hasanuzzaman, Gaël Dias and Stephane Ferrari
[Pdf] [Table of Content]

This paper describes the HULTECH system for the NTCIR-11 Temporal Query Intent Classification (TQIC) subtask. Given a query string, the task is to assign one of four temporal classes, i.e. Past, Recency, Future or Atemporal. In particular, we experimented with an ensemble learning paradigm, whose underlying idea is to reduce bias by combining multiple classifiers instead of relying on a single one. We considered 11 types of features from three different information sources (TempoWordNet, Web snippet results, the query itself seen as a sentence) and used a subset of them for our submitted runs. Our system reaches average results but outperforms other participants for the temporal class Recency in terms of F-measure. These initial results raise interesting issues for future work.
Overview of the NTCIR-11 Cooking Recipe Search Task
Michiko Yasukawa, Fernando Diaz, Gregory Druck and Nobu Tsukada
[Pdf] [Table of Content]

This paper gives an overview of the NTCIR-11[1] Cooking Recipe Search pilot task (the first RecipeSearch task). In this pilot task, we explore the information access tasks associated with cooking recipes. Our subtasks include ad hoc recipe search and recipe pairing. We summarize the English/Japanese test collections and the task design used to develop the collections, and then report the official results of the evaluation experiments. In this task, a corpus of approximately 100,000 English recipes has been used for the English search. For the Japanese search, a corpus of approximately 440,000 Japanese recipes has been used. In the ad hoc and recipe pairing subtasks, 500 and 100 queries have been developed, in English and Japanese, respectively. Four research groups participated in the task, and 31 search runs in total have been submitted.
OPU at NTCIR-11 RecipeSearch: Japanese Recipe Pairing by Naive Bayes Estimation with Names of Ingredients
Yasuhiro Tajima, Genichiro Kikui, Megumi Kubota and Rikako Inoue
[Pdf] [Table of Content]

We propose a naive Bayes method for Japanese recipe pairing which uses the ingredients of main and side dishes. For every pair of ingredients in the learning data, we calculate the probability of co-occurrence. For a main dish in the evaluation data, we guess the side dish whose posterior probability is the highest. In our experiment, the domain of the evaluation data is restricted to the 100 examples given by the task organizers (TOs). When we evaluate a main dish, we calculate the posterior probability for every side dish in the 100 examples and guess the side dish with the maximum posterior probability.
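The pairing idea in the abstract above can be sketched roughly as follows. This is an illustrative toy, not the authors' model: the abstract does not specify the probability estimates, so the Laplace smoothing, the averaging of log-probabilities over ingredient pairs, and all function names (`train`, `log_score`, `best_side`) are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Model = Tuple[Counter, Counter, Set[str]]


def train(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Model:
    """Count co-occurrences of main-dish and side-dish ingredients."""
    pair_counts: Counter = Counter()
    main_counts: Counter = Counter()
    side_vocab: Set[str] = set()
    for main, side in pairs:
        side_vocab.update(side)
        for m in main:
            main_counts[m] += len(side)
            for s in side:
                pair_counts[(m, s)] += 1
    return pair_counts, main_counts, side_vocab


def log_score(main: Set[str], side: Set[str], model: Model) -> float:
    """Mean log P(side ingredient | main ingredient), Laplace-smoothed (assumed)."""
    pair_counts, main_counts, side_vocab = model
    v = len(side_vocab)
    logs = [
        math.log((pair_counts[(m, s)] + 1) / (main_counts[m] + v))
        for m in main
        for s in side
    ]
    return sum(logs) / len(logs) if logs else float("-inf")


def best_side(main: Set[str], candidates: List[Set[str]], model: Model) -> Set[str]:
    """Return the candidate side dish with the highest score for this main dish."""
    return max(candidates, key=lambda side: log_score(main, side, model))
```

As in the abstract, the candidate set is closed: the function only chooses among the side dishes it is handed, which would correspond to the 100 examples in the evaluation data.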
OKSAT at NTCIR-11 RecipeSearch: Categorization and Expansion of Search Terms in Topics
Takashi Sato, Shingo Aoki and Yuta Morishita
[Pdf] [Table of Content]

Our group OKSAT submitted five runs for the English and Japanese ad hoc recipe search (EN1 and JA1) subtasks of NTCIR-11 Cooking Recipe Search (RecipeSearch). For EN1, we tried to categorize the search terms of topics. We also tried to expand the search terms for some of the runs we submitted. Analyzing the experimental results, we observe the effectiveness of our method.
Hiroshima City University at NTCIR-11 Cooking Recipe Search Task
Hidetsugu Nanba
[Pdf] [Table of Content]

Our group participated in the subtask involving an ad hoc Japanese recipe search. Our goal was to evaluate the effectiveness of our Japanese cooking ontology for the recipe search. To investigate the effectiveness of our ontology-based approach, we conducted experiments and found that our method can improve upon traditional document retrieval systems.
Gunma University, Kiryu University, and RMIT University at the NTCIR-11 Cooking Recipe Search Task
Michiko Yasukawa, Hiroji Ishii and Falk Scholer
[Pdf] [Table of Content]

We report an empirical study of the NTCIR-11 Cooking Recipe Search task. A series of experiments was performed in both Japanese and English based on a collaboration that involved research groups from Gunma University, Kiryu University and RMIT University. We compared baseline, oracle, and test search runs in the task. We also report the findings that we obtained from studies of food synonyms and recipe similarity.
Overview of the NTCIR-11 QA-Lab Task
Hideyuki Shibuki, Kotaro Sakamoto, Yoshinobu Kano, Teruko Mitamura, Madoka Ishioroshi, Kelly Y. Itakura, Di Wang, Tatsunori Mori and Noriko Kando
[Pdf] [Table of Content]

This paper gives an overview of the first QA Lab (Question Answering Lab for Entrance Exam) task at NTCIR-11. The goal of the QA Lab is to provide a module-based platform for advanced question answering systems and comparative evaluation for solving real-world university entrance exam questions. In this task, “world history” questions are selected from The National Center Test for University Admissions and from the secondary exams at 5 universities in Japan. This paper also describes the data used, the baseline systems and the formal run results.
Solving History Exam by Keyword Distribution: KJP System at NTCIR-11 QALab Task
Yoshinobu Kano
[Pdf] [Table of Content]

The QALab task requires solving the history problems of the Center Exam. Although this seems like a factoid-based question-answering problem, we suggest a simple, but fundamentally important, keyword-based technique. Regardless of the methods employed, how the keyword distribution is handled is the fundamental issue in solving the problems. Our system is domain-independent, language-independent, and unsupervised, so no training is required. These features would allow direct application of the system to other types of problems in the future.
Forst: Question Answering System Using Basic Element at NTCIR-11 QA-Lab Task
Kotaro Sakamoto, Hyogo Matsui, Eisuke Matsunaga, Takahisa Jin, Hideyuki Shibuki, Tatsunori Mori, Madoka Ishioroshi and Noriko Kando
[Pdf] [Table of Content]

This paper describes Forst's approach to university entrance examinations at NTCIR-11 QA-Lab Task. Our system consists of two types of modules: dedicated modules for each question format and common modules called by the dedicated modules as necessary. Our system uses Basic Element in order to more exactly grasp and reflect the import of questions. We also tackled short-essay questions in the secondary examinations.
FLL: Answering World History Exams by Utilizing Search Results and Virtual Examples
Takuya Makino, Seiji Okura, Seiji Okajima, Shuangyong Song and Hiroko Suzuki
[Pdf] [Table of Content]

This paper describes the FLL approach to the National Center Test for University Admissions in the Center Exam subtask. Our system uses search results obtained with different search engines as clues for answering the exam questions. To answer questions, we use a combination of a rule-based approach and a machine learning approach.
CMU Multiple-choice Question Answering System at NTCIR-11 QA-Lab
Di Wang, Leonid Boytsov, Jun Araki, Alkesh Patel, Jeff Gee, Zhengzhong Liu, Eric Nyberg and Teruko Mitamura
[Pdf] [Table of Content]

This paper describes CMU’s UIMA-based modular question answering (QA) pipeline that automatically answers multiple-choice questions for the entrance exams about world history in English. Given a topic, contextual information (a short excerpt on the topic), and specific question instructions, we generate verifiable assertions for each answer choice. The most plausible answer choice is selected by aggregating the collected evidence scores. In the NTCIR-11 QALab evaluations, our system achieved 51.6% accuracy on the training set, 47.2% on the Phase 1 test set, and 34.1% on the Phase 2 test set.
Using Time Periods Comparison for Eliminating Chronological Discrepancies between Question and Answer Candidates at QALab NTCIR11 Task
Yasutomo Kimura, Fumitoshi Ashihara, Arnaud Jordan, Keiichi Takamaru, Yuzu Uchida, Hokuto Ototake, Hideyuki Shibuki, Michal Ptaszynski, Rafal Rzepka, Fumito Masui and Kenji Araki
[Pdf] [Table of Content]

This paper reports on our approach to the NTCIR-11 QALab task of answering questions from Japanese National Center Examinations for Universities. Our approach aims at identifying and comparing periods of world history in both the questions and the answer candidates. We created and applied a date identification method, which checks for temporal overlaps between time periods in questions and their answer candidates. In this paper we introduce the details of this method and analyze the test results. When tested on the World History Dictionary that is used for preparing for the exams, our approach answered 30% of the questions correctly in the 2007 Center Exam Task and 17% in the 2003 Center Exam Task.
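The temporal overlap check described in the abstract above can be sketched as a simple interval test. The representation of a period as an inclusive (start year, end year) pair and the names `periods_overlap` and `filter_candidates` are assumptions made for this illustration; the abstract does not say how periods are extracted or represented.

```python
from typing import List, Tuple

# A period is an inclusive (start_year, end_year) pair; negative years stand for BCE.
Period = Tuple[int, int]


def periods_overlap(a: Period, b: Period) -> bool:
    """Two inclusive intervals overlap when each starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def filter_candidates(question_period: Period,
                      candidates: List[Tuple[str, Period]]) -> List[str]:
    """Keep only answer candidates whose period overlaps the question's period."""
    return [text for text, period in candidates
            if periods_overlap(question_period, period)]
```

For example, a question about 1789 to 1799 would keep a candidate dated 1795 to 1815 and discard one dated 1804 to 1815, since the latter starts after the question period ends.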
NUL System at QALab Tasks
Hiroshi Miyashita, Ai Ishii, Mio Kobayashi and Chikara Hoshino
[Pdf] [Table of Content]

This paper describes the submitted strategy and the methods of the NUL team for the NTCIR-11 QALab Center examination tasks. Our purpose in joining this task is to evaluate the entailment recognition systems that we built for the RITE-VAL tasks. Our strategy is very primitive: it directly converts each question into an entailment problem by simply matching the type of the question-answer pairs. We then solve the entailment problem and convert the result back into the task answer.
sJanta: An Open Domain Question Answering System
Md. Arafat Rahman and Md-Mizanur Rahoman
[Pdf] [Table of Content]

This paper reports on the participation of our system, ‘sJanta’, in the NTCIR-11 QA Lab English subtask. sJanta is a modular question answering system that can answer multiple-choice questions given in natural-language English. We use English Wikipedia as the knowledge source. First, we performed wikification on the question and the context of the question. Then, with the wikified output, we extracted Wikipedia articles. After finding the articles, we retrieved specific passages that discuss the question. We then applied dependency parsing, context analysis and semantic similarity matching to score each answer choice. Finally, the highest-scoring answer choice was picked as the answer.
A Feature-based Classification Technique for Answering Multi-choice World History Questions: FRDC_QA at NTCIR-11 QA-Lab Task
Shuangyong Song, Yao Meng, Zhongguang Zheng and Jun Sun
[Pdf] [Table of Content]

Our FRDC_QA team participated in the QA-Lab English subtask of NTCIR-11. In this paper, we describe our system for solving real-world university entrance exam questions, which are related to world history. Wikipedia is used as the main external resource for our system. Since problems that involve choosing the right or wrong sentence from multiple sentence choices account for about two-thirds of the total, we designed a dedicated classification-based model for solving this type of question. For other types of questions, we also designed some simple methods.
The Question Answering System of DCUMT in NTCIR-11 QALab
Tsuyoshi Okita and Qun Liu
[Pdf] [Table of Content]

N/A