Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, June 7-10, 2016, Tokyo, Japan

Abstracts

[Overview]

Overview of NTCIR-12
Kazuaki Kishida and Makoto P. Kato

This is an overview of NTCIR-12, the twelfth sesquiannual research project for evaluating information access technologies. NTCIR-12 presents a diverse set of evaluation tasks related to information retrieval, question answering, natural language processing, etc. (nine tasks in total were organized at NTCIR-12). This paper outlines the research project, including its organization, schedule, scope, and task designs. In addition, brief statistics on the participants in the NTCIR-12 Conference are given. Readers should refer to the individual task overview papers for the activities and findings of each task.

[Keynote]
What Would We Like IR Metrics to Measure?
Alistair Moffat

The field of Information Retrieval has a long-standing tradition of rigorous evaluation, and an expectation that proposals for new mechanisms and techniques will either be evaluated in batch-mode experiments against realistic test collections, with reported results derived from standard tools, or be evaluated through user studies. This emphasis on evidence, and the desire for verification of proposals, has made IR effectiveness measurement an important area of study in its own right. The result has been the development of a complex suite of relevance metrics, each with seemingly different behavior. Well-known examples include Precision, Recall, Average Precision, Normalized Discounted Cumulative Gain [3], BPref [2], the Q-Measure [6], Rank-Biased Precision (RBP) [4], and so on. This presentation returns to the underlying question of what a metric should measure, using a set of desiderata for usefulness as a starting point for examining the existing palette of metrics. Recent work describing a goal-sensitive adaptive metric called INST [1, 5] will then be presented.
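For readers less familiar with the metrics listed above, here is a minimal Python sketch of three of them (Precision@k, Average Precision, and RBP) using their standard textbook definitions. It is not taken from the talk and does not cover INST; relevance judgments are assumed to be binary lists in rank order, and the persistence value `p = 0.8` is an arbitrary default.

```python
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant (binary judgments)."""
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int], num_relevant: int) -> float:
    """Sum of precision@i at every rank i holding a relevant document,
    divided by the total number of relevant documents for the topic."""
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / num_relevant


def rank_biased_precision(rels: Sequence[float], p: float = 0.8) -> float:
    """RBP = (1 - p) * sum_i r_i * p^(i-1); p is the probability that the user
    moves on from one rank to the next (the persistence parameter)."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(rels))


if __name__ == "__main__":
    run = [1, 0, 1, 0, 0]  # hypothetical binary judgments for one ranked list
    print(precision_at_k(run, 3), average_precision(run, num_relevant=2),
          rank_biased_precision(run, p=0.5))
```

With `p = 0.5`, RBP weights rank 1 by 0.5 and rank 3 by 0.125, so the example list scores 0.625; unlike AP, RBP does not need the total number of relevant documents.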

Core Tasks

[IMine-2]
Overview of the NTCIR-12 IMine-2 Task
Takehiro Yamamoto, Yiqun Liu, Min Zhang, Zhicheng Dou, Ke Zhou, Ilya Markov, Makoto P. Kato, Hiroaki Ohshima and Sumio Fujita

In this paper, we provide an overview of the NTCIR-12 IMine-2 task, which is a core task of NTCIR-12 and the successor to the IMine@NTCIR-11, INTENT-2@NTCIR-10, and INTENT@NTCIR-9 tasks. IMine-2 comprises the Query Understanding subtask and the Vertical Incorporating subtask. Twenty-three groups from countries including China, France, India, Portugal, Ireland, and Japan registered for the task. In the end, IMine-2 attracted 9 participating teams; we received 42 runs for the Query Understanding subtask and 15 runs for the Vertical Incorporating subtask. We describe the subtasks, data, and evaluation methods, and report the official results for each subtask.
NEXTI at NTCIR-12 IMine-2 Task
Hidetsugu Nanba, Tetsuya Sakai, Noriko Kando, Atsushi Keyaki, Koji Eguchi, Kenji Hatano, Toshiyuki Shimizu, Yu Hirate and Atsushi Fujii

Our group NEXTI participated in the Query Understanding subtask for Japanese. We extracted subtopic candidates from retrieved web documents and then merged them with query suggestions and query log data. We also identified the vertical intent of each subtopic using a method that combines machine-learning-based and k-NN-based methods. Our experiments confirmed the effectiveness of our method.
HUKB at NTCIR-12 IMine-2 Task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining
Masaharu Yoshioka

Query understanding is the task of identifying the important subtopics of a given query, together with their vertical intents. In our approach, characteristic keywords extracted from query analysis results and from Wikipedia are used as subtopic candidates. A topic model built from the web documents retrieved by the original query is then used to select appropriate subtopics from these candidates. Vertical intent is judged mainly by a list of typical keywords for each vertical. For the Image, News, and Shopping verticals, the system also estimates vertical intent from the types of retrieved documents, which are determined using the ALT values of IMG tags, anchor text, and a site list of URLs.
RUCIR at NTCIR-12 IMINE-2 Task
Ming Yue, Zhicheng Dou, Sha Hu, Jinxiu Li, Xiaojie Wang and Ji-Rong Wen

In this paper, we present our participation in the Query Understanding and Vertical Incorporating subtasks of the NTCIR-12 IMine-2 task, for both English and Chinese topics. In the Query Understanding subtask, we combine candidates extracted from search engine suggestions and Wikipedia, cluster and rank them, and then classify their verticals. In the Vertical Incorporating subtask, we provide a general method for adapting traditional diversification algorithms to handle predefined subtopics with classified verticals.
Search Intent Mining by Word Vectors Clustering at NTCIR-IMine
Jose G. Moreno and Gaël Dias

This paper presents a method for intent mining based on semantic vectors and search results clustering. Our algorithm represents words as documents and applies a state-of-the-art approach to query-log-driven clustering. Similarities between query logs and words are calculated using semantic vectors.
THUIR at NTCIR-12 IMine Task
Zeyang Liu, Ye Chen, Rongjie Cai, Jiaxin Mao, Chao Wang, Cheng Luo, Xin Li, Yiqun Liu, Min Zhang, Huanbo Luan and Shaoping Ma

In this paper, we describe our approaches to the NTCIR-12 IMine task, covering Chinese Query Understanding and Chinese Vertical Incorporating. In the Query Understanding subtask, we propose several strategies for mining subtopic candidates from a wide range of resources and present a two-step method to predict the vertical intent of each subtopic. In the Vertical Incorporating subtask, we adopt a probabilistic algorithm to rerank the result lists of Web documents and incorporate virtual verticals into the result lists based on the subtopic intents behind the query.
IMC at the NTCIR-12 IMine-2 Query Understanding Subtask
Jiahui Gu, Chong Feng and Yashen Wang

This paper describes the participation of the IMC team in the Chinese Query Understanding subtask of the NTCIR-12 IMine-2 task. To identify the subtopics of a given query, we use several data resources and employ new-word extraction theory to obtain expansion terms for the query; this is the core of the proposed approach. We then generate query subtopics from these expansion terms. We also use a topic model as an alternative way of generating subtopic terms, and apply the K-means algorithm to cluster the query subtopics for diversity.
IRCE at the NTCIR-12 IMine-2 Task
Ximei Song, Yuka Egusa, Hitomi Saito and Masao Takaku

The IRCE team participated in the IMine-2 task at the NTCIR-12 workshop. We submitted one Chinese-language run and five Japanese-language runs for the Query Understanding subtask. Our methods exploited online text corpora: BaiduPedia for the Chinese run and Japanese Wikipedia for the Japanese runs. The approaches used for the Chinese and Japanese topics differed. This paper discusses our approaches to the Query Understanding subtask of the NTCIR-12 IMine-2 task.
KDEIM at NTCIR-12 IMine-2 Search Intent Mining Task: Query Understanding through Diversified Ranking of Subtopics
Md Zia Ullah, Md Shajalal and Masaki Aono

In this paper, we describe our participation in the Query Understanding subtask of the NTCIR-12 IMine-2 task. We propose a method that extracts subtopics by leveraging query suggestions from search engines. The importance of each subtopic to the query is estimated by exploiting multiple query-dependent and query-independent features with supervised feature selection. To diversify the subtopics, we employ a diversification technique based on the maximal marginal relevance (MMR) framework, which balances relevance and novelty. Our best run achieves an I-rec of 0.7557, a D-nDCG of 0.6644, a D#-nDCG of 0.7100, and a QU-score of 0.5057 at cutoff rank 10 for the Query Understanding subtask.
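The MMR selection rule mentioned above can be sketched as a greedy loop: at each step, add the candidate that maximizes λ·rel(s) − (1 − λ)·max sim(s, t) over the subtopics t already selected. This is a minimal sketch of generic MMR, not KDEIM's implementation; the relevance scores, the word-overlap (Jaccard) similarity, and the value of λ are hypothetical stand-ins for the team's learned importance scores and similarity measure.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two subtopic strings (hypothetical choice)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mmr_rerank(candidates: List[str], relevance: Dict[str, float],
               sim: Callable[[str, str], float] = jaccard,
               lam: float = 0.5, k: int = 10) -> List[str]:
    """Greedy MMR: trade off relevance (weight lam) against redundancy with
    the subtopics already selected (weight 1 - lam)."""
    selected: List[str] = []
    remaining = list(candidates)

    def mmr_score(c: str) -> float:
        redundancy = max((sim(c, s) for s in selected), default=0.0)
        return lam * relevance[c] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rel = {"apple iphone": 0.9, "apple iphone price": 0.85, "apple fruit": 0.5}
    print(mmr_rerank(list(rel), rel, lam=0.4))
```

With `lam=0.4`, the example returns `apple iphone`, `apple fruit`, `apple iphone price`: the near-duplicate price subtopic drops below the less relevant but novel fruit sense. With `lam=1.0`, the order reduces to plain relevance ranking.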
YJST at the NTCIR-12 IMine-2 Task
Yasuaki Yoshida, Hideyuki Maeda, Tatsuhiro Niwa and Sumio Fujita

The Yahoo Japan Search Technology (YJST) team participated in the Query Understanding subtask of the NTCIR-12 IMine-2 task. For this subtask, we mine query logs and their clickthrough logs. For vertical identification, we built a model of each vertical using query logs.

[MedNLPDoc]
Overview of the NTCIR-12 MedNLPDoc Task
Eiji Aramaki, Mizuki Morita, Yoshinobu Kano and Tomoko Ohkuma

With the recent replacement of physical documents by electronic medical records (EMR), information processing has become increasingly important in the medical field. We organized the MedNLP task series at NTCIR-10 and NTCIR-11; these were the first shared tasks that attempted to evaluate technologies for retrieving important information from medical reports written in Japanese. In this report, we describe the NTCIR-12 MedNLPDoc task, which is designed for more advanced and practical use in the medical field. The task is formulated as multi-label classification of patient records. This report presents the results of the shared task and discusses and illustrates the remaining issues in medical natural language processing.
Team-Nikon at NTCIR-12 MedNLPDoc Task
Hiroko Kobayashi, Toyoharu Sasaki and Toru Fujii

The phenotyping task of the NTCIR-12 MedNLPDoc task is a multi-label task on Japanese medical records. Team-Nikon participated in this task and proposed a new method that assigns ICD codes using information retrieval (IR) and reduces the magnitude of coding errors using machine learning. On the development set, our system achieved an F-score of 29.2% and made fewer coding errors than the IR method alone. On the test set, however, the IR method alone performed better than the combined IR and machine learning method.
Inference of ICD Codes by Rule-Based Method from Medical Record in NTCIR-12 MedNLPDoc
Masahito Sakishita and Yoshinobu Kano

We propose a method that automatically assigns appropriate ICD codes to diagnoses. Unfortunately, the number of available Japanese electronic medical records is not sufficient for statistical machine learning methods to perform well. We therefore examined the characteristics of medical records manually and wrote rules by hand. Our system achieved the highest F-measure among all participants under the strictest evaluation criterion. Through comparison with other approaches, we show that our approach could be a useful milestone for the future development of Japanese medical record processing.
NIL: Simple Approach to Find the ICD Codes
Masao Ito and Masaya Kato

For the NTCIR-12 MedNLPDoc task, we adopted a simple approach: after morphological analysis of the medical record text, we match the nouns in the records against the disease names of the ICD codes. The effectiveness of this approach is, of course, limited. However, this is our first attempt, and we consolidate the problems we found as a step toward future work.
NARS: NTCIR-12 MedNLPDoc Baseline
Eiji Aramaki and Shoko Wakamiya

NTCIR-12 MedNLPDoc is a shared ICD coding task, a multi-label task on patient medical records. This paper describes the baseline system for the task. The system is based on simple word matching against a disease name dictionary and uses no training data. This report presents the results of the baseline system and discusses its basic feasibility.
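The dictionary-matching idea behind this baseline can be illustrated with a minimal Python sketch. It is not the organizers' code: the real task works on Japanese text, whereas this sketch uses lowercase English substring matching, and the three dictionary entries (ICD-10 codes E11, E14, and I10) are illustrative only. Trying longer names first and blanking out each match, so that "diabetes mellitus" does not also fire inside "type 2 diabetes mellitus", is an added assumption.

```python
from typing import Dict, List


def dictionary_icd_codes(record: str, disease_dict: Dict[str, str]) -> List[str]:
    """Return the sorted ICD codes whose disease names occur in the record text.

    Names are tried longest first; each match is blanked out so that a shorter,
    more generic name cannot match inside a longer, more specific one.
    """
    text = record.lower()
    codes = set()
    for name in sorted(disease_dict, key=len, reverse=True):
        key = name.lower()
        if key in text:
            codes.add(disease_dict[name])
            text = text.replace(key, " " * len(key))
    return sorted(codes)


# Hypothetical toy dictionary; a real system would load a full disease-name list.
DISEASE_DICT = {
    "type 2 diabetes mellitus": "E11",
    "diabetes mellitus": "E14",
    "hypertension": "I10",
}

if __name__ == "__main__":
    note = "Patient with type 2 diabetes mellitus and hypertension."
    print(dictionary_icd_codes(note, DISEASE_DICT))
```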
Similarity Matrix Model for the NTCIR-12 MedNLPDoc Task
Yuichiro Sawai, Mai Omura, Hiroki Ouchi, Yuki Nagai, Masashi Yoshikawa and Ikuya Yamada

We participated in the NTCIR-12 MedNLPDoc phenotyping task, and in this paper we describe our approach. The core of our model is a similarity matrix in which each element holds the local similarity between an n-gram from a disease name and an n-gram from a medical record. We conducted experiments to evaluate the effectiveness of our method, and we report the results of our preliminary experiments as well as the final task results.
HCU at the NTCIR-12 MedNLPDoc Task
Ayumu Hiramae, Hidetsugu Nanba and Toshiyuki Takezawa

Our team participated in the Phenotyping task. We tackled the task using a machine learning technique and edit distance computed by DP (dynamic programming) matching. This paper outlines the methods we used to obtain the results evaluated by the task organizers.
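Edit distance by DP matching is the classic Levenshtein dynamic program. The sketch below assumes unit costs for insertion, deletion, and substitution, and shows a hypothetical use: picking the dictionary disease name closest to an extracted term. The abstract does not describe how HCU set the costs or combined the distance with machine learning.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed by DP matching.

    prev[j] holds the distance between the prefix of `a` processed so far
    and b[:j]; only two rows are kept in memory.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 if the characters match)
            )
        prev = cur
    return prev[-1]


def closest_name(term: str, names: List[str]) -> str:
    """Hypothetical helper: the dictionary name with the smallest edit distance to term."""
    return min(names, key=lambda name: edit_distance(term, name))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(closest_name("pneumona", ["pneumonia", "pleurisy", "anemia"]))
```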
UE-UD at NTCIR-12 MedNLPDoc Task
Paulo Quaresma and Nga Tran Anh Hang

Technology is used in many sectors of life, and medicine is one of them. Electronic medical records (EMR) are now widely used instead of physical documents. This paper addresses the continuing challenges of the MedNLP task series from NTCIR-10 and NTCIR-11. Those tasks covered named entity recognition (NER) and evaluated term normalization technology on medical reports written in Japanese, whereas this task is more advanced, more practical, and closer to real-world applications in the medical industry. The task is divided into two subtasks: (Task 1) the phenotyping task, which requires assigning standard disease names to given medical records, and (Task 2) the creative task, which asks for ideas on using the resulting products in the real world. This paper focuses on using part-of-speech tags and improving NER to correctly extract the word sequences needed to assign ICD codes. The experimental results do not show high performance (precision major: 9.6%, recall major: 4.4%, F-measure major: 6.0%), but we consider them strongly promising, since they were obtained by an international group with limited knowledge of Japanese.
Report on the NTCIR-12 MedNLPDoc Task Results
Dina Vishnyakova, Christophe Gaudet-Blavigniac, Selen Bozkurt, David-Zacharie Issom, Renat Vishnyakov and Christian Lovis

The reuse of clinical data for research is becoming one of the important tasks in medical informatics, and the automatic assignment of medical codes to identified textual concepts is turning into a Sisyphean task. For the MedNLP task at NTCIR-12, we propose a new approach that automatically enriches a dictionary using online data. We have developed a text-mining system that can process medical text written in Japanese and assign ICD-10 codes with English descriptors to the identified concepts. The system has three main components: 1) an English version of an ICD-10-based dictionary, 2) Wikipedia-based synonyms, and 3) statistical translation tools such as the Yandex and Google Translate APIs. This report describes the system and the results achieved on the MedNLPDoc test data. Additionally, we provide ICD assignment frequencies at the University Hospitals of Geneva.

[MobileClick-2]
Overview of the NTCIR-12 MobileClick-2 Task
Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita and Sumio Fujita

This is an overview of the NTCIR-12 MobileClick-2 task (a sequel to 1CLICK in NTCIR-9 and NTCIR-10). In the MobileClick task, systems are expected to output a concise summary of information relevant to a given query and to provide immediate and direct information access for mobile users. We designed two types of MobileClick subtasks, namely, the iUnit ranking and summarization subtasks, in which twelve research teams participated and submitted 66 runs. We describe the subtasks, test collection, and evaluation methods, and then report the official results for NTCIR-12 MobileClick.
NCU IISR System for NTCIR-12 MobileClick2
Wen-Bin Han, Hung-Hsiang Wang and Richard Tzong-Han Tsai

This paper describes our approach to the NTCIR-12 MobileClick task. First, we apply some additional processing to the baseline. Next, we try machine learning, a method entirely different from the baseline. Finally, we tune both approaches and apply them to the test data. Our system achieves an nDCG@3 of 0.7415, an nDCG@5 of 0.764, an nDCG@10 of 0.8059, an nDCG@20 of 0.8732, and a Q-measure of 0.9004, slightly outperforming the baseline.
UHYG at the NTCIR-12 MobileClick Task: Link-based Ranking on iUnit-Page Bipartite Graph
Sho Iizuka, Takayuki Yumoto, Manabu Nii and Naotake Kamiura

We participated in the iUnit ranking and iUnit summarization subtasks of NTCIR-12 MobileClick for Japanese and English. Our strategy is based on link analysis on an iUnit-page bipartite graph. First, we constructed an iUnit-page bipartite graph based on the entailment relationships between the iUnits and the pages. Then, we ranked the iUnits by scores derived from link analysis. For the iUnit ranking subtask, we examined three types of entailment relationships and three types of link analysis: node degree, PageRank, and HITS. For the iUnit summarization subtask, we propose an intent-sensitive PageRank, an extension of topic-sensitive PageRank based on the probability that users visit pages in a search result page.
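As a rough illustration of link analysis on an iUnit-page bipartite graph, the sketch below runs a HITS-style mutual reinforcement: a page scores highly if it entails high-scoring iUnits, and an iUnit scores highly if high-scoring pages entail it. This is a minimal sketch under stated assumptions, not UHYG's system: the entailment edges are given by hand (the paper derives them from three kinds of entailment relationships), the graph is treated as undirected, scores are normalized to sum to 1, and the degree, PageRank, and intent-sensitive PageRank variants are not shown.

```python
from typing import Dict, Set


def bipartite_hits(links: Dict[str, Set[str]], iterations: int = 50) -> Dict[str, float]:
    """HITS-style scores for iUnits on an iUnit-page bipartite graph.

    links maps each iUnit to the set of pages that entail it.
    Returns iUnit scores normalised to sum to 1.
    """
    pages = sorted({p for ps in links.values() for p in ps})
    unit_score = {u: 1.0 for u in links}
    for _ in range(iterations):
        # A page is important if it entails important iUnits ...
        page_score = {p: sum(s for u, s in unit_score.items() if p in links[u]) for p in pages}
        # ... and an iUnit is important if important pages entail it.
        unit_score = {u: sum(page_score[p] for p in links[u]) for u in links}
        total = sum(unit_score.values()) or 1.0
        unit_score = {u: s / total for u, s in unit_score.items()}
    return unit_score


if __name__ == "__main__":
    # Hypothetical graph: u1 is entailed by three pages, u2 by one shared page, u3 by an isolated page.
    graph = {"u1": {"p1", "p2", "p3"}, "u2": {"p1"}, "u3": {"p4"}}
    scores = bipartite_hits(graph)
    print(sorted(scores, key=scores.get, reverse=True))
```

The iteration is a power method on the iUnit co-citation matrix, so `u2` benefits from sharing page `p1` with the well-supported `u1`, while the isolated `u3` decays toward zero.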
Element-based Retrieval@MobileClick-2
Atsushi Keyaki, Jun Miyazaki and Kenji Hatano

In this paper, we report our effort to tackle MobileClick-2 with the element-based retrieval approach. The goal of element-based retrieval is to identify only the descriptions relevant to a query and show them to the user. We believe this goal is essentially similar to that of MobileClick-2, which is why we applied element-based retrieval to this task. Because the output unit of element-based retrieval is an element, whereas that of MobileClick-2 is an iUnit, additional effort is needed to identify the iUnits corresponding to the relevant elements.
NUTKS at NTCIR-12 MobileClick2: iUnit Ranking Subtask Using Topic Model
Tatsunori Yoshioka and Takashi Yukawa

In this paper, NUTKS (Nagaoka University of Technology, Knowledge Systems Laboratory) reports the results of our participation in the iUnit Ranking subtask of the NTCIR-12 MobileClick task. We ranked iUnits using the similarity between the word distribution of each iUnit and LDA-based topics. Our system recorded a Q-measure of 0.7392 and an nDCG@20 of 0.6334, while the baseline system recorded a Q-measure of 0.7411; there is therefore essentially no difference between the baseline system and our approach. Our approach treats search intents as topics; however, these results suggest that topic similarity alone cannot produce output that reflects the diversity of search intents.
CUIS at the NTCIR-12 MobileClick2 Task
Kwun Ping Lai, Wai Lam and Lidong Bing

We present our approach to the iUnit ranking and iUnit summarization subtasks of MobileClick2. We first perform intent discovery based on latent topic modeling. Our iUnit ranking method exploits the discovered intents and considers the importance of an iUnit in each Web content document. We then build our iUnit summarization model on the output of the iUnit ranking subtask. Our run submitted to the iUnit ranking subtask outperforms the organizers' best baseline method.
NTCIR-12 MOBILECLICK: Sense-based Ranking and Summarization of English Queries
Monalisa Dey, Anupam Mondal and Dipankar Das

The NTCIR-12 MobileClick task was designed to rank and summarize information for English queries. The primary aim of the task was to develop a system capable of minimizing interaction between human users and mobile phones while extracting relevant data for the given queries. The organizers provided the data as information units (iUnits). Each iUnit describes a piece of information pertinent to a query, along with other information such as type or category, relevance, sense, and knowledge-based relations [1] [2] [4]. The task is divided into two subtasks, namely ranking and summarization. The ranking subtask focuses on identifying the important iUnits for a query. In the summarization subtask, the output must follow a two-layered model, in which the first layer identifies the important iUnits and the second layer compiles them into a summarized output for the query. In this work, we employed several sentiment lexicons, such as SentiWordNet and SenticNet, with tabulation-based approaches to identify the important query-based iUnits for ranking and summarization. Our sense-based system achieved a mean Q-measure of 0.8859 for ranking and a mean M-measure of 11.7033 for summarization.
IRIT at the NTCIR-12 MobileClick-2 Task
Abdelhamid Chellal and Mohand Boughanem

This paper presents the participation of the IRIT laboratory (University of Toulouse) in the MobileClick-2 task of NTCIR-12. The task aims to provide immediate and direct information that can be accessed on users' mobile devices. For a given query, summarization systems are expected to provide a two-layered summary of relevant information units (iUnits). Two subtasks were defined: iUnit ranking and iUnit summarization. In the iUnit ranking subtask, we propose ranking iUnits by their amount of information, evaluated using Shannon entropy. For the iUnit summarization subtask, we propose two strategies for building a summary. The first is a top-down approach in which the first layer is filled first, while the second is a bottom-up approach in which we start by filling the second layer. To estimate the similarity between words, we investigated the word2vec tool. For all of these approaches, we discuss the results obtained in the experimental evaluation.
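The entropy-based iUnit ranking can be sketched as below. This is a minimal illustration under assumptions the abstract does not state: entropy is computed in bits over each iUnit's own whitespace-tokenized word frequencies, and higher entropy is read as more information. IRIT's actual probability estimates may differ.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(tokens: List[str]) -> float:
    """Shannon entropy H = -sum_w p(w) log2 p(w) of the empirical word distribution."""
    n = len(tokens)
    if n == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_by_entropy(iunits: List[str]) -> List[str]:
    """Order iUnits from highest to lowest entropy (assumed: most informative first)."""
    return sorted(iunits, key=lambda u: shannon_entropy(u.lower().split()), reverse=True)


if __name__ == "__main__":
    print(rank_by_entropy(["born 1955", "born in seattle washington in 1955"]))
```

Note that this unnormalized entropy grows with the number of distinct words, so it favors longer iUnits.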
RMIT at the NTCIR-12 MobileClick-2: iUnit Ranking and Summarization Subtasks
Kevin Ong, Ruey-Cheng Chen and Falk Scholer

In the NTCIR-12 MobileClick-2 challenge, the RMIT Information Storage and Retrieval (ISAR) group participated in both the English iUnit Ranking and English iUnit Summarization tasks. This paper describes how we applied a learning-to-rank approach to the problem of iUnit Ranking and how the outcome of ranking was adapted to produce the summaries.
YJST at the NTCIR-12 MobileClick-2 Task
Yuya Ozawa, Taichi Yatsuka and Sumio Fujita

The Yahoo Japan Search Technology (YJST) group participated in the Japanese iUnit Ranking and Summarization subtasks of MobileClick-2. For the iUnit Ranking subtask, we adopted a language-model (LM)-based approach, implemented on top of the organizers' baseline system. We examined LM-based iUnit ranking using both KL-divergence and negative cross entropy with several smoothing methods, such as Bayesian smoothing with Dirichlet priors, which is commonly used for document ranking in language-modeling IR, and the comparatively new Pitman-Yor process smoothing. Our system achieved a Q-measure of 0.807 on the Japanese ranking test set. For the iUnit Summarization subtask, we used the organizers' LM-based two-layer iUnit summarization baseline system, but replaced its ranking module with the extended system described above. Because it relies on word-based matching, the baseline intent identification for second-layer allocation fails to identify any intent when an iUnit and an intent share no words. We introduced context-based word embedding representations of both iUnits and intents to identify the intents of iUnits that contain no explicit intent words. Finally, our system achieved an M-measure of 25.8498 on the Japanese summarization test set.
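The language-model ranking described above can be illustrated with Dirichlet-prior smoothing, p(w|u) = (c(w, u) + μ·p(w|C)) / (|u| + μ), and negative cross entropy between a query model and each smoothed iUnit model. This is a minimal, hypothetical sketch, not YJST's system or the organizers' baseline: the query model is the maximum-likelihood distribution of the query words, the collection model is estimated from the candidate iUnits themselves, tokenization is whitespace splitting, and μ = 10 is an arbitrary value suited to very short texts. Pitman-Yor smoothing and the word-embedding intent matching are not shown.

```python
import math
from collections import Counter
from typing import Callable, List


def dirichlet_model(doc: List[str], coll: Counter, coll_len: int, mu: float) -> Callable[[str], float]:
    """p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu): Bayesian smoothing with a Dirichlet prior."""
    counts = Counter(doc)
    return lambda w: (counts[w] + mu * coll[w] / coll_len) / (len(doc) + mu)


def neg_cross_entropy(query: List[str], doc: List[str], coll: Counter, coll_len: int, mu: float) -> float:
    """-H(q, d) = sum_w p(w|q) log p(w|d), with p(w|q) the query's relative word frequency.

    Query words absent from the collection would get probability 0 (log 0 = -inf)
    in every candidate, so they are skipped.
    """
    p = dirichlet_model(doc, coll, coll_len, mu)
    q = Counter(query)
    return sum((c / len(query)) * math.log(p(w)) for w, c in q.items() if coll[w] > 0)


def rank_iunits(query: str, iunits: List[str], mu: float = 10.0) -> List[str]:
    """Rank candidate iUnits by negative cross entropy, highest first."""
    docs = [u.lower().split() for u in iunits]
    coll = Counter(w for d in docs for w in d)
    coll_len = sum(coll.values())
    q = query.lower().split()
    order = sorted(range(len(iunits)),
                   key=lambda i: neg_cross_entropy(q, docs[i], coll, coll_len, mu),
                   reverse=True)
    return [iunits[i] for i in order]


if __name__ == "__main__":
    units = ["bill gates founded microsoft", "gates are made of iron", "microsoft founded in 1975"]
    print(rank_iunits("bill gates microsoft", units))
```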
Ranking and Summarization Using Word-embedding at NTCIR-12 MobileClick Task
Shinsuke Yokoyama, Sho Nakamura, Risa Kitajima and Yu Hirate

Our team's approach is based on word embeddings. We converted queries and iUnits into vectors with word2vec, and then applied ranking and summarization methods to these vectors.
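One simple way to rank iUnits with word embeddings is to average the word vectors of the query and of each iUnit and sort by cosine similarity. The sketch below assumes that scheme, which the abstract does not specify, and uses hypothetical two-dimensional toy vectors in place of a trained word2vec model; out-of-vocabulary words are simply ignored.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average of the embeddings of in-vocabulary tokens (empty list if none)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return []
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is empty or all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if not a or not b or norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_by_embedding(query: str, iunits: List[str], emb: Dict[str, Vector]) -> List[str]:
    """Sort iUnits by cosine similarity between mean query and mean iUnit vectors."""
    q = mean_vector(query.lower().split(), emb)
    return sorted(iunits, key=lambda u: cosine(q, mean_vector(u.lower().split(), emb)), reverse=True)


# Hypothetical toy embeddings standing in for word2vec output.
TOY_EMB = {
    "coffee": [1.0, 0.0], "espresso": [0.9, 0.1], "tea": [0.7, 0.3],
    "train": [0.0, 1.0], "station": [0.1, 0.9],
}

if __name__ == "__main__":
    print(rank_by_embedding("coffee", ["station train", "espresso bar", "tea"], TOY_EMB))
```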
University of Alicante at the NTCIR-12: Mobile Click
Fernando Llopis, Elena Lloret and Jose M. Gomez

This paper describes the first participation of the natural language processing group of the University of Alicante in the MobileClick task of NTCIR-12. Our approach combines tools developed in our research group: IR-n, a passage retrieval system; COMPENDIUM, a summary generator; and a new summarizer based on Principal Component Analysis. In this first participation, we focused on the iUnit Ranking subtask, although we also made an attempt at the iUnit Summarization subtask.

[SpokenQuery&Doc-2]
Overview of the NTCIR-12 SpokenQuery&Doc-2 Task
Tomoyosi Akiba, Hiromitsu Nishizaki, Hiroaki Nanjo and Gareth J. F. Jones

This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc-2) task at the NTCIR-12 Workshop. The task included two sub-tasks: spoken-query-driven spoken content retrieval (SQ-SCR) and spoken-query-driven spoken term detection (SQ-STD). The paper describes the details of each sub-task, the data used, the creation of the speech recognition systems used to create the transcripts, the design of the retrieval test collections, the metrics used to evaluate the sub-tasks, and a summary of the results of the participants' submissions.
DCU at the NTCIR-12 SpokenQuery&Doc-2 Task
David N. Racca and Gareth J. F. Jones

We describe DCU's participation in the NTCIR-12 SpokenQuery&Doc (SQD-2) task. In the context of the slide-group retrieval sub-task, we experiment with a passage retrieval method that re-scores each passage according to the relevance score of the document from which it is taken. This is performed by linearly interpolating their relevance scores, which are calculated independently for passages and documents using the Okapi BM25 model of probabilistic retrieval. In conjunction with this, we assess the benefits of using pseudo-relevance feedback to expand the spoken queries with terms found in the top-ranked spoken documents and passages, and experiment with a general multidimensional optimisation method to jointly tune the BM25 and query expansion parameters using queries and relevance data from the NTCIR-11 SQD-1 task. Retrieval experiments performed over the SQD-1 and SQD-2 queries confirm previous findings that integrating document information when ranking passages can improve passage retrieval effectiveness. Furthermore, the results indicate that no significant gains in retrieval effectiveness can be obtained by using query expansion in combination with our retrieval models over these two query sets.
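The passage re-scoring step can be sketched as a linear interpolation of two independently computed Okapi BM25 scores, one for the passage and one for its source document: score(p) = λ·BM25(q, p) + (1 − λ)·BM25(q, d(p)). This is a minimal sketch, not DCU's code: the BM25 variant (with "+1" inside the IDF logarithm to keep it non-negative), the defaults k1 = 1.2 and b = 0.75, and λ = 0.5 are assumptions, whereas DCU tuned their parameters on the SQD-1 data. Pseudo-relevance feedback is not shown.

```python
import math
from collections import Counter
from typing import List


class BM25:
    """Okapi BM25 over one tokenised collection (passages or documents)."""

    def __init__(self, corpus: List[List[str]], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.n_docs = len(corpus)
        self.avgdl = sum(len(d) for d in corpus) / self.n_docs
        self.df = Counter(w for d in corpus for w in set(d))

    def idf(self, term: str) -> float:
        n = self.df[term]
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: List[str], doc: List[str]) -> float:
        tf = Counter(doc)
        length_norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        return sum(self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + length_norm)
                   for t in query if tf[t] > 0)


def interpolated_score(query: List[str], passage: List[str], document: List[str],
                       passage_model: BM25, doc_model: BM25, lam: float = 0.5) -> float:
    """lam * BM25(passage) + (1 - lam) * BM25(source document)."""
    return lam * passage_model.score(query, passage) + (1 - lam) * doc_model.score(query, document)


if __name__ == "__main__":
    doc_a = "speech recognition slide speech lecture".split()
    doc_b = "cooking speech slide kitchen tips".split()
    p_a, p_b = ["speech", "slide"], ["speech", "slide"]  # identical passages, different sources
    passages, docs = BM25([p_a, p_b]), BM25([doc_a, doc_b])
    q = ["speech", "slide"]
    print(interpolated_score(q, p_a, doc_a, passages, docs),
          interpolated_score(q, p_b, doc_b, passages, docs))
```

In the toy example the two passages are identical, so passage BM25 alone cannot separate them; the interpolation ranks first the passage whose source document mentions "speech" more often.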
Spoken Document Retrieval Using Neighboring Documents and Extended Language Models for Query Likelihood Model
Kazuaki Ogawa, Tatsuaki Murahashi, Hiroaki Taguchi, Koudai Nakajima, Masanori Takehara, Satoshi Tamura and Satoru Hayamizu

This paper proposes several approaches for NTCIR-12 SpokenQuery&Doc-2. Our methods are based on the query likelihood model, a probabilistic language model, to which we apply Dirichlet smoothing. First, we propose a smoothing method using related research papers. Second, we propose a smoothing method using the linear interpolation of a cache model and an n-gram model with Kneser-Ney smoothing. Finally, we propose a smoothing method using neighboring documents. Experiments were conducted to evaluate these methods using the NTCIR-12 test sets.
UB at the NTCIR-12 SpokenQuery&Doc-2: Spoken Content Retrieval Using Multiple ASR Hypotheses and Syllables
Jianqiang Wang

The University at Buffalo (UB) team participated in the SpokenQuery&Doc task at NTCIR-12, working on the Spoken Content Retrieval (SCR) subtask. We investigated the use of multiple ASR hypotheses (words) and subword units (syllables) for improving retrieval effectiveness. We also compared the retrieval effectiveness of texts generated by two automatic speech recognition (ASR) engines, namely Julius and KALDI. Our experimental results showed that using multiple ASR hypotheses did not improve retrieval effectiveness, while using ASR syllables alone led to lower mean average precision than using ASR words. Furthermore, ASR texts generated by the KALDI system resulted in significantly better retrieval effectiveness than those generated by the Julius system. Future areas of work are discussed.
Graph-based Document Expansion and Robust SCR Models for False Positives: Experiments at the NTCIR-12 SpokenQuery&Doc-2
Sho Kawasaki, Hiroshi Oshima and Tomoyosi Akiba

In this paper, we report our experiments at the NTCIR-12 SpokenQuery&Doc-2 task. We participated in the spoken-query-driven spoken content retrieval (SQ-SCR) subtask of SpokenQuery&Doc-2. We submitted two types of results: a conventional spoken content retrieval method (referred to as C-SCR) and an STD-based approach to SCR (referred to as STD-SCR). The latter was proposed to deal with speech recognition errors and out-of-vocabulary (OOV) words. We extended each SCR method in several ways. For C-SCR, we applied a graph-based document expansion method. For STD-SCR, we applied retrieval models that are robust to false positive errors, using word co-occurrence information.
An STD System Using Multiple STD Results and Multiple Rescoring Method for NTCIR-12 SpokenQuery&Doc Task
Ryota Konno, Kazuki Ouchi, Masato Obara, Yoshino Shimizu, Takashi Chiba, Tatsuro Hirota and Yoshiaki Itoh

Research on spoken term detection (STD) has been actively conducted in recent years. The task of STD is to search for a particular speech segment in a large amount of multimedia data that includes audio or speech. In NTCIR-12, a task with multiple spoken queries was newly added to the STD task. In this paper, we explain the STD system that our team developed for the NTCIR-12 SpokenQuery&Doc task. We have previously proposed various methods to improve STD accuracy for out-of-vocabulary (OOV) query terms. Our method consists of four steps. First, multiple automatic speech recognizers (ASRs) are applied to the spoken documents using triphone, syllable, demiphone, and SPS units, yielding multiple speech recognition results, and retrieval results are obtained for each subword unit. Second, these retrieval results are integrated [1][2]. Third, we apply a rescoring method to improve STD accuracy for highly ranked candidates [3]. Lastly, a rescoring method is applied that compares a query with the spoken documents in more detail using the posterior probabilities obtained from a deep neural network (DNN) [4]. We apply this method only to the top candidates to reduce retrieval time [5]. For spoken queries, we use two rescoring methods. The first compares the posterior probability vectors of the spoken query and the spoken documents. The second utilizes the papers in the proceedings. We apply these methods to the NTCIR-12 test collection and report the experimental results.
Combining State-level and DNN-based Acoustic Matches for Efficient Spoken Term Detection in NTCIR-12 SpokenQuery&Doc-2 Task
Shuji Oishi, Tatsuya Matsuba, Mitsuaki Makino and Atsuhiko Kai

Recently, in spoken document retrieval tasks such as spoken term detection (STD), there has been increasing interest in using spoken queries. STD systems often employ an automatic speech recognition (ASR) front end for its reasonable accuracy and efficiency. However, the out-of-vocabulary (OOV) problem at the ASR stage has a great impact on STD performance for spoken queries. In this paper, we propose two spoken term detection methods that combine different types of side information to calculate integrated scores. First, we propose combining a feature-based acoustic match, which is often employed in STD systems for low-resource languages, with ASR-derived features. Second, we propose a method that combines speech recognition confidence measures with the ASR results. Both proposed methods use a two-pass strategy. In the first pass, automatic transcripts of the spoken documents and spoken queries are decomposed into the corresponding acoustic model state sequences and used to spot plausible speech segments. The experimental results showed that combination with the feature-based acoustic match improves STD performance compared to a baseline system that uses subword-level spotting alone.
Evaluation of DNN-based Phoneme Estimation Approach on the NTCIR-12 SpokenQuery&Doc-2 SQ-STD Subtask
Naoki Sawada and Hiromitsu Nishizaki
[Pdf] [Table of Content]

This paper proposes a method for estimating the correct phoneme sequence using a deep neural network (DNN)-based framework for spoken term detection (STD). We use a DNN as a correct-phoneme estimator. In post-processing, the DNN-based estimator estimates the correct phoneme sequence of an utterance from several phoneme-based transcriptions produced by multiple ASR systems, in order to reduce phoneme errors. In the experimental evaluation on the NTCIR-12 SpokenQuery&Doc-2 STD test collection, our proposed approach outperformed the baseline system prepared by the task organizers. However, it did not outperform the DTW-based STD method we previously proposed.

Return to Top

[Temporalia-2]
Overview of NTCIR-12 Temporal Information Access (Temporalia-2) Task
Hideo Joho, Adam Jatowt, Roi Blanco, Haitao Yu and Shuhei Yamamoto
[Pdf] [Table of Content]

This paper overviews the NTCIR-12 Temporal Information Access (Temporalia-2) task. The task aims to foster research on temporal aspects of information retrieval and search, and is a continuation of the Temporalia-1 task at NTCIR-11. Temporalia-2 is composed of two subtasks: Temporal Intent Disambiguation (TID) and Temporally Diversified Retrieval (TDR). Both subtasks have English and Chinese language versions. A total of 47 runs were submitted by 15 teams from around the world, a 40% increase in the number of participating teams compared to the previous edition. TID in English attracted 12 teams, which submitted a total of 30 runs, while its Chinese version attracted 3 teams submitting 7 runs. Four teams, including the organizers' team, took part in the English TDR subtask with a total of 10 runs. In this paper we describe both subtasks, the datasets, the evaluation methods, and the results of meta-analyses.
A Probabilistic Framework for Time-Sensitive Search: MPII at the NTCIR-12 Temporalia-2 Task
Dhruv Gupta and Klaus Berberich
[Pdf] [Table of Content]

This research article presents TimeSearch, a probabilistic framework that competed in the Temporalia-2 task. The subtasks in Temporalia-2 require an information retrieval system to be aware of the temporal expressions (e.g., 1990s) in documents and queries in order to identify relevant documents. Analyzing these temporal expressions, like natural language understanding in general, is challenging. TimeSearch utilizes a unique time model to address these challenges and to understand temporal expressions. Building on this model, it identifies interesting time intervals for a given keyword query. These time intervals are then used to rank and diversify documents in a time-sensitive manner. In this article we describe TimeSearch and its performance in Temporalia-2.
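TimeSearch's own time model is not described in this abstract. As a much simpler stand-in, assuming each temporal expression can be mapped to an inclusive interval of years, the sketch below converts expressions such as "1990s" into intervals and scores how much of a document's interval falls inside a query's interval. The helper names and the year granularity are assumptions for illustration only.

```python
import re


def expression_to_interval(expr):
    """Map a temporal expression to an inclusive (start_year, end_year) interval.

    Only two forms are handled in this sketch: a decade ("1990s") and a year ("1995").
    """
    expr = expr.strip().lower()
    decade = re.fullmatch(r"(\d{3})0s", expr)
    if decade:
        start = int(decade.group(1)) * 10
        return (start, start + 9)
    if re.fullmatch(r"\d{4}", expr):
        value = int(expr)
        return (value, value)
    raise ValueError(f"unsupported temporal expression: {expr!r}")


def overlap_score(query_interval, doc_interval):
    """Fraction of the document interval's years that lie inside the query interval."""
    low = max(query_interval[0], doc_interval[0])
    high = min(query_interval[1], doc_interval[1])
    if high < low:
        return 0.0
    return (high - low + 1) / (doc_interval[1] - doc_interval[0] + 1)


query_iv = expression_to_interval("1990s")
print(query_iv)                               # (1990, 1999)
print(overlap_score(query_iv, (1995, 2004)))  # 0.5
```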
L3S at the NTCIR-12 Temporal Information Access (Temporalia-2) Task
Zeon Trevor Fernando, Jaspreet Singh and Avishek Anand
[Pdf] [Table of Content]

This paper describes our participation in the NTCIR-12 Temporalia-2 task, including the Temporal Intent Disambiguation (TID) and Temporally Diversified Retrieval (TDR) subtasks. In the TID subtask, we extract linguistic features from the query, time distance features, and a multinomial distribution over the query n-grams, which are then combined using a rule-based voting method to estimate a probability distribution over the temporal intents. In the TDR subtask, we perform temporal ranking based on two approaches: a linear combination of textual and temporal relevance, and learning to rank. Three classes of features, comprising linguistic, topical, and temporal features, were used to estimate document relevance in the learning-to-rank approach.
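The abstract does not give the formula for the linear-combination approach to TDR. A common form, assumed here, is a convex combination of a textual relevance score and a temporal relevance score, both normalized to [0, 1], controlled by a hypothetical mixing weight `alpha`.

```python
def combined_score(text_score, temporal_score, alpha=0.7):
    """Convex combination of textual and temporal relevance (both assumed in [0, 1])."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_score + (1.0 - alpha) * temporal_score


def rank_documents(docs, alpha=0.7):
    """docs maps doc_id -> (text_score, temporal_score); returns doc_ids best first."""
    return sorted(docs, key=lambda d: combined_score(*docs[d], alpha=alpha), reverse=True)


docs = {"a": (0.9, 0.1), "b": (0.6, 0.9)}
print(rank_documents(docs))             # ['b', 'a']: temporal evidence lifts "b"
print(rank_documents(docs, alpha=1.0))  # ['a', 'b']: text relevance only
```

In practice `alpha` would be tuned on held-out topics; the value 0.7 here is arbitrary.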
WIS @ the NTCIR-12 Temporalia-2 Task
Yue Zhao and Claudia Hauff
[Pdf] [Table of Content]

Users' time-related information needs may be multi-faceted, leading to temporal intent ambiguity. Here, we present an overview of our submissions to Temporalia-2's Temporal Intent Disambiguation subtask. Our approach focused on whether temporal signals extracted from publicly available external data sources (in this case, the Wikipedia page view stream), used as features in a machine learning setup, are beneficial for this task. Although we find that temporal features can be highly beneficial for intent prediction for some queries, this is not the case for the majority of queries in Temporalia-2's dataset.
KGO at the NTCIR-12 Temporalia Task: Exploring Temporal Information in Search Queries
Xin Kang, Yunong Wu and Fuji Ren
[Pdf] [Table of Content]

This paper details our participation in the Temporal Intent Disambiguation (TID) English subtask of the NTCIR-12 Temporalia task. Our work focuses on developing a series of temporal features for web search queries and constructing a deep neural network to disambiguate people's temporal intents in web searches. We analyze the importance of different temporal features and discuss the impact of neural network structure on the TID results.
DUT-NLP-CH @ NTCIR-12 Temporalia TID Subtask
Jiahuan Pei, Degen Huang, Jianjun Ma, Dingxin Song and Leyuan Sang
[Pdf] [Table of Content]

This paper details our participation in the Temporal Intent Disambiguation subtask of NTCIR-12. We treat the subtask as a classification problem, and our main work is finding discriminative features to feed machine learning classifiers that estimate a distribution over four temporal intent classes for a given query. Given the lack of temporal information in queries, we expand them with two types of features: explicit and implicit time gap features, extracted from the context of queries and from time series analysis using Google Trends, respectively. In addition, textual features such as word-based probability distributions, temporal trigger words, and query length were fed to classifiers with probability estimation. Finally, we selected the three best classifiers to produce the submitted runs; our best result is 0.8135 on the AvgCosin measure and 0.1710 on the AvgAbsLoss measure. After submitting the formal runs, we carried out further research and obtained better results: 0.8886 on AvgCosin and 0.1286 on AvgAbsLoss.
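AvgCosin and AvgAbsLoss compare a predicted class distribution with the gold distribution for each query and average over queries. The sketch below is a hedged reading of the two measures: cosine similarity is standard, but the per-query absolute loss is assumed here to be the mean absolute difference over the four classes, which may differ from the official Temporalia definition.

```python
import math


def cosine(p, q):
    """Cosine similarity between two class-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def abs_loss(p, q):
    """Assumed definition: mean absolute difference over the classes."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def evaluate(predicted, gold):
    """Average both measures over the queries in `gold` (query_id -> vector)."""
    n = len(gold)
    avg_cosine = sum(cosine(predicted[q], gold[q]) for q in gold) / n
    avg_abs_loss = sum(abs_loss(predicted[q], gold[q]) for q in gold) / n
    return avg_cosine, avg_abs_loss


# Class order assumed: Past, Recency, Future, Atemporal.
gold = {"q1": [1.0, 0.0, 0.0, 0.0], "q2": [0.0, 0.5, 0.5, 0.0]}
pred = {"q1": [0.5, 0.5, 0.0, 0.0], "q2": [0.0, 0.5, 0.5, 0.0]}
print(evaluate(pred, gold))
```

Higher AvgCosin and lower AvgAbsLoss are better, which matches the direction of improvement reported above.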
TUTA1 at the NTCIR-12 Temporalia Task
Ning Liu, Mengjia He, Chao Li, Xin Kang and Fuji Ren
[Pdf] [Table of Content]

Our group submitted runs for the Chinese Temporal Intent Disambiguation (TID) subtask of NTCIR-12. We use word2vec to map the query string to a feature vector, and use the cosine function to measure the similarity between the query string and the training corpus SougouCA. Our results show that the approach is efficient for this task.
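The abstract does not say how a multi-word query is turned into a single vector. One common choice, assumed here, is to average the word2vec vectors of the query's words and compare the result with a reference vector by cosine similarity. The toy 2-D vectors and English tokens below are hypothetical stand-ins for real word2vec output and segmented Chinese words.

```python
import math


def average_vector(tokens, embeddings):
    """Mean of the embeddings of known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 2-D vectors standing in for word2vec output (hypothetical values).
embeddings = {"tomorrow": [1.0, 0.0], "weather": [0.0, 1.0], "forecast": [0.6, 0.8]}
query = average_vector(["tomorrow", "weather"], embeddings)
reference = average_vector(["forecast"], embeddings)
print(cosine(query, reference))
```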
NTCIR-12: Temporal Intent Disambiguation Subtask: Naive Bayesian Classifier to Predict Temporal Classes
Gandhimathi Moharasan and Ho Tu Bao
[Pdf] [Table of Content]

The Holab team from the Japan Advanced Institute of Science and Technology (JAIST) participated in the NTCIR-12 Temporal Intent Disambiguation (TID) subtask. The objective of this task, which extends the NTCIR-11 Temporal Intent Query Classification (TIQC) subtask, is to predict the temporal classes of a query. We used the well-known Naive Bayes classifier to accomplish this objective. In the TID subtask, we first generated features at different levels from the given query, and then used the classifier to calculate the distribution over the temporal classes. In this report, we discuss the various features used to estimate the probability distribution of the four temporal intent classes (Atemporal, Past, Recent, or Future) in the temporal intent disambiguation subtask. We also discuss the experimental results and a comparative analysis with the systems submitted by other participants.
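A Naive Bayes classifier naturally yields a probability distribution over the four temporal classes. The sketch below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing over bag-of-words query features; the tiny training set is hypothetical, and the team's actual multi-level features are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict_proba(self, tokens):
        """Return a probability distribution over the classes seen in training."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # unseen words are ignored in this sketch
                num = self.word_counts[c][t] + self.alpha
                den = self.total_words[c] + self.alpha * v
                score += math.log(num / den)
            log_scores[c] = score
        # Normalize in log space for numerical stability.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}


train_docs = [["history", "of", "rome"], ["weather", "tomorrow"],
              ["latest", "news"], ["how", "to", "cook"]]
train_labels = ["Past", "Future", "Recent", "Atemporal"]
nb = NaiveBayes().fit(train_docs, train_labels)
print(nb.predict_proba(["weather", "tomorrow"]))
```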
Using Time-Series for Temporal Intent Disambiguation in NTCIR-12 Temporalia
Dan Li, Xiaoxia Liu, Yunxia Zhang, Degen Huang and Jingxiang Cao
[Pdf] [Table of Content]

Our group DUT-NLP-EN participated in the TID subtask of NTCIR-12 Temporalia and submitted three runs. In all three runs, the temporal intent probability distribution over four categories for the 300 test queries is predicted through logistic regression. In RUN1, four groups of features are used: trigger words, word POS, explicit time gap, and temporal probability of words. Implicit time gap is added as a rule-based time gap in RUN2 and as time-series statistics in RUN3. RUN2 performs slightly better than the other two runs, with an AvgCosin of 0.732 and an AvgAbsLoss of 0.210.
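An explicit time gap feature can be computed by comparing years mentioned in the query with the date the query was issued. The sketch below assumes the issue year is known per query and maps the first gap to a coarse category; the year pattern, the bucket names, and the issue year 2013 in the tests are illustrative assumptions, not the team's exact rules.

```python
import re

# Four-digit years from 1500 to 2099; a deliberately narrow pattern for this sketch.
YEAR_PATTERN = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def explicit_time_gaps(query, issue_year):
    """Signed gaps (in years) between years mentioned in the query and the issue year."""
    return [int(year) - issue_year for year in YEAR_PATTERN.findall(query)]


def gap_bucket(query, issue_year):
    """Coarse categorical feature derived from the first explicit gap."""
    gaps = explicit_time_gaps(query, issue_year)
    if not gaps:
        return "none"
    gap = gaps[0]
    if gap < 0:
        return "past"
    if gap == 0:
        return "current"
    return "future"


print(explicit_time_gaps("olympics 2020 host city", 2013))  # [7]
print(gap_bucket("olympics 2020 host city", 2013))          # future
```

Such a categorical feature would then be one-hot encoded alongside the other feature groups before logistic regression.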
IRISM @ NTCIR-12 Temporalia Task: Experiments with MaxEnt, Naive Bayes and Decision Tree Classifiers
Jitendra Kumar, Sudha Shanker Prasad and Sukomal Pal
[Pdf] [Table of Content]

This paper describes our participation in Temporal Intent Disambiguation (TID), a subtask of the pilot task of the NTCIR-12 Temporal Information Access (Temporalia-2) task. We treated the task as a slight variation of a supervised machine learning classification problem. Our strategy involves building models with different standard probabilistic and entropy-based classifiers from MALLET, a natural language processing toolkit. We focus on feature engineering to predict the probability distribution of the given temporal classes for search queries. We submitted three runs based on MaxEnt, Naive Bayes, and C4.5 Decision Tree classifiers. Of these, the Decision Tree-based run performed best, while the other two were average.
GIR at the NTCIR-12 Temporalia Task
Long Chen, Haitao Yu, Fajie Yuan and Joemon M Jose
[Pdf] [Table of Content]

The GIR team participated in the NTCIR-12 Temporal Information Access (Temporalia) task. This report describes our approach to the Temporal Intent Disambiguation (TID) problem and discusses the official results. We explore the rich temporal information in labeled and unlabeled search queries. A semi-supervised linear classifier is then built to predict the temporal classes of each search query.
KDETM at NTCIR-12 Temporalia Task: Combining a Rule-based Classifier with Weakly Supervised Learning for Temporal Intent Disambiguation
Abu Nowshed Chy, Md Zia Ullah, Md Shajalal and Masaki Aono
[Pdf] [Table of Content]

The Web is gigantic and constantly updated. Every day, many users turn to websites for their information needs. As search queries are dynamic in nature, recent research shows that considering the temporal aspects underlying a query can improve retrieval performance significantly. In this paper, we present our approach to the Temporal Intent Disambiguation (TID) subtask of the Temporalia track at NTCIR-12. Given a query, the task is to estimate the distribution of four temporal intent classes (Past, Recency, Future, and Atemporal) based on its content. In our approach, we combine a rule-based classifier with a weakly supervised classifier. We define a set of rules for the rule-based classifier based on temporal distance, temporal references, and POS-tag detection, whereas a small set of queries with known temporal polarity is used to train the weakly supervised classifier. To train the weakly supervised classifier, we use bag-of-words features with TF-IDF scores as feature weights. Experimental results show that our system achieves competitive performance among the Temporalia participants.
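Bag-of-words features weighted by TF-IDF can be computed directly from the tokenized training queries. The sketch below uses the plain `tf * log(N / df)` variant with length-normalized term frequency; the exact TF-IDF variant used by the team is not stated, so this choice is an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized query, using tf * log(N / df)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per query
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * idf[term] for term, count in tf.items()})
    return vectors


queries = [["next", "year", "election"], ["last", "year", "election"], ["cake", "recipe"]]
for vec in tfidf_vectors(queries):
    print(vec)
```

Terms shared by many queries (here "year" and "election") receive lower weights than distinctive ones such as "next" or "last", which are the words that carry temporal polarity.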
WHUIR at the NTCIR-12 Temporal Intent Disambiguation Task
Sisi Gui and Wei Lu
[Pdf] [Table of Content]

WHUIR participated in the Temporal Intent Disambiguation (TID) subtask of the Temporalia track at NTCIR-12. This paper describes our work on this subtask. Given a query, the task is to assign probability values to four temporal classes, i.e., Past, Recency, Future, and Atemporal. Our overall strategy has been to rely on established off-the-shelf components (e.g., standard classifiers from LIBSVM and natural language processing methods from Stanford CoreNLP) and to focus on feature discovery. We considered nineteen features in total, all extracted from the query itself. We trained SVR on all the features with different parameter sets and chose the best three sets on the dry run data for the formal run. Results are presented and discussed in this paper.
KYOTO at the NTCIR-12 Temporalia Task: Machine Learning Approach for Temporal Intent Disambiguation Subtask
Tomohiro Sakaguchi and Sadao Kurohashi
[Pdf] [Table of Content]

This paper describes the Kyoto system for the Temporal Intent Disambiguation (TID) subtask in the NTCIR-12 Temporal Information Access (Temporalia-2) challenge. The task is to estimate the distribution of temporal intents (Past, Recency, Future, Atemporal) of a given query. We took a supervised machine learning approach using bag-of-words, POS, and word vector features. We also incorporated knowledge about temporal and holiday expressions. Our system achieved competitive performance.
HITSZ-ICRC at NTCIR-12 Temporal Information Access Task
Yongshuai Hou, Cong Tan, Xiaolong Wang, Jun Xu and Qingcai Chen
[Pdf] [Table of Content]

This paper presents the methods the HITSZ-ICRC group used for the Temporalia-2 task at NTCIR-12, including the Temporal Intent Disambiguation (TID) subtask and the Temporally Diversified Retrieval (TDR) subtask. In the TID subtask, we merged the results of a rule-based method and a method based on word temporal intent class vectors to estimate the temporal intent class distribution for English and Chinese queries. The rule-based method was improved from the method we used in Temporalia-1. The word-vector-based method estimated the temporal intent class distribution by normalizing the sum of the temporal intent class vectors of all words in the query. In the TDR subtask, for temporal information retrieval, we used the TIR system from our Temporalia-1 participation to get a ranked document list for each temporal subtopic. For temporally diversified ranking, we used all documents in the result lists of the four temporal subtopics as the candidate document set for a query topic, and ranked each candidate document based on its relevance score for each subtopic, the temporal intent classes of the temporal expressions in the document, and the temporal information of previously ranked documents for the topic. We applied our TDR methods only to English topics.
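The word-vector-based TID method is described concretely enough to sketch: sum each query word's temporal intent class vector and normalize the sum into a distribution. How the word vectors are built and how the two methods are merged is not specified, so the word vectors below are hypothetical counts, the uniform fallback is an assumption, and `merge` is a hypothetical weighted average.

```python
CLASSES = ("Past", "Recency", "Future", "Atemporal")


def query_distribution(tokens, word_class_vectors):
    """Normalize the sum of per-word class vectors into a distribution."""
    total = [0.0] * len(CLASSES)
    for token in tokens:
        vec = word_class_vectors.get(token)
        if vec is None:
            continue  # words without a class vector are skipped
        for i, value in enumerate(vec):
            total[i] += value
    s = sum(total)
    if s == 0:
        # Assumed fallback: uniform distribution when no word is covered.
        return [1.0 / len(CLASSES)] * len(CLASSES)
    return [value / s for value in total]


def merge(rule_based, vector_based, weight=0.5):
    """Hypothetical merge: weighted average of two distributions."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(rule_based, vector_based)]


# Hypothetical counts of how often each word occurs in queries of each class.
word_class_vectors = {"history": [3, 0, 0, 1], "future": [0, 0, 4, 0]}
print(query_distribution(["history", "of", "future"], word_class_vectors))
# [0.375, 0.0, 0.5, 0.125]
```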

Return to Top

[MathIR]
NTCIR-12 MathIR Task Overview
Richard Zanibbi, Akiko Aizawa, Michael Kohlhase, Iadh Ounis, Goran Topić and Kenny Davila
[Pdf] [Table of Content]

This paper presents an overview of the NTCIR-12 MathIR Task, dedicated to information access for mathematical content. The MathIR task makes use of two corpora. The first corpus contains technical articles from the arXiv, while the second corpus contains English Wikipedia articles. For each corpus, there were two subtasks: one for queries containing keywords and formulae (arXiv-main and Wiki-main), and the other a query-by-expression task for isolated formula queries (arXiv-simto and Wiki-formula). In this overview paper, we summarize the task design, corpora, submitted runs, and approaches used by participating groups.
Exploring the One-brain Barrier: A Manual Contribution to the NTCIR-12 MathIR Task
Moritz Schubotz, Norman Meuschke, Marcus Leich and Bela Gipp
[Pdf] [Table of Content]

This paper compares the search capabilities of a single human brain supported by the text search built into Wikipedia with state-of-the-art math search systems. To achieve this, we compare results of manual Wikipedia searches with the aggregated and assessed results of all systems participating in the NTCIR-12 MathIR Wikipedia Task. For 26 of the 30 topics, the average relevance score of our manually retrieved results exceeded the average relevance score of other participants by more than one standard deviation. However, math search engines at large achieved better recall and retrieved highly relevant results that our "single-brain system" missed for 12 topics. ..
The Math Retrieval System of ICST for NTCIR-12 MathIR Task
Liangcai Gao, Ke Yuan, Yuehan Wang, Zhuoren Jiang and Zhi Tang
[Pdf] [Table of Content]

This paper summarizes the experience of the ICST team in the NTCIR-12 MathIR main tasks (arXiv and Wikipedia main tasks). Our approach is based on keywords, structure, and the importance of formulae in a document. A novel hybrid indexing and matching model is proposed to support exact and fuzzy matching. In this hybrid model, both the keyword and structure information of formulae are taken into consideration. In addition, the concept of formula importance within a document is introduced into the model. To make the ranking results more reasonable, our system (WikiMir) applies a learning-to-rank algorithm (RankBoost) to rank the retrieved formulae, and then re-ranks the top-k formulae by regular expression matching against the query formula. The experimental results show that our system's method is effective on all metrics and promising for practical applications.
MCAT Math Retrieval System for NTCIR-12 MathIR Task
Giovanni Yoko Kristianto, Goran Topić and Akiko Aizawa
[Pdf] [Table of Content]

This paper describes the participation of our MCAT search system in the NTCIR-12 MathIR Task. We introduce three granularity levels of textual information, a new approach for generating dependency graphs of math expressions, score normalization, cold-start weights, and unification. We find that these modules, except the cold-start weights, have a strongly positive impact on the search performance of our system. The use of the dependency graph significantly improves the precision of our system, with relative improvements of up to 24.52% and 104.20% in the Main and Simto subtasks of the arXiv task, respectively. In addition, unification delivers precision improvements of up to 2.90% and 57.14% in the Main and Simto subtasks, respectively. Overall, our best submission achieves a P@5 of 0.5448 in the Main subtask and 0.5500 in the Simto subtask. In the Wikipedia task, our system also performs well in the MathWikiFormula subtask. In the MathWiki subtask, however, our system finishes second because of a problem with handling queries phrased as questions that contain many stop words.
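The abstract lists score normalization as one module but does not define it. A common reason for normalizing is that textual scores and formula-match scores live on different scales before they are combined. The sketch below assumes min-max normalization followed by a weighted sum; both the scheme and the `text_weight` parameter are hypothetical, not MCAT's documented method.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; constant scores all map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def fuse(text_scores, math_scores, text_weight=0.5):
    """Weighted sum of normalized scores; a document missing from one list gets 0 there."""
    t = min_max_normalize(text_scores)
    m = min_max_normalize(math_scores)
    return {doc: text_weight * t.get(doc, 0.0) + (1.0 - text_weight) * m.get(doc, 0.0)
            for doc in set(t) | set(m)}


# Text scores (e.g. BM25-like) and formula scores live on different scales.
text = {"a": 12.0, "b": 4.0, "c": 8.0}
formula = {"a": 0.2, "b": 0.9}
print(min_max_normalize(text))  # {'a': 1.0, 'b': 0.0, 'c': 0.5}
print(fuse(text, formula, text_weight=0.6))
```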
Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies
Michal Růžička, Petr Sojka and Martin Líška
[Pdf] [Table of Content]

This paper summarizes the experience of the Math Information Retrieval team of Masaryk University (MIRMU) with the NTCIR-12 MathIR arXiv Main Task and its subtasks. We based our approach on the MIaS system. Using the NTCIR-11 Math-2 Task relevance judgements, we developed an evaluation platform, with which we rigorously evaluated combinations of new features and picked the most promising ones for the NTCIR-12 evaluation. The new features tested are mostly aimed at further canonicalizing MathML input, structurally unifying formulae for syntax-based similarity search, and query expansion when combining text and math query terms.
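Structural unification lets formulae that differ only in variable names match during syntax-based search. As a hedged sketch (not MIaS's actual implementation), the code below renames identifiers in order of first appearance, so x^2 + y^2 and a^2 + b^2 become identical while x^2 + x^2 stays distinct. Expressions are hypothetical nested tuples loosely modeled on Content MathML `ci` (identifier) and `cn` (number) leaves.

```python
def unify(expr, mapping=None):
    """Rename identifiers in order of first appearance (id1, id2, ...).

    expr is a nested tuple (tag, *children); leaves are ("ci", name) for
    identifiers and ("cn", value) for numbers, loosely following Content MathML.
    """
    if mapping is None:
        mapping = {}
    tag = expr[0]
    if tag == "ci":
        name = expr[1]
        if name not in mapping:
            mapping[name] = f"id{len(mapping) + 1}"
        return ("ci", mapping[name])
    if tag == "cn":
        return expr  # numbers are kept as-is in this sketch
    return (tag,) + tuple(unify(child, mapping) for child in expr[1:])


def square(var):
    return ("power", ("ci", var), ("cn", "2"))


x2_plus_y2 = ("plus", square("x"), square("y"))
a2_plus_b2 = ("plus", square("a"), square("b"))
x2_plus_x2 = ("plus", square("x"), square("x"))
print(unify(x2_plus_y2) == unify(a2_plus_b2))  # True
print(unify(x2_plus_y2) == unify(x2_plus_x2))  # False
```

Keeping the renaming consistent within one formula preserves the equality pattern of variables, which is what distinguishes x^2 + y^2 from x^2 + x^2.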
Tangent-3 at the NTCIR-12 MathIR Task
Kenny Davila, Richard Zanibbi, Andrew Kane and Frank Wm. Tompa
[Pdf] [Table of Content]

We present the math-aware search engine Tangent-3 and report its results for the NTCIR-12 MathIR task. Tangent uses a federated search over two indices: 1) a TF-IDF textual search engine (Solr), and 2) a query-by-expression engine. We use an inverted index to store math expressions using pairs of symbols extracted from a Symbol Layout Tree representation built from Presentation MathML. We use a two-stage cascade model for retrieval. In the first stage, relevant expressions are retrieved quickly using iterator trees over posting lists to find matches, and expressions are ranked using the Dice coefficient of matched symbol pairs. In the second stage, the top-k candidates are reranked with a stricter similarity metric supporting unification and wildcard matching. Our system produces relevant (and partially relevant) Precision@5 values of 21% (50%) for the main arXiv task, 25% (49%) for the Main Wikipedia subtask, and 45% (84%) for the Wikipedia Formula Browsing subtask.
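The first-stage ranking uses the Dice coefficient of matched symbol pairs: twice the number of shared pairs divided by the total number of pairs in the query and the candidate. The sketch below computes it over multisets of pairs. The `(parent, child, relation)` tuples and the relation labels "a" (superscript) and "n" (next on baseline) are simplified assumptions, not Tangent-3's exact pair encoding.

```python
from collections import Counter


def dice(query_pairs, candidate_pairs):
    """Dice coefficient over multisets of symbol pairs: 2 * |Q & C| / (|Q| + |C|)."""
    q, c = Counter(query_pairs), Counter(candidate_pairs)
    matched = sum((q & c).values())  # multiset intersection keeps the minimum counts
    total = sum(q.values()) + sum(c.values())
    return 2 * matched / total if total else 0.0


# Pairs as (parent symbol, child symbol, spatial relation); "a" = above/superscript,
# "n" = next on the baseline. This labeling is an assumption for illustration.
query = [("x", "2", "a"), ("x", "+", "n"), ("+", "y", "n")]       # x^2 + y
candidate = [("x", "2", "a"), ("x", "+", "n"), ("+", "z", "n")]   # x^2 + z
print(dice(query, candidate))
```

Here two of three pairs match, giving 2 * 2 / 6, roughly 0.667; the second-stage unification would then let `y` and `z` match as well.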
A Document Retrieval System for Math Queries
Abhinav Thanda, Ankit Agarwal, Kushal Singla, Aditya Prakash and Abhishek Gupta
[Pdf] [Table of Content]

We present and analyze the results of our math search system in the MathIR tasks of the NTCIR-12 information retrieval challenge. The math search engine described in this paper uses the co-occurrence modeling of LDA and doc2vec to enable more contextual search. Additionally, it uses common patterns to improve the search output. To combine various scoring algorithms, it uses a hybrid ranking mechanism to re-rank the documents retrieved from Elasticsearch. In this paper, we evaluate the results of these algorithms and present possible future work for further improvement.

Return to Top

Pilot Tasks

[Lifelog]
Overview of NTCIR-12 Lifelog Task
Cathal Gurrin, Hideo Joho, Frank Hopfgartner, Liting Zhou and Rami Albatal
[Pdf] [Table of Content]

In this paper we review the NTCIR12-Lifelog pilot task, which ran at NTCIR-12. We outline the test collection employed, along with the tasks, the eight submissions and the findings from this pilot task. We finish by suggesting future plans for the task.
LIG-MRIM at NTCIR-12 Lifelog Semantic Access Task
Bahjat Safadi, Philippe Mulhem, Georges Quénot and Jean-Pierre Chevallet
[Pdf] [Table of Content]

This paper describes the participation of the MRIM research team in the Lifelog Semantic Access subtask of NTCIR-12. Our approach mainly relies on mapping the query terms to visual concepts computed on the lifelog images using two separate learning schemes. Post-processing is then applied if the topic relates to the time, location, or activity associated with the images. The results obtained are promising for a first participation in such a task, with an event-based MAP above 29% and an event-based nDCG value close to 39%.
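The results above are reported with nDCG. The sketch below shows the standard nDCG@k computation with a log2(rank + 1) discount over graded relevance values; the event-based variant used in the Lifelog task, which groups images into events before scoring, is not reproduced here.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (ranks start at 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains, all_gains, k=None):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_gains: graded relevance of the retrieved items, in ranked order.
    all_gains: graded relevance of every judged item (used to build the ideal ranking).
    """
    if k is None:
        k = len(ranked_gains)
    ideal = sorted(all_gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


print(ndcg([3, 2, 0], [3, 2, 0]))  # 1.0
print(ndcg([0, 2, 3], [3, 2, 0]))
```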
LEMoRe: A Lifelog Engine for Moments Retrieval at the NTCIR-Lifelog LSAT Task
Gabriel de Oliveira Barra, Alejandro Cartas Ayala, Marc Bolaños, Mariella Dimiccoli, Xavier Giro-i-Nieto and Petia Radeva
[Pdf] [Table of Content]

Semantic image retrieval from large amounts of egocentric visual data requires powerful techniques for bridging the semantic gap. This paper introduces LEMoRe, a Lifelog Engine for Moments Retrieval, developed in the context of the Lifelog Semantic Access Task (LSAT) of the NTCIR-12 challenge, and discusses its performance variation across different trials. LEMoRe integrates classical image descriptors with high-level semantic concepts extracted by Convolutional Neural Networks (CNN), powered by a graphical user interface that uses natural language processing. Although this is just a first attempt towards interactive image retrieval from large egocentric datasets, and there is much room for improvement in the system components and the user interface, the structure of the system itself and the way the individual components cooperate are very promising.
Image Searching by Events with Deep Learning for NTCIR-12 Lifelog
Hsiang-Lun Lin, Tzu-Chieh Chiang, Liang-Pu Chen and Ping-Che Yang
[Pdf] [Table of Content]

We built an automatic system for the NTCIR-12 Lifelog task that retrieves the correct images for given events. In our system, we employed deep learning methods to approach the targets. In our processing, we use the Stanford Parser and a named-entity recognition method to process the text of events. We also employ the word2vec toolkit to convert words to vectors. Moreover, we construct a model, trained with word2vec, to calculate the correlation between each search task and each image. Using this model, we find relevant images for every task in this topic.
QUT at the NTCIR Lifelog Semantic Access Task
Harrisen Scells, Guido Zuccon and Kirsty Kitto
[Pdf] [Table of Content]

This notebook paper describes the submissions to the 2016 NTCIR Lifelog Semantic Access Task made by the Queensland University of Technology (QUT).
VTIR at the NTCIR-12 2016 Lifelog Semantic Access Task
Long Xia, Yufeng Ma and Weiguo Fan
[Pdf] [Table of Content]

The VTIR team participated in the Lifelog Semantic Access Task of NTCIR-12 (2016). We proposed an approach that pre-processes the visual concepts of the photos and reconstructs queries from two aspects. Due to limited time, our efforts were constrained. Our main contribution was to learn and incorporate the location of each photo to improve search accuracy. The evaluation demonstrated that our approach achieved fairly good results for simple query scenarios, while a more sophisticated model is needed for complicated scenarios, including range search. This report describes our approach to improving retrieval accuracy, discusses the results obtained, and proposes possible future improvements.
Repeated Event Discovery from Image Sequences by Using Segmental Dynamic Time Warping: Experiment at the NTCIR-12 Lifelog Task
Kosuke Yamauchi and Tomoyosi Akiba
[Pdf] [Table of Content]

In this paper, we report on applying Spoken Term Discovery (STD) to lifelog images. STD is an approach to discovering words that are repeated across multiple speech recordings. We used an approach based on Dynamic Time Warping to discover patterns that are repeated in lifelog images. If this approach yields meaningful patterns, the results will be helpful information for lifelog research (e.g., segmentation, clustering). We evaluated whether this approach extracts meaningful patterns and found that it has potential for lifelog research.
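Segmental DTW builds on the basic dynamic time warping recurrence, where each cell adds the local frame distance to the cheapest of the three predecessor cells. The sketch below shows only that core, with an optional Sakoe-Chiba band; segmental DTW additionally runs such band-constrained alignments from several start offsets to find locally matching fragments, which is not shown. The 1-D image feature sequences are hypothetical.

```python
import math


def dtw_distance(a, b, band=None):
    """DTW alignment cost between two sequences of feature vectors.

    band: optional Sakoe-Chiba width limiting |i - j|; segmental DTW runs such
    band-constrained alignments from several start offsets (not shown here).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        j_lo, j_hi = 1, m
        if band is not None:
            j_lo, j_hi = max(1, i - band), min(m, i + band)
        for j in range(j_lo, j_hi + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical 1-D image feature sequences; the second repeats one frame.
seq1 = [(0.0,), (1.0,), (2.0,)]
seq2 = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(seq1, seq2))                          # 0.0
print(dtw_distance([(0.0,), (0.0,)], [(1.0,), (1.0,)]))  # 2.0
```

Because DTW absorbs the repeated frame at zero cost, image sequences recorded at different paces can still be recognized as the same repeated event.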
Visual Insights from Personal Lifelogs: Insight at the NTCIR-12 Lifelog LIT Task
Aaron Duane, Rashmi Gupta, Liting Zhou and Cathal Gurrin
[Pdf] [Table of Content]

In this paper we describe the Insight Centre for Data Analytics' participation in the LIT sub-task of the Lifelog task at NTCIR-12. We present the interactive lifelog visualisation tool that we developed specifically for this task, which allowed us to interrogate the dataset to create insights for six LIT topics.
SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask
Satomi Iijima and Tetsuya Sakai
[Pdf] [Table of Content]

SLLL (Waseda University Sakai Laboratory LifeLog team) is working on a prototype smartphone application called Sleepflower, which is designed to improve the sleep cycles of a group of users through a collaborative effort. A flower metaphor is displayed on the smartphone screen to represent the current sleepiness of a particular user, along with similar metaphors for the other group members, in the hope of improving the lifestyles of the group as a whole. One significant limitation of the current prototype is that sleep hours and sleepiness grades need to be entered manually; we hope to build a new prototype that semi-automatically collects lifelog data such as those provided by the NTCIR Lifelog task. As an initial step towards this goal, we manually analyse the NTCIR Lifelog image data from the viewpoint of individual sleeping habits and discuss possible approaches to leveraging such data for the next version of Sleepflower.

Return to Top

[QALab-2]
Overview of the NTCIR-12 QA Lab-2 Task
Hideyuki Shibuki, Kotaro Samamoto, Madoka Ishioroshi, Akira Fujita, Yoshinobu Kano, Teruko Mitamura, Tatsunori Mori and Noriko Kando
[Pdf] [Table of Content]

The NTCIR-12 QA Lab-2 task targets real-world complex question answering (QA) technologies using Japanese university entrance exams and their English translations on the subject of "World history". The exam questions are roughly divided into multiple-choice and free-description styles, and come in various question formats, such as essay, factoid, slot-filling, and true-or-false. We conducted three phases of formal runs, and collaborated with the Todai Robot Project on the Phase-2 Japanese subtask. Twelve teams submitted 148 runs in total. We describe the data used, the hierarchy of question formats, the formal run results, and a comparison between human marks and automatic evaluation measures for essay questions.
ISOFT-Team at NTCIR-12 QALab-2: Using Choice Verification
Soonchoul Kwon, Seonyeong Park, Daehwan Nam, Kyusong Lee, Hwanjo Yu and Gary Geunbae Lee
[Pdf] [Table of Content]

NTCIR QA-Lab is a task that handles real-world complex questions. We, the ISOFT team, participated in the task with a choice verification method, which evaluates the truthfulness of each choice by calculating three evidence scores using a knowledge base, information retrieval, and restrictions. We use fundamental NLP methods without semantic analysis and minimize the need for manual tagging. We ranked 1st in Phase-1 (71/100) and 6th in Phase-3 (38/100) in QA-Lab 2. The errors stem from the lack of named entity and semantic analysis.
NUL System at QA Lab-2 Task
Mio Kobayashi, Hiroshi Miyashita, Ai Ishii and Chikara Hoshino
[Pdf] [Table of Content]

This paper describes the strategies and methods of the NUL team for the NTCIR-12 QA Lab-2 Japanese National Center Test tasks. We mainly use three strategies with four solvers. First, we use Pointwise Mutual Information (PMI) and search result rankings to calculate the scores of choices. Second, we convert True-or-False questions into virtual factoid questions by removing named entities. Third, we convert textbooks and questions into syntax trees and match them. We choose the final answer by aggregating the ranks from each solver. Our system achieved 76 points on the Benesse mock exam of June 2015 (Pattern 1) in Phase 2.
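Two steps named above can be sketched directly: PMI from co-occurrence counts, and choosing a final answer by aggregating per-solver ranks. Which corpus the counts come from and how ranks are aggregated are not stated, so the sketch assumes raw counts and a simple rank sum (smallest sum wins, ties broken by first occurrence). The function names and solver scores are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI = log( P(x, y) / (P(x) P(y)) ) estimated from co-occurrence counts."""
    if count_xy == 0:
        return float("-inf")  # the two terms never co-occur
    return math.log((count_xy * total) / (count_x * count_y))


def aggregate_ranks(solver_scores):
    """Pick the choice with the smallest rank sum across solvers (rank 1 = best).

    solver_scores: list of {choice: score} dicts, higher score = better; every
    solver is assumed to score every choice.
    """
    rank_sums = {}
    for scores in solver_scores:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, choice in enumerate(ordered, start=1):
            rank_sums[choice] = rank_sums.get(choice, 0) + rank
    return min(rank_sums, key=rank_sums.get)


print(pmi(10, 100, 50, 10_000))  # about 2.996, i.e. log(20)
solvers = [
    {"1": 0.9, "2": 0.5, "3": 0.1, "4": 0.3},
    {"1": 0.2, "2": 0.8, "3": 0.1, "4": 0.4},
    {"1": 0.7, "2": 0.2, "3": 0.0, "4": 0.5},
]
print(aggregate_ranks(solvers))  # prints 1
```

Aggregating ranks rather than raw scores sidesteps the fact that different solvers produce scores on incomparable scales.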
SML Question-Answering System for World History Essay and Multiple-choice Exams at NTCIR-12 QA@Lab-2
Takuma Takada, Takuya Imagawa, Takuya Matsuzaki and Satoshi Sato
[Pdf] [Table of Content]

This paper describes the SML team's approach to automatically answering world history exam questions at NTCIR-12 QALab. We attempted to answer both the multiple-choice questions in the National Center Tests and the essay-type questions in the secondary exams. Our system answers the questions by searching based on surface similarity. We explored several methods to enhance the system with domain-specific knowledge, such as dictionaries of synonyms and temporal information.
IMTKU Question Answering System for World History Exams at NTCIR-12 QA Lab2
Min-Yuh Day, Cheng-Chia Tsai, Wei-Chun Chuang, Jin-Kun Lin, Hsiu-Yuan Chang, Tzu-Jui Sun, Yuan-Jie Tsai, Yi-Heng Chiang, Cheng-Zhi Han, Wei-Ming Chen, Yun-Da Tsai, Yi-Jing Lin, Yue-Da Lin, Yu-Ming Guo, Ching-Yuan Chien and Cheng-Hung Lee
[Pdf] [Table of Content]

In this paper, we describe the IMTKU (Information Management at Tamkang University) question answering system for Japanese university entrance exams at NTCIR-12 QA-Lab2. We proposed a question answering system using a hybrid approach that integrates natural language processing and machine learning techniques. In Phase-1, we submitted 6 end-to-end QA runs, only for the English subtasks of the National Center Test for University Admissions and the Secondary exams. In Phase-3, we submitted 7 end-to-end QA runs for the English and Japanese subtasks of the National Center exams and the Secondary exams. In Phase-1, the IMTKU team achieved total scores of 31, 27, and 0 in the English subtask. In Phase-3, the IMTKU team achieved total scores of 20, 20, and 14 in the English subtask, 24, 12, and 8 in the Japanese subtask, and 31 in a combination run with KitAi.
KitAi-QA: A Question Answering System for NTCIR-12 QALab-2
Shuji Fukamachi and Kazutaka Shimada
[Pdf] [Table of Content]

This paper describes a question answering system for NTCIR-12 QALab-2. We participated in the Japanese task on the National Center Test and mock exams. Our method consists of two stages: a scoring method and answer selection methods for four question types. The scoring stage detects evidence in textbooks for the next stage, answer selection. For True-or-False questions, we also focus on conflict detection and event detection during answer selection. For the other question types (Factoid, Slot-filling, and Unique Time), our method judges or extracts the answer from the passage retrieved by the scoring method. The accuracy of our method in the formal run was moderate. However, our method sometimes boosted the results of other systems in the combination run, which shows its effectiveness.
KSU Team's Multiple Choice QA System at the NTCIR-12 QA Lab-2 Task
Tasuku Kimura, Ryosuke Nakata and Hisashi Miyamori
[Pdf] [Table of Content]

This paper describes the systems and results of team KSU for the QA Lab-2 task at NTCIR-12. In each phase of the task, we developed three automatic answering systems that solve world history questions in the National Center Test. For a QA system based on document retrieval, it is important to estimate question types accurately and to use knowledge sources and query generation methods suited to each type in order to answer correctly. We therefore designed systems that focus on query generation using the underlined text and on knowledge sources. In the formal runs, the priority-02 system obtained 20 correct answers (accuracy 0.49) and 68 points in Phase 1, the priority-01 system obtained 26 correct answers (accuracy 0.41) and 70 points in Phase 2, and the priority-01 system obtained 14 correct answers (accuracy 0.39) and 38 points in Phase 3.
ASEE: An Automated Question Answering System for World History Exams
Tao-Hsing Chang and Yu-Sheng Tsai
[Pdf] [Table of Content]

This study designed a system called ASEE that answers the multiple-choice items provided by the QALab-2 task at the NTCIR-12 conference. The system uses Wikipedia as its knowledge source and the Stanford Parser to analyze the linguistic features of each item and extract keywords; it then uses an algorithm to estimate the probability that each option is the correct answer and selects the best one. Experimental results show that the system correctly answers 21 of 36 questions taken from World History B of the 2011 National Center Test for University Admissions in Japan.
CMUQA: Multiple-Choice Question Answering at NTCIR-12 QA Lab-2 Task
Dheeru Dua, Bhawna Juneja, Sanchit Agarwal, Kotaro Sakamoto, Di Wang and Teruko Mitamura
[Pdf] [Table of Content]

The first version of the UIMA-based modular automatic question answering (QA) system was developed for the NTCIR-11 QA Lab task. The system answers multiple-choice English questions from the Japanese university entrance examinations on the subject of world history. We improved the system by adding components focused on Source Expansion and on better semantic understanding of the question in terms of events and their timelines.
SLQAL at the NTCIR-12 QALab-2 Task
Shin Higuchi and Tetsuya Sakai
[Pdf] [Table of Content]

SLQAL (Waseda University Sakai Laboratory QALab team) participated in Phase-1 and Phase-3 of the Japanese subtask of NTCIR-12 QALab-2. This paper briefly describes our approaches. Our runs scored 25 points in Phase 1 and 35 points in Phase 3. An initial failure analysis shows that our system performs particularly poorly on Type-T questions and on questions that require time-expression processing. This work was done as the bachelor's thesis of the first author of this paper.
WUST System at NTCIR-12 QALab-2 Task
Maofu Liu, Limin Wang, Xiaoyi Xiao, Lei Cai and Han Ren
[Pdf] [Table of Content]

This paper describes our question answering system for the NTCIR-12 QALab-2 task, which requires solving the history questions of Japanese university entrance exams and their English translations. English Wikipedia is the main external knowledge base for our system. We first retrieve articles and sentences related to the question from Wikipedia. Then, we build a classification model based on a support vector machine (SVM) to handle the multiple-choice questions in the National Center Test that ask for the correct or incorrect sentence; five features extracted from the question and its choices serve as the model's inputs. Finally, we choose the answer according to the score of each choice.
Multi-choice Question Answering System of WIP at the NTCIR-12 QA Lab-2
Bingfeng Luo, Yuxuan Lai, Lili Yao, Yansong Feng and Dongyan Zhao
[Pdf] [Table of Content]

This paper describes a multiple-choice question answering system we designed for the NTCIR-12 QALab. The system analyzes and answers multiple-choice world history questions from the Japanese National Center Test (in English). It starts from the preliminary results of an information retrieval baseline and improves on them by taking into account a structured knowledge base as well as additional time constraints. In the final evaluation, we achieved 34 points on the 2011 test dataset.
Forst: Question Answering System for Second-stage Examinations at NTCIR-12 QA Lab-2 Task
Kotaro Sakamoto, Madoka Ishioroshi, Hyogo Matsui, Takahisa Jin, Fuyuki Wada, Shu Nakayama, Hideyuki Shibuki, Tatsunori Mori and Noriko Kando
[Pdf] [Table of Content]

Question answering is widely regarded as an advancement in information retrieval. However, QA systems are not as popular as search engines in the real world. To apply QA systems to real-world problems, we tackle the QA Lab task, which deals with Japanese university entrance exams in world history. Japanese university entrance exams have two stages: the National Center Test (multiple-choice questions) and second-stage exams (complex questions including essays). Essay questions are of two types: complex essays (about 20 lines) and simple essays (about 2-3 lines). At the NTCIR-12 QA Lab-2 task, we focused on the second-stage exams, especially the complex essay questions.

[STC: Short Text Conversation]
Overview of the NTCIR-12 Short Text Conversation Task
Lifeng Shang, Tetsuya Sakai, Zhengdong Lu, Hang Li, Ryuichiro Higashinaka and Yusuke Miyao
[Pdf] [Table of Content]

This paper gives an overview of the NTCIR-12 Short Text Conversation (STC) task, a new pilot task at NTCIR-12. STC consists of two subtasks: a Chinese subtask using post-comment pairs crawled from Weibo, and a Japanese subtask providing the IDs of such pairs from Twitter. The main difference between the two subtasks therefore lies in the sources and languages of the test collections. For the Chinese subtask, there were 38 registrations in total, and 16 of the registered teams submitted 44 runs. For the Japanese subtask, there were 12 registrations in total, and 7 teams submitted 25 runs. In this paper, we review the task definition, evaluation measures, test collections, and the evaluation results of all teams.
UWNLP at the NTCIR-12 Short Text Conversation Task
Anqi Cui, Guangyu Feng, Borui Ye, Kun Xiong, Xing Yi Liu and Ming Li
[Pdf] [Table of Content]

In this paper, we describe our submission to the NTCIR-12 Short Text Conversation task. We treat short text conversation as a community question answering problem and solve it in three steps. First, we retrieve a set of candidate posts from a pre-built indexing service. Second, we rank these candidate posts by their similarity to the original input post. Finally, we rank the comments on the top-ranked posts and output them as answers. Two ranking models and three comment selection strategies were used to generate five runs. Our best approach achieves a mean nDCG@1 of 0.2767, a mean P+ of 0.4284, and a mean nERR@10 of 0.4095.
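The STC runs in this section are compared with mean nDCG@1, mean P+, and mean nERR@10, that is, per-post scores averaged over all test posts. The sketch below shows one common formulation of nDCG@k (log2 rank discount) and of nERR@k (ERR with stopping probability (2^g - 1) / 2^gmax, divided by the ERR of the ideal ranking) over graded relevance labels such as 0, 1, 2. It is an illustration under those assumptions; the gain mapping used by the official NTCIREVAL tool may differ, and P+ is not shown.

```python
import math

def ndcg_at_k(gains: list[float], ideal_gains: list[float], k: int) -> float:
    """nDCG@k with the common log2(r + 1) rank discount."""
    def dcg(gs: list[float]) -> float:
        return sum(g / math.log2(r + 1) for r, g in enumerate(gs[:k], start=1))
    ideal = dcg(sorted(ideal_gains, reverse=True))  # best possible ordering
    return dcg(gains) / ideal if ideal > 0 else 0.0

def nerr_at_k(grades: list[int], ideal_grades: list[int], k: int, max_grade: int) -> float:
    """nERR@k: ERR of the run divided by ERR of the ideal ranking."""
    def err(gs: list[int]) -> float:
        total, p_continue = 0.0, 1.0
        for r, g in enumerate(gs[:k], start=1):
            p_stop = (2 ** g - 1) / 2 ** max_grade  # chance the user is satisfied at rank r
            total += p_continue * p_stop / r
            p_continue *= 1 - p_stop
        return total
    ideal = err(sorted(ideal_grades, reverse=True))
    return err(grades) / ideal if ideal > 0 else 0.0
```

A "mean" score is then just the average of these per-post values over the test posts.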
USTC at NTCIR-12 STC Task
Junbei Zhang, Junfeng Hou, Shiliang Zhang and Lirong Dai
[Pdf] [Table of Content]

In this paper, we describe the system submitted by the USTC team for the Short Text Conversation (STC) task of NTCIR-12. We proposed transition-p2c, encoder-decoder-Reverse, and joint-Train models for the STC task and submitted five official runs. The transition-p2c model provides word-level transition probabilities between a post and a comment, which complement the TF-IDF feature. The encoder-decoder-Reverse and joint-Train models provide the semantic similarity between a post and a comment. With these models, we achieved 0.2867 in Mean nDCG@1, 0.4509 in Mean P+, and 0.4181 in Mean nERR@10.
The splab at the NTCIR-12 Short Text Conversation Task
Ke Wu, Xuan Liu and Kai Yu
[Pdf] [Table of Content]

The splab team participated in the Chinese subtask of the NTCIR-12 Short Text Conversation task. The task assumes that existing comments in a post-comment repository can be reused as suitable responses to a new short text, and the goal is to return the 10 most appropriate comments for such a text. Our system combines advanced IR methods with recent deep learning techniques. We developed a three-tier ranking framework that promotes the most suitable comments to the top positions. It consists of three components: search, lexical ranking, and semantic ranking. In the search component, three different query generation methods are used to boost recall. In the lexical ranking component, we use the training data of labelled post-comment pairs to score the comments in the candidate pool. In the final semantic ranking component, we apply deep learning techniques to convert each comment and the input short text into continuous, low-dimensional feature vectors, re-score the remaining candidate comments, and return the 10 most suitable comments. The evaluation of the submitted results shows empirically that our framework is effective in terms of mean nDCG@1, mean P+, and mean nERR@10.
ITNLP: Pattern-based Short Text Conversation System at NTCIR-12
Yang Liu, Chengjie Sun, Lei Lin and Xiaolong Wang
[Pdf] [Table of Content]

This paper describes the ITNLP system that participated in the Chinese subtask of the Short Text Conversation (STC) task at NTCIR-12. We employed a logistic regression model combined with pattern-based matching features to solve the STC problem. We also tried deep learning methods. Out of the 44 submitted runs, our best run, ITNLP-C-R3, ranked 8th in Mean nDCG@1, 12th in Mean P+, and 9th in Mean nERR@10.
Microsoft Research Asia at NTCIR-12 STC Task
Zhongxia Chen, Ruihua Song and Xing Xie
[Pdf] [Table of Content]

This paper describes our approaches to the NTCIR-12 Short Text Conversation (STC) task (Chinese). For a new post, instead of considering post-comment similarity, our system focuses on finding similar posts in the repository and retrieving their corresponding comments. We also use a frequency property of comments to adjust the ranking models. Our best run achieves 0.4854 in mean P+, 0.3367 in mean nDCG@1, and 0.4592 in mean nERR@10, which places it in the top tier of the official STC results.
BUPTTeam Participation in NTCIR-12 Short Text Conversation Task
Yongmei Tan, Minda Wang and Songbo Han
[Pdf] [Table of Content]

This paper provides an overview of BUPTTeam's system for the Chinese Short Text Conversation (STC) task at NTCIR-12. STC is a new, challenging NTCIR task defined as an IR problem, i.e., retrieval based on a repository of post-comment pairs from Sina Weibo. We propose a novel method for retrieving comments from the repository in four steps: 1) preprocessing, 2) building a search index, 3) generating comment candidates, and 4) ranking comment candidates. The evaluation results show that our method significantly outperforms state-of-the-art methods on the STC Chinese task.
OKSAT at NTCIR-12 Short Text Conversation Task: Priority to Short Comments, Filtering by Characteristic Words and Topic Classification
Takashi Sato, Yuta Morishita and Shota Shibukawa
[Pdf] [Table of Content]

Our group, OKSAT, submitted five runs for the Chinese and Japanese subtasks of the NTCIR-12 Short Text Conversation (STC) task. We searched not only posts but also comments for the terms of each query (post), and we gave higher priority to short comments than to longer ones. We then filtered the retrieved comments by characteristic words, including proper nouns. We also added attributes to the corpus and to the queries; retrieved comments that had the same attributes as the query received an extra score. For the Japanese subtask, we classified the queries into three classes and expanded and searched terms differently for each class. Our analysis of the experimental results showed the effectiveness of our method.
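The three re-scoring ideas above (priority to short comments, filtering by characteristic words, and an extra score for shared attributes) could be combined roughly as follows. This is a hypothetical sketch: the `Comment` type, the length discount 1 / (1 + length_weight * length), the reading of "filtering" as keeping only comments that contain a characteristic word, and the default weights are all assumptions for illustration, not OKSAT's published formula.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Comment:
    text: str
    match_score: float                      # score from the term search (assumed given)
    attributes: set = field(default_factory=set)

def rescore(c: Comment, query_attrs: set, characteristic_words: set,
            length_weight: float = 0.01, attr_bonus: float = 0.5) -> Optional[float]:
    """Hypothetical OKSAT-style rescoring; returns None if the comment is filtered out."""
    # Filter (assumed reading): keep only comments containing a characteristic word.
    if characteristic_words and not any(w in c.text for w in characteristic_words):
        return None
    # Priority to short comments: discount the score as the comment gets longer.
    score = c.match_score / (1.0 + length_weight * len(c.text))
    # Extra score when the comment shares at least one attribute with the query.
    if c.attributes & query_attrs:
        score += attr_bonus
    return score
```

Because `len(c.text)` counts characters, the length discount applies directly to unsegmented Chinese and Japanese text.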
Scoring of Response Based on Suitability of Dialogue-act and Content Similarity
Sota Matsumoto and Masahiro Araki
[Pdf] [Table of Content]

We present an approach to scoring candidate utterances in a large repository of short text conversation (STC) data in order to select those suitable as a response to a newly given utterance. Candidate utterances are evaluated on the suitability of their dialogue act and on their content similarity. The suitability of a dialogue act is estimated by learning which dialogue-act pairs frequently appear in the repository. Content similarity between utterances is calculated as the cosine similarity of topic vectors obtained using LDA and IDF. Multiplying these two values gives a high score to candidates that are suitable in terms of both function and content. The experimental evaluation showed that, for content similarity, increasing the weight of IDF yields better accuracy.
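The product score described above can be sketched as follows. The sketch assumes that dialogue-act labels are already assigned and that the LDA/IDF topic vectors are precomputed; estimating suitability as the relative frequency P(response act | utterance act) over act pairs in the repository is one plausible reading of "learning the trend of a dialogue-act pair", and the function names are hypothetical.

```python
import math
from collections import Counter

def act_pair_model(pairs: list[tuple[str, str]]):
    """Estimate P(response act | utterance act) from act pairs seen in the repository."""
    pair_counts = Counter(pairs)
    first_counts = Counter(u for u, _ in pairs)
    def prob(utt_act: str, resp_act: str) -> float:
        if first_counts[utt_act] == 0:
            return 0.0  # unseen utterance act: no evidence of suitability
        return pair_counts[(utt_act, resp_act)] / first_counts[utt_act]
    return prob

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def response_score(prob, utt_act: str, cand_act: str,
                   utt_vec: list[float], cand_vec: list[float]) -> float:
    # Functional suitability times content similarity: both must be high to score well.
    return prob(utt_act, cand_act) * cosine(utt_vec, cand_vec)
```

Multiplying (rather than adding) the two factors means a candidate with the wrong dialogue act or unrelated content is pushed toward zero even if the other factor is high.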
UT Dialogue System at NTCIR-12 STC
Shoetsu Sato, Shonosuke Ishiwatari, Naoki Yoshinaga, Masashi Toyoda and Masaru Kitsuregawa
[Pdf] [Table of Content]

This paper reports a dialogue system developed at the University of Tokyo for the short text conversation (STC) pilot task at NTCIR-12. We participated in the Japanese STC task on Twitter and built a system that selects plausible responses for an input post (tweet) from a given pool of tweets. Our system first selects a small set of tweets from the pool as response candidates using a kernel-based classifier, which uses the bag-of-words of an utterance and a (candidate) response as features. We then re-rank the chosen candidates by the perplexity given by a Long Short-Term Memory-based Recurrent Neural Network (LSTM-RNN) to return a ranked list of plausible responses. To capture the diversity of domains (topics, wording, writing style, etc.) in chat dialogue, we train multiple LSTM-RNNs on subsets of utterance-response pairs obtained by clustering distributed representations of the utterances, and we use the LSTM-RNN trained on the utterance-response cluster whose centroid is closest to the input tweet.
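The cluster selection and perplexity re-ranking steps can be sketched in a model-agnostic way. The sketch assumes that the utterance vectors and cluster centroids are given, that "closest" means Euclidean distance, and that the selected LSTM-RNN has already produced per-token natural-log probabilities for each candidate; these inputs and the function names are hypothetical stand-ins for the actual models.

```python
import math

def nearest_cluster(post_vec: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest (Euclidean) to the input post's vector."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(post_vec, centroids[i]))

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities under a language model."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def rerank(candidates: dict[str, list[float]]) -> list[str]:
    """Order candidate responses by ascending perplexity (lower = more plausible)."""
    return sorted(candidates, key=lambda c: perplexity(candidates[c]))
```

In this sketch, `nearest_cluster` picks which of the multiple LSTM-RNNs to use, and the log probabilities passed to `rerank` would come from that model.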
Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Text Conversation Task
Kozo Chikai and Yuki Arase
[Pdf] [Table of Content]

With the rise of social networking services, short texts such as micro-blog posts have become a valuable resource for practical applications. When text data is used in applications, estimating the similarity between texts is an important process. Conventional methods have assumed that an input text is long enough to support statistical approaches, e.g., counting word occurrences. However, micro-blog posts are much shorter; for example, tweets posted to Twitter are limited to 140 characters. This is a critical problem for conventional methods, since they cannot obtain reliable statistics from such short texts. In this study, we compare state-of-the-art methods for estimating text similarity to investigate how well they handle short texts, especially in the short text conversation scenario. We implement a conversation system using a million tweets crawled from Twitter. Our system also employs a supervised learning approach to decide whether a tweet can serve as a reply to an input, which proved effective in the NTCIR-12 Short Text Conversation task.
ICL00 at the NTCIR-12 STC Task: Semantic-based Retrieval Method of Short Texts
Weikang Li, Yixiu Wang and Yunfang Wu
[Pdf] [Table of Content]

We took part in the Short Text Conversation task at NTCIR-12. We employ a semantic-based retrieval method that calculates the textual similarity between posts and comments. Our method applies a rich-feature model to match post-comment pairs, using semantic, grammatical, n-gram, and string features to capture the high-level semantic meaning of the text.
WUST System at NTCIR-12 Short Text Conversation Task
Maofu Liu, Yifan Guo, Yang Wu, Limin Wang and Han Ren
[Pdf] [Table of Content]

Our WUST team participated in the Chinese subtask of the NTCIR-12 STC (Short Text Conversation) task. This paper describes our approach to STC and discusses the official results of our system. Our system builds a model to find appropriate comments for the query derived from the given post. We hypothesize that relevant posts tend to share common comments. Given a query q, topic words are first extracted from q and used to retrieve an initial set of post-comment pairs, which are then matched and ranked to produce the final ranked list. The core of the system is the calculation of the similarity between the responses and the given query q. The experimental results obtained with the NTCIREVAL tool suggest that our system could be improved by incorporating related knowledge and features.
Nders at the NTCIR-12 STC Task: Ranking Response Messages with Mixed Similarity for Short Text Conversation
Ge Xu and Guifang Lu
[Pdf] [Table of Content]

Short Text Conversation (STC) is a typical scenario in man-machine conversation that simplifies the conversation into a single-round interaction and makes the related tasks more practical. This paper presents a simple approach to the Chinese STC task of NTCIR-12. Given a repository of post-comment pairs, for any query we define three types of similarity and merge them using empirical weights. We consider the similarity between a query and a post or comment. To capture more logical relevance, we train an LDA model to map a query or comment into a topic space and then calculate the similarity between them in that space. The evaluation results show that our approach performs better than average. Given its simplicity, our approach can be used to deploy related STC systems quickly.
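The weighted merge of similarities described above can be sketched as a linear combination. The split into query-post and query-comment bag-of-words cosine plus query-comment topic cosine is an assumed reading of the "three types of similarity"; the texts are assumed to be pre-segmented into tokens, the topic vectors are assumed to come from a trained LDA model, and the weights (0.4, 0.4, 0.2) are placeholders, not the authors' empirical values.

```python
import math
from collections import Counter

def bow_cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity of two token lists treated as bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def dense_cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors, e.g. LDA topic distributions."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def mixed_similarity(query: list[str], post: list[str], comment: list[str],
                     query_topics: list[float], comment_topics: list[float],
                     weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Hypothetical weighted merge of three similarities for one post-comment pair."""
    w_post, w_comment, w_topic = weights
    return (w_post * bow_cosine(query, post)
            + w_comment * bow_cosine(query, comment)
            + w_topic * dense_cosine(query_topics, comment_topics))
```

Candidate comments would then be sorted by `mixed_similarity` in descending order; tuning the three weights on held-out data is one way to choose them.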
CYUT Short Text Conversation System for NTCIR-12 STC
Shih-Hung Wu, Wen-Feng Shih, Liang-Pu Chen and Ping-Che Yang
[Pdf] [Table of Content]

In this paper, we report how we built our system for the Chinese subtask of the NTCIR-12 Short Text Conversation (STC) shared task. Our approach is to find the sentences most related to a given input sentence. The system is implemented on the Lucene search engine. The results show that our system can handle conversations that involve related sentences.
A Combination of Similarity and Rule-based Method of PolyU for NTCIR-12 STC Task
Chuwei Luo and Wenjie Li
[Pdf] [Table of Content]

In this report, we describe the approach we used for the NTCIR-12 Short Text Conversation task. Because we registered for the task late and had less than one week to complete it, we designed a simple approach based on the cosine similarity of sentences and some handcrafted rules. The results show the effectiveness of our method.
SLSTC at the NTCIR-12 STC Task
Hiroto Denawa, Tomoaki Sano, Yuta Kadotami, Sosuke Kato and Tetsuya Sakai
[Pdf] [Table of Content]

The SLSTC team participated in the NTCIR-12 Short Text Conversation (STC) task. This report describes our approach to solving the STC problem and discusses the official results.
Utterance Selection Based on Sentence Similarities and Dialogue Breakdown Detection on NTCIR-12 STC Task
Hiroaki Sugiyama
[Pdf] [Table of Content]

This paper describes our contribution to the NTCIR-12 STC Japanese task. The purpose of the task is to retrieve, from a huge pool of tweets, tweets suitable as responses for a chat-oriented dialogue system. Our system retrieves tweets in two steps: first, it retrieves tweets that resemble the input sentences; then, it filters out tweets that are inappropriate in terms of dialogue-flow naturalness using a dialogue breakdown detection system. Our experiments show that although dialogue breakdown detection cannot distinguish between the best and medium levels of appropriateness, it works well even on data domains that differ slightly from the expected ones.
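The two-step retrieve-then-filter pipeline described above can be sketched with the similarity measure and the breakdown detector passed in as functions. Both callables, the `top_n` cut-off, and the 0.5 breakdown threshold are hypothetical placeholders; the actual system's similarity measure and detector are not specified here.

```python
from typing import Callable

def select_responses(input_text: str, pool: list[str],
                     similarity: Callable[[str, str], float],
                     breakdown_prob: Callable[[str, str], float],
                     top_n: int = 10, threshold: float = 0.5) -> list[str]:
    """Hypothetical two-step selection: retrieve by similarity, then filter by breakdown risk."""
    # Step 1: rank pool tweets by similarity to the input and keep the top candidates.
    ranked = sorted(pool, key=lambda t: similarity(input_text, t), reverse=True)[:top_n]
    # Step 2: drop candidates the detector judges likely to break down the dialogue.
    return [t for t in ranked if breakdown_prob(input_text, t) < threshold]

# Toy stand-ins: word-overlap similarity and a detector that flags "spam".
overlap = lambda a, b: len(set(a.split()) & set(b.split()))
detector = lambda a, b: 0.9 if "spam" in b else 0.1
print(select_responses("hello there friend", ["hello there", "hello spam", "bye"],
                       overlap, detector, top_n=2))  # prints ['hello there']
```

Keeping the filter as a separate step means the detector only has to judge a short candidate list, not the whole tweet pool.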
YUILA at the NTCIR-12 Short Text Challenge: Combining Twitter Data with Dialogue System Logs
Hiroshi Ueno, Takuya Yabuki and Masashi Inoue
[Pdf] [Table of Content]

The YUILA team participated in the Japanese subtask of the NTCIR-12 Short Text Challenge task. This report describes our approach to the responsiveness problem in the STC task, which uses an external dialogue log corpus, and discusses the official results.
