NTCIR (NII Testbeds and Community for Information access Research) Project

# Abstracts

Overview | Lifelog-2 | MedWeb | OpenLiveQ | QALab-3 | STC-2 | AKG | ECA | NAILS | WWW

### [Overview]

• Overview of NTCIR-13
Makoto P. Kato and Yiqun Liu
[Pdf] [Table of Content]
This is an overview of NTCIR-13, the thirteenth sesquiannual research project for evaluating information access technologies. NTCIR-13 presents a diverse set of evaluation tasks related to information retrieval, question answering, natural language processing, and other areas (nine tasks in total were organized at NTCIR-13). This paper outlines the research project, including its organization, schedule, scope, and task designs. In addition, we present brief statistics on the participants in the NTCIR-13 Conference. Readers should refer to the individual task overview papers for detailed descriptions and findings.

### [Lifelog-2]

• Overview of NTCIR-13 Lifelog-2 Task
Cathal Gurrin, Hideo Joho, Frank Hopfgartner, Liting Zhou, Duc-Tien Dang-Nguyen, Rashmi Gupta and Rami Albatal
[Pdf] [Table of Content]
In this paper, we review the NTCIR-13 Lifelog core task. We outline the test collection employed, along with the tasks, the submissions, and the findings from this pilot task. We finish by suggesting future plans for the task.

• PBG at the NTCIR-13 Lifelog-2 LAT, LSAT, and LEST Tasks
Shuhei Yamamoto, Takuya Nishimura, Yasunori Akagi, Yoshiaki Takimoto, Takafumi Inoue and Hiroyuki Toda
[Pdf] [Table of Content]
This paper describes the participation of the PBG research team in the NTCIR-13 Lifelog LAT, LSAT, and LEST tasks. Across all three subtasks, our team focuses on both images and locations and analyzes them with visual and location indexing methods. The results demonstrated outstanding performance, and we identified features that are effective for solving lifelog tasks.

• THIR2 at the NTCIR-13 Lifelog-2 Task: Bridging Technology and Psychology through the Lifelog Personality, Mood and Sleep Quality
Pouneh Soleimaninejadian, Yewen Wang, Haoyue Tong, Zehui Feng, Min Zhang, Yiqun Liu and Shaoping Ma
[Pdf] [Table of Content]
In this paper, we use lifelog data provided by NTCIR-13, together with lifelog data voluntarily gathered from other users, to give insights into four psychological categories: the Big Five personality traits, mood detection, music mood and style detection, and sleep quality prediction. The Big Five result, produced by five binary classifiers for openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism, is a five-digit base-2 number. The classifications of mood and music style are based on Thayer’s two-dimensional model of mood. For sleep quality, we use three classes: high quality, borderline, and poor quality sleep. To the best of our knowledge, this is the first study to link the physical data collected in lifelogs with psychological analysis of the user’s life. Our encouraging results suggest that such a link is meaningful. We present the results predicted by our models, along with other statistics on mental health and psychological aspects of the user’s life, using our mental health insight tool.
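
The five-digit base-2 encoding of the Big Five result can be illustrated with a minimal Python sketch. It assumes that each classifier outputs True/False and that the traits are packed in the order the abstract lists them, with openness as the most significant bit. The function names are hypothetical and do not come from the THIR2 team's code.

```python
# Trait order follows the abstract; putting openness in the most significant
# bit is an assumption made for this sketch.
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


def encode_big_five(labels: dict) -> int:
    """Pack five binary classifier outputs into one 5-bit integer."""
    code = 0
    for trait in TRAITS:
        code = (code << 1) | int(bool(labels[trait]))
    return code


def decode_big_five(code: int) -> dict:
    """Unpack a 5-bit integer back into one True/False flag per trait."""
    if not 0 <= code < 2 ** len(TRAITS):
        raise ValueError("code must fit in five bits")
    width = len(TRAITS)
    return {trait: bool((code >> (width - 1 - i)) & 1)
            for i, trait in enumerate(TRAITS)}


if __name__ == "__main__":
    user = {"openness": True, "conscientiousness": False, "extraversion": True,
            "agreeableness": True, "neuroticism": False}
    code = encode_big_five(user)
    print(code, format(code, "05b"))  # 22 10110
```

Packing the flags into one integer gives 32 possible personality profiles, and the round trip through `decode_big_five` recovers the individual classifier decisions.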

• VCI2R at the NTCIR-13 Lifelog-2 Lifelog Semantic Access Task
Jie Lin, Ana Garcia Del Molino, Qianli Xu, Fen Fang, Vigneshwaran Subbaraju, Joo-Hwee Lim, Liyuan Li and Vijay Chandrasekhar
[Pdf] [Table of Content]
In this paper, we describe our automatic approach to the NTCIR-13 Lifelog Semantic Access Task. The task is to retrieve relevant lifelog images from a user’s daily life given an event topic. A major challenge is bridging the semantic gap between lifelog images and event-level topics. We propose a general framework to address the problem. Its key components are: various CNNs that translate lifelog images into object and scene features; a search for object/scene concepts relevant to each event; feature weighting adapted to events; and temporal smoothing that incorporates semantic coherence into the similarity between each image and the query event. We achieved an official result of 57.6% in terms of mean precision over 20 topics. We also analyze the effect of the key components on the retrieval system.
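
Temporal smoothing can be sketched as a centered moving average over per-image similarity scores ordered by capture time. This is an illustrative assumption rather than the VCI2R formulation: the window size and the uniform weights are hypothetical choices.

```python
def temporal_smooth(scores, window=1):
    """Centered moving average over image scores sorted by capture time.

    Each image's similarity to the query event is replaced by the mean of the
    scores within `window` positions on either side (truncated at the ends).
    An isolated high score is damped, while a run of relevant images keeps
    its level.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbourhood = scores[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


if __name__ == "__main__":
    # A single high-scoring frame in an otherwise irrelevant stretch.
    print(temporal_smooth([0.0, 0.0, 0.9, 0.0, 0.0], window=1))
```

A wider window enforces more coherence between neighbouring frames, but it also blurs the boundaries of short events.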

• Visualizing Personal Lifelog Data for Deeper Insights at the NTCIR-13 Lifelog-2 Task
Qianli Xu, Vigneshwaran Subbaraju, Ana Garcia Del Molino, Jie Lin, Fen Fang, Joo-Hwee Lim, Liyuan Li and Vijay Chandrasekhar
[Pdf] [Table of Content]
We present a method for finding insights from personal lifelogs. First, we create minute-wise annotations of the users’ activities with respect to the given topics (e.g., socialize, exercise). This is achieved by performing image retrieval using deep learning, followed by fusion of multi-modal sensor data. Second, we generate insights into users’ activities, including facts about activity occurrence, temporal and spatial patterns, and associations among multiple activities. Finally, we build a prototype mobile app to visualize the insights according to selected themes. We discuss challenges in the process, as well as opportunities for research and applications in lifelogging information access.

### [MedWeb]

• Overview of the NTCIR-13: MedWeb Task
Shoko Wakamiya, Mizuki Morita, Yoshinobu Kano, Tomoko Ohkuma and Eiji Aramaki
[Pdf] [Table of Content]
Recently, the amount of medical and clinical information on the web has been increasing. Among the various types of information, one of the most valuable sources is social media. NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) provides pseudo-Twitter messages in a cross-language corpus covering Japanese, English, and Chinese. The task is to classify whether or not each message contains a patient symptom. These subtask settings can be formalized as multi-label classification of disease/symptom-related texts, so the achievements of this task can be applied almost directly to a fundamental engine for actual applications. Eight groups (19 systems) participated in the Japanese subtask, four groups (12 systems) in the English subtask, and two groups (6 systems) in the Chinese subtask. This report presents the results of these groups, with discussions that clarify the issues that need to be resolved in the field of medical natural language processing.

• Keyword-based Challenges at the NTCIR-13 MedWeb
Mamoru Sakai and Hiroki Tanioka
[Pdf] [Table of Content]
The AITOK team participated in the MedWeb Japanese subtask of NTCIR-13. This report describes our approaches to the multi-label classification problem of disease/symptom-related texts and reports some improvements made after the formal run. We take three approaches: a keyword-based approach, a logistic regression approach, and a support vector machine (SVM) approach. The keyword-based and logistic regression approaches were submitted to the formal run; the SVM-based approach was not. These efforts yielded some improvements and showed that a machine learning approach performs well compared with the other approaches.

• AKBL at the NTCIR-13 MedWeb Task
Reine Asakawa and Tomoyoshi Akiba
[Pdf] [Table of Content]
The AKBL team participated in the Twitter subtask of the NTCIR-13 MedWeb Task. We tackled the task using machine learning techniques, real tweets collected under specific conditions, and Fisher’s exact test. This paper outlines the methods we used to obtain the results evaluated by the task organizers.
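
Fisher's exact test can check whether a word is associated with a symptom label. The sketch below computes the one-sided p-value from the hypergeometric distribution for a word-by-label 2x2 table. The abstract does not state how AKBL built its tables or chose a significance threshold, so the table layout and the one-sided alternative are assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test p-value for a 2x2 table.

                        symptom  no symptom
        word present       a         b
        word absent        c         d

    With all margins fixed, the count in cell a follows a hypergeometric
    distribution. The p-value is P(X >= a): the chance of seeing at least this
    many symptom tweets containing the word if word and label were independent.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    n = a + b + c + d
    with_word = a + b   # row margin: tweets containing the word
    positive = a + c    # column margin: tweets labeled with the symptom
    hits = sum(comb(positive, k) * comb(n - positive, with_word - k)
               for k in range(a, min(with_word, positive) + 1))
    return hits / comb(n, with_word)


if __name__ == "__main__":
    print(fisher_exact_greater(3, 1, 1, 3))  # 17/70, about 0.243
```

A word with a small p-value (for example, below a chosen threshold such as 0.05) could then be kept as a symptom-indicative feature. That selection rule is hypothetical.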

• NTCIR13 MedWeb Task: multi-label classification of tweets using an ensemble of neural networks
Hayate Iso, Camille Ruiz, Taichi Murayama, Katsuya Taguchi, Ryo Takeuchi, Hideya Yamamoto, Shoko Wakamiya and Eiji Aramaki
[Pdf] [Table of Content]
This paper describes how we tackled the Medical Natural Language Processing for Web Document (MedWeb) task as participants of NTCIR-13. We used multi-language learning to integrate the multi-language inputs of the task into a single neural network. We then built two neural networks with multi-language learning, a hierarchical attention network (HAN) and a deep character convolutional neural network (CharCNN), and combined their outputs to exploit the advantages of each. The combination was carried out by ensembling, specifically bagging. We found that the ensemble using the NLL and hinge loss functions produced the best result, with 88.0% accuracy.

• Classification of Tweet Posters for Diseases by Combined Rule-Based and Machine Learning Method in NTCIR-13: MedWeb Twitter Task (Japanese Subtask)
Masahito Sakishita and Yoshinobu Kano
[Pdf] [Table of Content]
We propose methods that automatically classify whether the posters of Japanese tweets may have diseases (positive, p) or not (negative, n). Our methods combine a rule-based method and a machine learning method. The rule-based method is derived from our observations of the training data. The machine learning method uses an SVM (Support Vector Machine) with word-level features. Our system achieved a p/n exact match rate of 0.802 and an F1 score of 0.871, outperforming the baseline system of NTCIR-13 MedWeb. In terms of F1 score, our best configuration ranked third among all participating teams in this task. We found that the individual methods and their combinations have different advantages. Other combinations that we have not yet tried could improve accuracy in the future.

• UE and Nikon at the NTCIR-13 MedWeb Task
Anh Hang Nga Tran, Hiroko Kobayashi, Yu Sawai and Paulo Quaresma
[Pdf] [Table of Content]
The NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) Task continues the task series of NTCIR-10, -11, and -12 with more specific symptoms: influenza, diarrhea/stomachache, hay fever, cough/sore throat, headache, fever, runny nose, and cold. The task is to detect the “signs” of diseases or symptoms in pseudo-Twitter messages, and it can be formalized as a multi-label classification problem. We address this task with three different approaches for Japanese and English tweets: rules, feature engineering, and distributed representations. Among our approaches, the feature-engineering-based approach achieved the highest exact-match rate (80.4%) and F1 score (86.5%). Through error analysis, we found that each approach has its own strengths and shortcomings.
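
The two evaluation measures quoted above can be computed as in the following sketch. It assumes each message is labeled with a 0/1 vector over the eight symptoms in the listed order. The abstract does not say which F1 variant is reported, so micro-averaged F1 is an assumption here.

```python
# Symptom order from the abstract; the 0/1 vector layout is an assumption.
SYMPTOMS = ("influenza", "diarrhea/stomachache", "hay fever", "cough/sore throat",
            "headache", "fever", "runny nose", "cold")


def _check(gold, pred):
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    for vec in list(gold) + list(pred):
        if len(vec) != len(SYMPTOMS):
            raise ValueError("each label vector needs one 0/1 entry per symptom")


def exact_match_rate(gold, pred):
    """Fraction of messages whose whole 8-label vector is predicted correctly."""
    _check(gold, pred)
    return sum(tuple(g) == tuple(p) for g, p in zip(gold, pred)) / len(gold)


def micro_f1(gold, pred):
    """Micro-averaged F1 pooled over all (message, symptom) decisions."""
    _check(gold, pred)
    tp = fp = fn = 0
    for g_vec, p_vec in zip(gold, pred):
        for g, p in zip(g_vec, p_vec):
            tp += int(g == 1 and p == 1)
            fp += int(g == 0 and p == 1)
            fn += int(g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 when nothing is positive


if __name__ == "__main__":
    gold = [(1, 0, 0, 0, 0, 1, 0, 0), (0,) * 8]
    pred = [(1, 0, 0, 0, 0, 0, 0, 0), (0,) * 8]
    print(exact_match_rate(gold, pred), micro_f1(gold, pred))
```

Exact match is the stricter measure: missing a single symptom (here, fever in the first message) makes the whole message count as wrong, whereas F1 gives partial credit.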

• Principle Base Approach for Classifying Tweets with Flu-related Information in NTCIR-13 MedWeb Task
Josan Wei-San Lin, Hong-Jie Dai and Joni Yu-Hsuan Shao
[Pdf] [Table of Content]
Disease surveillance systems on social media have recently become both more important and more difficult. This is due not only to the variety of data types but also to the difficulty of extracting disease-related words and interpreting them correctly. In the real world, one disease can have many types of symptoms. A symptom is observed by the patient; it is subjective and cannot be measured directly. For example, people who have the flu often feel some of these symptoms: fever, cough, or runny nose. How to identify the disease from observed symptoms is a challenging problem. In this study, we propose a principle-based method for classifying Japanese tweets that convey flu-related information. Evaluated on the corpus of the NTCIR-13 MedWeb task, the proposed method achieves micro/macro F-scores of 0.8352/0.8290 on the training set and an F-score of 0.835 on the test set. In the future, we will include grammatical information to improve the performance of the developed model.

• NIL: Using scoring to analyse the ambiguous messages on the NTCIR-13 MedWeb task
Masao Ito
[Pdf] [Table of Content]
To determine the symptoms and diseases mentioned in a short message, we first analyse the text with lexical and syntactic analysis. Then, we assign points according to the criteria that the guidelines provide. Finally, we decide the positive or negative value for each symptom/disease using a weighted sum.
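
The point-and-weighted-sum decision can be sketched as follows. The criteria names, weights, and threshold are hypothetical placeholders, since the abstract does not reproduce the guideline criteria.

```python
# Hypothetical criteria and weights: the abstract says points are assigned
# according to the task guidelines but does not list them.
CRITERIA_WEIGHTS = {
    "mentions_symptom": 2.0,       # a symptom/disease term appears
    "about_writer": 1.0,           # the writer (or someone nearby) has it
    "negated": -3.0,               # e.g. "no fever today"
    "hypothetical_or_past": -1.5,  # e.g. worry, rumour, long-past episode
}


def symptom_score(features):
    """Weighted sum of the criteria that fire for one symptom in one message."""
    return sum(weight for name, weight in CRITERIA_WEIGHTS.items()
               if features.get(name, False))


def decide(features, threshold=2.5):
    """Return 'p' (positive) if the weighted sum reaches the threshold, else 'n'."""
    return "p" if symptom_score(features) >= threshold else "n"


if __name__ == "__main__":
    print(decide({"mentions_symptom": True, "about_writer": True}))   # p
    print(decide({"mentions_symptom": True, "about_writer": True,
                  "negated": True}))                                   # n
```

In a full system, `decide` would run once per symptom/disease, with the feature flags produced by the lexical and syntactic analysis step.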

• Medweb Task: Identify Multi-Symptoms from Tweets Based on Active Learning and Semantic Information
Chao Li, Kang Xin and Fuji Ren
[Pdf] [Table of Content]
Recently, the web has come to hold an ever-increasing amount of medical and clinical information. Among all the data sources, social media is the most valuable. The NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task is to identify patient symptoms from text. It uses pseudo-Twitter messages as a cross-language corpus covering Japanese, English, and Chinese. This paper focuses on the Chinese subtask. To address it, we exploit active learning and semantic information. Active learning is used to find new messages that are difficult to discriminate: using messages from Weibo, the new messages are labeled and gradually added to the training data. Word embeddings are used as semantic information to complement the features of each message. According to the experimental results, the proposed method outperforms other methods in terms of recall, and its overall performance approximates that of the baseline. The results also show that the additional training data used in the experiments only increases recall for this task, whereas the semantic information based on word embeddings can improve the overall performance.
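
The selection step of active learning can be sketched as uncertainty sampling: messages whose predicted symptom probabilities lie closest to 0.5 are sent for labeling first. Treating the least confident label as the message's uncertainty is an assumption for this multi-label sketch, not necessarily the criterion used in the paper.

```python
def uncertainty_margin(label_probs):
    """Distance of the least confident label from 0.5 (smaller = more uncertain)."""
    return min(abs(p - 0.5) for p in label_probs)


def select_for_labeling(pool, k):
    """Pick the k unlabeled messages with the smallest margin.

    pool maps a message id to the model's per-symptom probabilities.
    Ties are broken by id so the selection is deterministic.
    """
    ranked = sorted(pool, key=lambda mid: (uncertainty_margin(pool[mid]), mid))
    return ranked[:k]


if __name__ == "__main__":
    pool = {
        "w1": [0.95, 0.02],
        "w2": [0.60, 0.10],
        "w3": [0.48, 0.99],
        "w4": [0.05, 0.03],
    }
    print(select_for_labeling(pool, 2))  # ['w3', 'w2']
```

After the selected messages are labeled, they are added to the training data, the model is retrained, and the selection repeats on the remaining pool.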

• DrG at NTCIR-13: MedWeb Task
Kazui Morita and Takagi Toshihisa
[Pdf] [Table of Content]
The NTCIR-13 MedWeb Task provides pseudo-Twitter messages (in Japanese, English, and Chinese), and the goal is to classify whether each message contains a patient symptom. We participated in the Japanese subtask using a random forest and some manual rules.

### [OpenLiveQ]

• Overview of the NTCIR-13 OpenLiveQ Task
Makoto P. Kato, Takehiro Yamamoto, Tomohiro Manabe, Akiomi Nishida and Sumio Fujita
[Pdf] [Table of Content]
This is an overview of the NTCIR-13 OpenLiveQ task. This task aims to provide an open live test environment of Yahoo Japan Corporation's community question-answering service (Yahoo! Chiebukuro) for question retrieval systems. The task was simply defined as follows: given a query and a set of questions with their answers, return a ranked list of questions. Submitted runs were evaluated both offline and online. In the online evaluation, we employed optimized multileaving, a multileaving method that showed higher efficiency than the other methods in our preliminary experiment. We describe the details of the task, data, and evaluation methods, and then report the official results of NTCIR-13 OpenLiveQ.

• YJRS at the NTCIR-13 OpenLiveQ Task
Tomohiro Manabe, Akiomi Nishida and Sumio Fujita
[Pdf] [Table of Content]
We participated in the NTCIR-13 OpenLiveQ Task with a view to improving the baseline method, which is a linear combination of 77 features with different weights. We added three settings of extended BM25F as additional features, replaced the objective function of weight optimization, and re-calculated the feature weights using 5-fold cross-validation. Our run achieved the best Q-measure score among 10 runs in the offline test. In the online test, our run achieved the second-best total credit among 10 runs, only slightly behind the best one, owing to its robustness. We also checked the win-loss counts of our run against each of the other nine runs for each page view. According to the results, our run beat every other run in a statistically significantly larger number of page views.
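
For reference, a simplified form of BM25F combines field-weighted, length-normalized term frequencies before a single saturation step. The sketch below follows that standard form with a common non-negative IDF variant. It does not reproduce the team's extended BM25F settings, and all parameter values in the usage are hypothetical.

```python
import math
from collections import Counter


def bm25f(query_terms, doc_fields, field_weights, field_b, avg_len,
          doc_freq, n_docs, k1=1.2):
    """Simplified BM25F score of one document for a bag of query terms.

    doc_fields:    {field name: list of tokens} for the document
    field_weights: {field: boost w_f}
    field_b:       {field: length-normalisation strength b_f in [0, 1]}
    avg_len:       {field: average token length of that field in the collection}
    doc_freq:      {term: number of documents containing the term}
    """
    tf_by_field = {f: Counter(tokens) for f, tokens in doc_fields.items()}
    score = 0.0
    for term in dict.fromkeys(query_terms):  # de-duplicate, keep order
        # Field-weighted, length-normalised pseudo term frequency.
        pseudo_tf = 0.0
        for field, tokens in doc_fields.items():
            tf = tf_by_field[field][term]
            if tf:
                b = field_b[field]
                norm = (1 - b) + b * len(tokens) / avg_len[field]
                pseudo_tf += field_weights[field] * tf / norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # non-negative IDF
        # A single saturation step over the combined frequency.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


if __name__ == "__main__":
    params = dict(field_weights={"title": 2.0, "body": 1.0},
                  field_b={"title": 0.5, "body": 0.75},
                  avg_len={"title": 5, "body": 50},
                  doc_freq={"iphone": 10}, n_docs=1000)
    in_title = {"title": ["iphone", "battery"], "body": ["charge"] * 10}
    in_body = {"title": ["phone", "battery"], "body": ["iphone"] + ["charge"] * 9}
    print(bm25f(["iphone"], in_title, **params), bm25f(["iphone"], in_body, **params))
```

Saturating after summing across fields, rather than per field, is the key difference from a plain weighted sum of per-field BM25 scores. A score like this would serve as one feature in the linear combination described above.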

• Erler at the NTCIR-13 OpenLiveQ Task
Ming Chen, Lin Li, Yueqing Sun and Jie Zhang
[Pdf] [Table of Content]
In this paper, we present our approach to the OpenLiveQ task at NTCIR-13. The task is a question retrieval task and can be simply defined as follows: given a query and a set of questions with their answers, return a ranked list of questions. However, there is a “gap” between queried questions and candidate questions, known as the lexical chasm or word mismatch problem. In our model, we improve the traditional Topic inference based Translation Language Model (T²LM) by using the topic information of queries to “bridge” the gap. The translation model and the topic model are used to link different words. Experimental results show that our system achieves competitive performance among the participants in the OpenLiveQ task.

• SLOLQ at the NTCIR-13 OpenLiveQ Task
Ryo Kashimura and Tetsuya Sakai
[Pdf] [Table of Content]
The SLOLQ (Sakai Laboratory OpenLiveQ) team submitted six runs to the Offline Test of the NTCIR-13 OpenLiveQ Task, including a similarity ranking run and a diversity ranking run. Subsequently, our similarity ranking run was evaluated in the Online Test. Unfortunately, our offline results show that our Similarity Ranking and Diversity Ranking runs are statistically indistinguishable from runs that rank questions at random. Our online results show that our Similarity Ranking run failed to outperform a baseline that simply ranks questions by the number of answers they received.

• TUA1 at the NTCIR-13 OpenLiveQ Task
Mengjia He, Xin Kang and Fuji Ren
[Pdf] [Table of Content]
Our group participated in the OpenLiveQ task of NTCIR-13. Using the openliveq tool provided by the organizers, we compare a baseline result with three results ranked by RF (Random Forests). The comparison shows that the result ranked by RF with 1000 bags is closest to the baseline, but still lower on average.

• OKSAT at NTCIR-13 OpenLiveQ Task
Takashi Sato
[Pdf] [Table of Content]
Our group OKSAT submitted 21 runs for the NTCIR-13 OpenLiveQ task. Our runs range from simple to complex; in most cases, the complex runs are combinations of simple ones. We mainly searched the question data: we searched the title, snippet, and body with the query string and merged their scores. We also took page views and the number of answers into account.

### [QALab-3]

• Overview of the NTCIR-13 QA Lab-3 Task
Hideyuki Shibuki, Kotaro Sakamoto, Madoka Ishioroshi, Yoshinobu Kano, Teruko Mitamura, Tatsunori Mori and Noriko Kando
[Pdf] [Table of Content]
The NTCIR-13 QA Lab-3 task targets real-world complex Question Answering (QA) technologies using Japanese university entrance exams and their English translations on the subject of “World history”. QA Lab-3 has three end-to-end tasks, for multiple-choice, term, and essay questions. The essay task has three subtasks: extraction, summarization, and evaluation method. There were 85 submissions from 13 teams in total. We describe the data used, the formal run results, and a comparison between human marks and automatic evaluation scores for essay questions.

• HagiwaraLab at the NTCIR-13 QALab-3 Task
Yuanzhi Ke and Masafumi Hagiwara
[Pdf] [Table of Content]
The objective of the term question subtask of QALab-3 in NTCIR-13 is to answer historical questions with a few words, instead of choosing the right answer from several options as in the QALab-3 multiple-choice subtask. We used recurrent neural networks (RNNs) to extract the answer. However, we encountered memory issues when we tried to input the documents retrieved from Wikipedia into the neural network. To solve the memory issues, we input automatically generated summaries of the documents instead of the original ones. The summaries were produced using relevance scores based on tf-idf weighted word embeddings. In this paper, we introduce our system for the term question subtask. We discuss the effects of the summarization technology, the length of the summary, and the issue of multi-document summarization. The system could be improved with a more carefully specified knowledge base, a better summarization algorithm, or more powerful machines.
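
One plausible reading of relevance scoring with tf-idf weighted word embeddings is sketched below. Each sentence and the question are represented as tf-idf weighted averages of word vectors, and the sentences most cosine-similar to the question are kept as the summary. The toy embeddings and the exact scoring scheme are assumptions, not the HagiwaraLab implementation.

```python
import math
from collections import Counter


def weighted_vector(tokens, embeddings, idf):
    """tf-idf weighted average of word vectors; words lacking a vector or idf are skipped."""
    dim = len(next(iter(embeddings.values())))
    vec, total = [0.0] * dim, 0.0
    for word, tf in Counter(tokens).items():
        if word in embeddings and word in idf:
            weight = tf * idf[word]
            vec = [v + weight * e for v, e in zip(vec, embeddings[word])]
            total += weight
    return [v / total for v in vec] if total else vec


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def summarize(sentences, question, embeddings, idf, k):
    """Keep the k sentences most similar to the question, in their original order."""
    q_vec = weighted_vector(question, embeddings, idf)
    scores = [cosine(weighted_vector(s, embeddings, idf), q_vec) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Hypothetical 2-d embeddings: one axis for royalty, one for warfare.
    emb = {"king": [1.0, 0.0], "queen": [0.9, 0.1],
           "war": [0.0, 1.0], "battle": [0.1, 0.9]}
    idf = {w: 1.0 for w in emb}
    doc = [["battle", "battle"], ["king"], ["queen", "war"]]
    print(summarize(doc, ["war"], emb, idf, k=2))
```

The summary length `k` is the knob that trades memory use against coverage when feeding the summary to the RNN.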

• KSU Team's QA System for World History Exams at the NTCIR-13 QA Lab-3 Task
Tasuku Kimura, Ryo Tagami and Hisashi Miyamori
[Pdf] [Table of Content]
This paper describes the systems and results of team KSU for the QA Lab-3 task at NTCIR-13. We have been developing question answering systems for the world history multiple-choice questions in the National Center Test for University Admissions. We newly developed automatic answering systems for the world history questions in the second-stage exams of Japanese university entrance examinations, which consist of term questions and essay questions. In the multiple-choice question subtask, we improved on the automatic answering systems from QA Lab-2 by implementing query generation methods according to the answer types. In the term question subtask, we designed systems that focus on category prediction using word prediction models and on an evaluation score based on the graph of dependency relations. In the essay question subtask, we proposed automatic answering methods that combine document retrieval guided by the instructions on how the essay should be summarized with knowledge sources constructed from various simple sentences.

• SLQAL at the NTCIR-13 QA Lab-3 Task
Kou Sato and Tetsuya Sakai
[Pdf] [Table of Content]
The SLQAL team participated in Phase-1, Phase-2, and Phase-3 of the NTCIR-13 QA Lab-3 Japanese subtask (National Center Tests, multiple-choice questions). This paper describes our approaches in Phase-3.

• ATA2: A Question Answering System at NTCIR-13 QALab-3
Tao-Hsing Chang, Yu-Sheng Tsai, Chih-Li Tsai and Pei-Xuan Cai
[Pdf] [Table of Content]
In 2016, we proposed a technique called ASEE, or ATA1, to automatically answer multiple-choice question items. Multiple-choice questions are items in which the best option is to be selected from the options provided. The core concept behind ATA1 is that only the correct answer can become valid information and that more valid information will appear on Wikipedia for it. However, although a statistical method such as ATA1 is very effective in processing multiple-choice questions, it cannot be used when the answer is not one of the options. Nor can it handle other types of questions, such as term questions that require inference to find the answer. Therefore, this paper proposes a new automatic answering tool called ATA2. This tool converts the content of the item and its Wikipedia page into concept maps. A concept map expresses the structure of knowledge. ATA2 compares the concept maps of the item and of the knowledge source to determine the answer. ATA2 can be applied to both multiple-choice and term questions. This paper also reports the accuracy of ATA2 at QA Lab-3.

• Rubric-based Automated Japanese Short-answer Scoring and Support System Applied to QALab-3
Tsunenori Ishioka, Kohei Yamaguchi and Tsuneori Mine
[Pdf] [Table of Content]
We have been developing an automated Japanese short-answer scoring and support machine for the new National Center written tests. Our approach is based on the fact that accurately recognizing textual entailment and/or synonymy has been almost impossible for several years. The system generates automated scores on the basis of evaluation criteria or rubrics, and human raters revise them. The system determines the semantic similarity between the model answers and the actual written answers, as well as a certain degree of semantic identity and implication. An experimental prototype operates as a web system on a Linux computer. To evaluate its performance, we apply the method to the second-round tests of the Todai (University of Tokyo) entrance examinations. We compared human scores with automated scores for a trial examination in which 20 points were allotted across 5 questions of a world history test. The differences between the scores were within 3 points for 16 of the 20 data items provided by the NTCIR QALab-3 task office.

• IMTKU Question Answering System for World History Exams at NTCIR-13 QA Lab-3
Min-Yuh Day, Chao-Yu Chen, Wan-Chu Huang, I-Hsuan Huang, Shi-Ya Zheng, Tz-Rung Chen, Min-Chun Kuo, Yue-Da Lin and Yi-Jing Lin
[Pdf] [Table of Content]
This paper describes the IMTKU (Information Management at Tamkang University) question answering system for world history exams in Japanese university entrance exams at NTCIR-13 QA Lab-3. The IMTKU team proposed a question answering system that integrates natural language processing with a deep learning approach. In QA Lab-3 Phase-2, the IMTKU team submitted runs for the National Center Tests and Second-stage Examinations:
- 3 English end-to-end multiple-choice runs
- 2 English end-to-end essay runs and 2 Japanese end-to-end essay runs
- 2 English extraction essay runs and 2 Japanese extraction essay runs
- 1 English summarization essay run and 1 Japanese summarization essay run

The best total score of the IMTKU QA system was 40 in the Phase-3 English multiple-choice subtask, and its best score for the complex Japanese essay subtask was 0.408.

• DGLab Question Answering System and Automatic Evaluation Method at NTCIR-13 QA Lab-3 for University Entrance Exam on World History Essay
Mike Tian-Jian Jiang
[Pdf] [Table of Content]
The paper describes the DGLab question answering system and automatic evaluation method at NTCIR-13 QA Lab-3 for the Japanese university entrance exam on world history essays. Our submissions cover the extraction, summarization, and evaluation method subtasks, in Phase-2 and the Research Run, for both Japanese and English. In the summarization subtask in particular, team DGLab placed 1st for Japanese and 2nd for English in terms of total scores graded by human experts.

• SML Question-Answering System for World History Essay Exams at NTCIR-13 QALab-3
Yusuke Doi, Takuma Takada, Takuya Matsuzaki and Satoshi Sato
[Pdf] [Table of Content]
This paper describes the SML team's question-answering system for world history short essay-type questions at NTCIR-13 QALab-3. Our system consists of an extraction module and a compression module. In the extraction module, we identify the theme and the focus of a question, and extract several sentences from a world history glossary that are appropriate for the theme and the focus. In the compression module, we compared three compression methods: one based on manually designed compression rules, one based on corpus statistics, and a hybrid of both.

• MTMT in QALab-3: World History Essay Question Answering System that Utilizes Textbooks and Open Knowledge Bases
Takaaki Matsumoto and Teruko Mitamura
[Pdf] [Table of Content]
This paper introduces a system for answering world history essay questions that uses linked open data to assist machine translation, together with its evaluation. Since the target questions are the world history subject of the University of Tokyo entrance examination, most answers can be found in Japanese world history textbooks. However, no high-quality English translation with content equivalent to the Japanese world history textbooks is available. Therefore, we translate those textbooks with the help of linked open data. We also use a source-language knowledge resource whose content is not equivalent to the target knowledge resource. The evaluation results indicate that the proposed system achieves the best ROUGE-1 scores among all end-to-end submissions. We draw the following conclusions. 1) Simple neural translation of a knowledge resource does not work for domain-specific cross-lingual question answering. 2) Linked open data is effective for finding correct translations of difficult terms in the machine translation process. 3) Adding a source-language open knowledge resource can help even if its content is not equivalent to the target knowledge resource.
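
ROUGE-1 measures unigram overlap between a system answer and a reference answer. A minimal sketch with clipped unigram counts follows. It assumes pre-tokenized input and does not model the official ROUGE toolkit's options (stemming, stopword removal, multiple references).

```python
from collections import Counter


def rouge_1(candidate, reference):
    """ROUGE-1 (recall, precision, F1) from clipped unigram overlap.

    Both arguments are token lists; Japanese answers would first need a
    morphological analyser, which is outside this sketch.
    """
    # Counter & Counter keeps the minimum count of each shared token,
    # so a word repeated in the candidate is credited at most as often
    # as it appears in the reference.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    recall = overlap / len(reference) if reference else 0.0
    precision = overlap / len(candidate) if candidate else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return recall, precision, f1


if __name__ == "__main__":
    print(rouge_1("the cat sat on the mat".split(),
                  "the cat is on the mat".split()))
```

Because ROUGE-1 is recall-oriented, evaluations often report the recall component. An essay that repeats key terms gains nothing from the repetition beyond the reference counts.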

• CMUQA in QALab-3: Essay Question Answering for University Entrance Exams
Fadi Botros, Francesco Ciannella, Takaaki Matsumoto, Evan Chan, Cheng-Ta Chung, Lucas Bengtson, Keyang Xu, Tian Tian and Teruko Mitamura
[Pdf] [Table of Content]
This paper describes the CMUQA entry to NTCIR QA Lab-3. CMUQA is an essay-question answering system that uses Wikipedia as its knowledge base. The essay-question portion of QA Lab-3 consists of real exam questions from the University of Tokyo's entrance exams, which focus on world history. The proposed system formulates answers to these questions through the following sequence of processes: question analysis, document retrieval, sentence extraction, sentence scoring, text ordering, and text summarization. Despite the difficulty of this challenge and the small size of the training data, CMUQA achieved the highest expert score among the competing end-to-end systems in the task.

• Forst: Question Answering System for Second-stage Examinations at NTCIR-13 QA Lab-3 Task
Kotaro Sakamoto, Madoka Ishioroshi, Yuta Fukuhara, Akihiro Iizuka, Hideyuki Shibuki, Tatsunori Mori and Noriko Kando
[Pdf] [Table of Content]
We participated in all phases of the term question task and the essay question task in Japanese. We describe the changes since QA Lab-2 and our methods for the evaluation method subtask.

### [STC-2]

• Overview of the NTCIR-13 Short Text Conversation Task
Lifeng Shang, Tetsuya Sakai, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao, Yuki Arase and Masako Nomoto
[Pdf] [Table of Content]
We give an overview of the NII Testbeds and Community for Information access Research (NTCIR)-13 Short Text Conversation (STC) task, which was a core task of NTCIR-13. At NTCIR-12, STC was treated as an IR problem: maintain a large repository of post-comment pairs, then find a clever way of reusing these existing comments to respond to new posts. At NTCIR-13, besides the retrieval-based method, we focused on a newer approach, the generation-based method, which generates “new” comments. The generation-based method has gained a great deal of attention in recent years. However, it remains an open question whether, for the STC task, the retrieval-based method should be wholly replaced by the generation-based method or combined with it. By organizing this task at NTCIR-13, we provided a transparent platform for comparing the two methods through comprehensive evaluations. For the Chinese subtask, there were 34 registrations in total, and 22 teams finally submitted 120 runs. For the Japanese subtask, there were 9 registrations in total, and 5 teams submitted 15 runs. In this paper, we review the task definition, evaluation measures, test collections, and evaluation results of all teams.

• TUA1 at NTCIR-13 Short Text Conversation 2 Task
Yunong Wu, Xin Kang, Kenji Kita and Fuji Ren
[Pdf] [Table of Content]
In this paper, we give an overview of our work in the Short Text Conversation 2 task at NTCIR-13. We propose two methods: a retrieval-based method and a generation-based method. Our retrieval-based method consists of an indexing part and a re-ranking part. Rep-post is used as the query to search comments in rep-cmnt, and the indexed candidate comments are re-ranked by each of three models. Our generation-based method constructs a sequence-to-sequence neural network model with an attention mechanism. The model reads a post sentence word by word, calculates attention weights over the input words, and outputs a comment sentence using normal search and beam search strategies. We propose an RNN model that reorders the comment sentences generated by 26 parallel sequence-to-sequence models by evaluating the fitness of post-comment pairs, and we use the cosine similarity between each post-comment pair to assist the reordering. The evaluation results of our Formal Run submissions suggest that our methods are effective for re-ranking and generating lists of meaningful comment sentences for short text conversation.
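
The beam search strategy can be sketched independently of the network. At each step, every kept prefix is extended by each candidate token, and only the `beam_width` highest log-probability prefixes survive. The toy next-token table stands in for the attention-based decoder and is purely hypothetical.

```python
import math


def beam_search(next_token_probs, beam_width, max_len, eos="</s>"):
    """Return hypotheses as (tokens, log_prob) pairs, best first.

    next_token_probs(prefix) must return {token: probability} for the next step.
    Hypotheses that emit `eos` are set aside as finished; the rest keep expanding.
    """
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = [
            (prefix + (token,), logp + math.log(p))
            for prefix, logp in beams
            for token, p in next_token_probs(prefix).items()
            if p > 0
        ]
        # Stable sort: ties keep the order in which they were generated.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, logp))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses cut off by max_len
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical next-token table standing in for the seq2seq decoder.
TOY_TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 0.9, "w": 0.1},
}


def toy_model(prefix):
    return TOY_TABLE.get(prefix, {"</s>": 1.0})


if __name__ == "__main__":
    print(beam_search(toy_model, beam_width=1, max_len=5)[0][0])
    print(beam_search(toy_model, beam_width=2, max_len=5)[0][0])
```

With `beam_width=1`, the search commits to "a" and returns ("a", "x", "</s>") with probability 0.30. With `beam_width=2`, it keeps "b" alive and finds ("b", "z", "</s>") with probability 0.36. A wider beam can recover sequences that a one-token-at-a-time choice would miss.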

• Nders at NTCIR-13 Short Text Conversation 2 Task
Han Ni, Liansheng Lin and Ge Xu
[Pdf] [Table of Content]
This paper describes our retrieval-based approaches for the NTCIR-13 Short Text Conversation 2 (STC-2) task (Chinese). For a new post, our system first retrieves similar posts in the repository and collects their corresponding comments, and then also finds related comments directly in the repository. Moreover, we devise two new methods: 1) an LSTMSen2Vec model to obtain sentence vectors, and 2) Pattern-IDF to rerank the candidates obtained above. Our best run achieves 0.4780 for mean nG@1, 0.5497 for mean P+, and 0.5882 for mean nERR@10, ranking 4th, 5th, and 5th, respectively, among 22 teams.
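
Of the three measures above, nERR@10 is the easiest to state in code. This sketch uses the usual ERR cascade, in which a comment with grade g satisfies the user with probability (2^g - 1) / 2^g_max. The result is normalized by the ERR of the ideal ordering of all judged comments. The grade scale (0 to 2) and the exact settings of the official STC evaluation are assumptions.

```python
def err_at_k(gains, k, max_grade):
    """Expected reciprocal rank of a ranked list of graded gains, cut off at k."""
    err, p_reach = 0.0, 1.0  # p_reach: probability the user reads this far
    for rank, grade in enumerate(gains[:k], start=1):
        stop = (2 ** grade - 1) / 2 ** max_grade  # chance the user is satisfied here
        err += p_reach * stop / rank
        p_reach *= 1 - stop
    return err


def nerr_at_k(ranked_gains, judged_gains, k=10, max_grade=2):
    """ERR@k divided by the ERR@k of the ideal ordering of all judged gains."""
    ideal = err_at_k(sorted(judged_gains, reverse=True), k, max_grade)
    return err_at_k(ranked_gains, k, max_grade) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # A highly relevant comment (grade 2) returned at rank 2 instead of rank 1.
    print(nerr_at_k([0, 2], [2, 0]))  # 0.5
```

ERR discounts a relevant comment both by its rank and by the chance that a higher-ranked comment already satisfied the user. This suits conversation, where one good reply usually suffices.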

• YJTI at the NTCIR-13 STC Japanese Subtask
Toru Shimizu
[Pdf] [Table of Content]

• SLSTC at the NTCIR-13 STC Task
Jun Guan and Tetsuya Sakai
[Pdf] [Table of Content]
SLSTC participated in the Chinese Subtask of the NTCIR-13 STC Task. We submitted one simple retrieval-based run, SLSTC-C-R1, which was generated by first retrieving a post from the Weibo repository and then selecting one comment for the retrieved post. Unfortunately, our run was not successful.

• Beihang at the NTCIR-13 STC-2 Task
Dejian Yang, Yu Wu, Zhoujun Li, Wei Wu and Can Xu
[Pdf] [Table of Content]
This paper presents the system we submitted to the NTCIR-13 Short Text Conversation Task. Previous research has focused on how to rerank response candidates but has paid little attention to how to generate high-quality candidates. In this work, we try different approaches for the retrieval part, such as neural network features and symbolic features. The evaluation results show that symbolic features can significantly boost the performance of a retrieval-based chatbot. According to the official STC evaluation, we are in the second-best group among the teams that participated with Chinese retrieval-based chatbots.

• SRCB at the NTCIR-13 STC-2 Task
Yihan Li, Shanshan Jiang, Lei Ding, Yixuan Tong and Bin Dong
[Pdf] [Table of Content]
This is SRCB's first participation in the Chinese Short Text Conversation (STC) task. We developed conversation systems for both the retrieval-based method and the generation-based method. For the retrieval-based method, we proposed two models that retrieve post results from the repository in three steps: preprocessing, candidate comment matching by indexing the posts and comments in the repository, and candidate comment ranking. For the generation-based method, we employed the state-of-the-art Seq2Seq architecture to generate comments for posts. The evaluation results for both methods show that our proposed approaches achieve competitive results.

• RUCIR at the NTCIR-13 STC-2 Task
Yutao Zhu, Zhicheng Dou, Xiangbo Wang, Xiaochen Zuo, Shuqi Lu, Zhengyi Ma and Xinyi Zhang
[Pdf] [Table of Content]
This paper describes RUCIR's systems that participated in the Short Text Conversation (STC-2) task (Chinese) at NTCIR-13. Unlike the STC task at NTCIR-12, generation-based methods are considered, and participants are encouraged to explore effective ways of combining retrieval-based and generation-based approaches to build a more intelligent chatbot. This paper introduces the two systems we proposed: (1) a retrieval-based system, and (2) a generation-based system combined with retrieval-based methods. We also analyze and discuss both systems based on the results.

• Response Generation for Grounding in Communication at NTCIR-13 STC Japanese Subtask
Hiroki Tanioka
[Pdf] [Table of Content]
The AITOK team participated in the NTCIR-13 STC Japanese Subtask. This report describes our approach to generating responses to comment texts from the Yahoo! News comments data, and discusses our formal-run results. Our approach aims to ensure grounding in communication and therefore integrates three strategies and five rules. The strategies presuppose that our auto-responder system does not have enough information about the first comment text. The auto-responder method consists of three steps: labeling, finding, and generating. Although the approach is very simple, the formal-run result was very good under Rule-1. However, the result was insufficient under Rule-2 due to a lack of information in the responses.

• KSU Team’s Dialogue System at the NTCIR-13 Short Text Conversation Task 2
Yoichi Ishibashi, Sho Sugimoto and Hisashi Miyamori
[Pdf] [Table of Content]
In this paper, the methods and results of team KSU for the STC-2 task at NTCIR-13 are described. We implemented both retrieval-based methods and a generation-based method. In the retrieval-based methods, a comment text highly similar to the given utterance text is obtained from the Yahoo! News comments data, and the reply to that comment text is returned as the response to the input. Two methods were implemented, differing in the information used for retrieval. We confirmed that the precision of response selection improved when selectively using some information from the news articles on which the dialogue was based. For the generation-based method, we propose the Associative Conversation Model, which generates visual information from textual information and uses it to generate sentences, so that a dialogue system can exploit visual information without image input. In research on Neural Machine Translation, some studies generate translated sentences using both images and sentences, and these studies show that visual information improves translation performance. However, such image-based sentence generation algorithms cannot be used in many text-based dialogue systems, which accept only text input. Our approach generates (associates) visual information from the input text and generates the response text using a context vector that fuses the associated visual information with the textual information of the sentence. As a preliminary result, visual information appeared to work effectively in several examples.

• KIT Dialogue System for NTCIR-13 STC Japanese Subtask
Hiroshi Nakatani, Shigenori Nishiumi, Takahiro Maeda and Masahiro Araki
[Pdf] [Table of Content]
We introduce three methods for solving the NTCIR-13 STC Japanese Subtask. Method_1 is a retrieval-based method of scoring reply texts using TF-IDF, with relevance filtering using word2vec. Method_2 is a generation-based method using a seq2seq model. Method_3 is a retrieval-based method based on unsupervised clustering of dialogue acts. During the evaluation, Method_1 achieved the best results.
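The TF-IDF scoring in Method_1 can be pictured with the small sketch below, which ranks candidate reply texts by TF-IDF cosine similarity to the input comment. It is an illustration under stated assumptions, not the KIT implementation: inputs are assumed to be pre-tokenized, the smoothed IDF formula is one common choice, and the word2vec relevance filtering is omitted.

```python
import math
from collections import Counter


def build_idf(docs):
    # Document frequency per term; smoothed so terms present in every
    # candidate still receive a small positive weight.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector: raw term frequency times IDF (unknown terms get weight 0).
    tf = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_replies(query_tokens, replies):
    """Return indices of candidate replies, most similar to the query first."""
    idf = build_idf(replies)
    q = tfidf_vector(query_tokens, idf)
    scores = [cosine(q, tfidf_vector(r, idf)) for r in replies]
    return sorted(range(len(replies)), key=lambda i: scores[i], reverse=True)
```

A relevance filter, such as a minimum word2vec similarity between the input and each reply, would naturally be applied to the ranked list this returns.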

• WUST System at NTCIR-13 Short Text Conversation Task
Maofu Liu, Yifan Guo, Yan Yu and Han Ren
[Pdf] [Table of Content]
Our WUST team participated in the Chinese subtask of the NTCIR-13 STC (Short Text Conversation) Task. This paper describes our approach to STC and discusses the official results of our system. Our system builds a model that searches for appropriate comments for a query derived from the given post. Our system rests on the hypothesis that relevant posts tend to share common comments. Given the query q, we first extract topic words from q and retrieve an initial set of post-comment pairs; these pairs are then matched and ranked to produce the final ranked list. The core of the system is calculating the similarity between the responses and the given query q. The experimental results obtained with the NTCIREVAL tool suggest that our system could be improved by incorporating additional knowledge and features.

• Gbot at the NTCIR-13 STC-2 Task
Hainan Zhang, Tonglei Guo, Yanyan Lan, Jiafeng Guo, Jianing Li and Xueqi Cheng
[Pdf] [Table of Content]

• Microsoft Research Asia at the NTCIR-13 STC-2 Task
Zhongxia Chen, Hongyan Huang, Dinglong Li, Ruihua Song and Xing Xie
[Pdf] [Table of Content]
This paper describes our approaches to the NTCIR-13 Short Text Conversation (STC) task (Chinese). For the retrieval-based method, we propose a response ranking model that takes into account not only the text information but also the visual features of images corresponding to the text. For the generation-based method, we propose an emotion-aware neural response generation model. Based on the attention-based sequence-to-sequence model, our model generates emotional responses by incorporating emotion information during decoding. Official results show that both emotion and image information improve the effectiveness of response retrieval and generation, and our best run achieves 0.1822 for mean nDCG@1, 0.3002 for mean P+, and 0.3241 for mean nERR@10.

• CIAL System at the NTCIR-13 STC-2 Task
Yung-Chun Chang, Yu-Lun Hsieh and Wenlian Hsu
[Pdf] [Table of Content]
Short text conversation (STC) has emerged as a prominent research topic and has gained considerable attention in recent years. While it is still an open problem whether the retrieval-based method should be replaced by, or combined with, generative models for the STC task, the NTCIR-13 STC-2 Task provides a transparent platform for comparing the two methods through comprehensive evaluations. In this task, we proposed a retrieval-based method with distributed vector representations and a generation-based method with recurrent neural networks. Overall, we submitted 4 official runs for the retrieval setting and 1 for the generation setting. We also proposed a data augmentation method that extends the labeled data to an amount more sufficient for training a generative model.

• SG01 at the NTCIR-13 STC-2 Task
Haizhou Zhao, Yi Du, Hangyu Li, Qiao Qian, Hao Zhou, Minlie Huang and Jingfang Xu
[Pdf] [Table of Content]
We describe how we built our system for the NTCIR-13 Short Text Conversation (STC) Chinese subtask. Our system uses both the retrieval-based method and the generation-based method. For the retrieval-based method, we develop several features to match the candidates and then apply a learning-to-rank algorithm to obtain properly ranked results. For the generation-based method, we first generate various high-quality comments and then rank them to select the better ones. As reported in the task overview, we achieved top performance with both methods, with three submissions for the retrieval-based method and five submissions for the generation-based method.

• CYUT-III Short Text Conversation System at NTCIR-13 STC-2 Task
Shih-Hung Wu, Wen-Feng Shih, Che-Cheng Yu, Liang-Pu Chen and Ping-Che Yang
[Pdf] [Table of Content]
In this paper, we report how we built our system for the Chinese subtask of the NTCIR-13 Short Text Conversation (STC-2) shared task. We participated in both the retrieval-based and generation-based subtasks. Our retrieval-based system is implemented on the Lucene search engine. The system is also used to enlarge the training data for our generation-based system, which is based on the sequence-to-sequence (seq2seq) model.

• Report for Japanese subtask for NTCIR-13 STC-2 from mnmlb
Sotaro Takeshita, Ryuji Tamaki, Yasuhiro Minami, Takeru Kazama and Masato Nakamura
[Pdf] [Table of Content]
This paper is our report on the Japanese subtask of NTCIR-13 STC-2. We introduced neural network models (LSTM, ESIM, and CNN) to rank the replies in the training database. To capture sequential information from the given comments, we used RNNs (LSTM and ESIM), and we compared these models with a CNN. We also applied an n-gram-based statistical filter to the ranked replies. We applied our method to the given subtask.

• UB at the NTCIR-13 STC-2 Task: Exploring Syntactic Similarities and Sentiments
Jianqiang Wang
[Pdf] [Table of Content]
The University at Buffalo (UB) team participated in the STC-2 Chinese task at NTCIR-13, working on the retrieval-based subtask. We investigated the use of manually crafted rules for improving the results returned by an Okapi BM25 IR system. Comments that are too syntactically similar to the query post are first excluded from the result set. We then raised the ranks of comments containing positive sentiment or opinion words if the test post also contained any such word. We also tried first retrieving posts from the collection for a new post and then extracting the comments that corresponded to these posts. Finally, we tested the effectiveness of combining the ranked lists from these runs. The official evaluation results show that while our baseline IR approach is effective, the usefulness of the other techniques we tried is limited. Future research directions are discussed.
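The first two steps described above, BM25 ranking followed by excluding comments that are too similar to the post, can be sketched as follows. Several pieces are assumptions rather than UB's actual rules: "too syntactically similar" is approximated by token Jaccard overlap with a hypothetical threshold of 0.8, the IDF uses the non-negative Lucene-style variant, k1 = 1.2 and b = 0.75 are common defaults, and the sentiment-based rank boost is omitted.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    # Okapi BM25 score of every tokenized document for a tokenized query.
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in set(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def jaccard(a, b):
    # Token-set overlap; a stand-in (assumption) for syntactic similarity.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def rank_comments(post, comments, max_overlap=0.8):
    """Rank comment indices by BM25, dropping near-copies of the post."""
    scores = bm25_scores(post, comments)
    keep = [i for i, c in enumerate(comments) if jaccard(post, c) < max_overlap]
    return sorted(keep, key=lambda i: scores[i], reverse=True)
```

Filtering matters here because a comment that merely echoes the post scores highest under BM25 yet makes a poor conversational response.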

• WIDM at NTCIR-13 STC-2
Yu-Han Chen, Sébastien Montella, Wei-Han Chen and Chia-Hui Chang
[Pdf] [Table of Content]
In this paper, we describe our contribution to the NTCIR-13 Short Text Conversation (STC) Chinese task. Short text conversation is an important part of social media and has recently attracted much attention. The task aims to retrieve or generate a relevant comment for a given post. We consider closed-domain and open-domain STC for the retrieval-based and generation-based tracks, respectively. More specifically, the former applies a retrieval-based approach over the given corpus, while the latter uses the Web for the generation-based track. Evaluation results show that our retrieval-based approach performs better than the generation-based one.

• splab at the NTCIR-13 STC-2 Task
Xuan Liu, Xueyang Wu, Ruinian Chen, Zijian Zhao, Hongtao Lin and Kai Yu
[Pdf] [Table of Content]
The splab team participated in the Chinese subtask of the NTCIR-13 Short Text Conversation (STC) Task. A large number of post-comment pairs are provided as a data repository or training set. Given a new post, we need to retrieve suitable responses from the existing comments (the retrieval-based method) or generate appropriate replies with a model trained on the data (the generation-based method). We adopt the generation-based method and develop several systems based on the encoder-decoder framework. In our systems, we first implement the basic encoder-decoder architecture enhanced with an attention mechanism. However, this model tends to generate short and dull responses. To alleviate this problem, we use two ways of enriching the information contained in the generated responses: a multiresolution recurrent neural network, and sequence generation based on a word-level external memory. Both methods focus on keywords in the post so as to generate targeted and informative responses. We also use a DSSM to rerank the candidates and select the most reasonable responses. The evaluation of our submitted results does not match what we expected, and we believe this reflects the weakness of human subjective evaluation.

• DeepIntell at the NTCIR-13 STC-2 Task
Chunyue Zhang, Dongdong Li and Huaxing Shi
[Pdf] [Table of Content]
This paper provides an overview of DeepIntell's system that participated in the Chinese Short Text Conversation (STC2) task at NTCIR-13. The previous STC task at NTCIR-12 is a conversation task that can be defined as an IR problem, i.e., retrieval based on a repository of post-comment pairs. STC2 at NTCIR-13 provided a transparent platform for comparing the generation-based method and the IR method through comprehensive evaluations. In this paper, we adopt the IR method and propose an SVM-based ranking method with deep matching features to score the post-comment pairs. Our framework consists of four steps: 1) preprocessing, 2) building the search index, 3) comment candidate generation, and 4) comment candidate ranking. Our best run achieves 0.5564 for mean P+, 0.4323 for mean nDCG@1, and 0.5594 for mean nERR@10. The evaluation of the submitted results empirically shows that our framework is effective on all of these measures.

• BUPTTeam at the NTCIR-13 STC-2 Task
Songbo Han, Hao Zhu and Yongmei Tan
[Pdf] [Table of Content]
This paper provides an overview of BUPTTeam’s system that participated in the Chinese Short Text Conversation (STC) task at NTCIR-13. STC is a new, challenging NTCIR task defined as an information retrieval (IR) or natural language generation problem. In this paper, we propose a novel method that generates appropriate comments in four steps: 1) preprocessing, 2) model building, 3) candidate comment generation, and 4) candidate comment ranking. The evaluation results show that our methods complete the task successfully and have a positive effect on the evaluation measures.

• CKIP at the NTCIR-13 STC-2 Task
Wei-Yun Ma, Chien-Hui Tseng and Yu-Sheng Li
[Pdf] [Table of Content]
In recent years, LSTM-based sequence-to-sequence models have been applied successfully in many fields, including short text conversation and machine translation. The inputs and outputs of these models are usually word sequences. However, in a fixed-size training corpus, a word sequence, or even part of one, is unlikely to be repeated many times, so data sparseness naturally becomes an obstacle to training sequence-to-sequence models. To address this issue, we propose in this task the idea of using an LSTM with concept sequences. That is, given an input word sequence, we first predict the concept of each word and thus form a concept sequence that serves as the input of the LSTM model. In the training phase, the output remains a word sequence. Therefore, in the testing phase, given the predicted concept sequence, the LSTM model can directly output the corresponding response as a word sequence. Although our results are not among the top systems in this task, the comparison among our submitted runs still shows the potential of this idea.

• Contextual and Feature-based Models by PolyU Team at the NTCIR-13 STC-2 Task
Yanran Li, Hui Su and Wenjie Li
[Pdf] [Table of Content]
The PolyU team participated in the Chinese Short Text Conversation (STC-2) subtask of NTCIR-13, a core task of NTCIR-13. At NTCIR-13, generation-based approaches and their evaluation were introduced into the task for the first time. This minority report describes our methods for solving the STC problem, including four typical retrieval-based approaches and two typical generation-based approaches. We compare and discuss the official results.

• iNLP at the NTCIR-13 STC-2 Task
Long Qiu, Jingang Wang, Sheng Li, Junfeng Tian and Jun Lang
[Pdf] [Table of Content]
The iNLP team participated in the Short Text Conversation (STC) task of NTCIR-13. This report describes our attempt to solve the STC problem and discusses the official results.

### [AKG]

• Overview of NTCIR-13 Actionable Knowledge Graph (AKG) Task
Roi Blanco, Hideo Joho, Adam Jatowt, Hai-Tao Yu and Shuhei Yamamoto
[Pdf] [Table of Content]
This paper gives an overview of the NTCIR-13 Actionable Knowledge Graph (AKG) task. The task focuses on finding possible actions related to input entities and the relevant properties of such actions. AKG is composed of two subtasks: Action Mining (AM) and Actionable Knowledge Graph Generation (AKGG). Both subtasks focus on the English language. Nine runs were submitted by four teams. In this paper, we describe the subtasks, datasets, evaluation methods, and the results of meta-analyses.

• TUA1 at the NTCIR-13 Actionable Knowledge Graph Task: Sampling Related Actions from Online Searching
Xin Kang, Yunong Wu and Fuji Ren
[Pdf] [Table of Content]
This paper details our participation in the Action Mining (AM) subtask of the NTCIR-13 Actionable Knowledge Graph (AKG) Task. Our work focuses on sequentially sampling the most related actions for any named entity based on online search results. We propose three criteria, i.e., significance, representativeness, and diverseness, for evaluating the relatedness of candidate actions in the search results. We analyze the quality of the actions sampled with different online search strategies. The experimental results suggest that our method is effective for generating a sequence of related actions for named entities.

• CUIS Team for NTCIR-13 AKG Task
Xinshi Lin, Wai Lam and Shubham Sharma
[Pdf] [Table of Content]
This paper describes our approach to the Actionable Knowledge Graph (AKG) task at NTCIR-13. Our ranking system scores each candidate property by combining its semantic relevance to the action with its document relevance in related entity text descriptions, estimated via a Dirichlet-smoothing-based language model. We employ a supervised learning technique to improve performance by minimizing a simple position-sensitive loss function on additional training data that we manually annotated from the dry run topics. Our best submission achieves an NDCG@10 of 0.5753 and an NDCG@20 of 0.7358 in the Actionable Knowledge Graph Generation (AKGG) subtask.
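The document-relevance component mentioned above is typically a query-likelihood score with Dirichlet smoothing, log P(Q|D) = Σ_q log((tf(q, D) + μ·P(q|C)) / (|D| + μ)). The sketch below shows only that component, under assumptions: tokenized text, μ = 2000 (a common default, not a value from the paper), and query terms unseen in the collection simply skipped. The semantic-relevance score and the learned position-sensitive loss are not reproduced.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection_counts, collection_len, mu=2000.0):
    """Query log-likelihood of a tokenized document under Dirichlet smoothing."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = collection_counts.get(t, 0) / collection_len
        if p_c == 0:
            # Term unseen anywhere in the collection: skip instead of taking log(0).
            continue
        score += math.log((tf[t] + mu * p_c) / (len(doc) + mu))
    return score
```

In an AKGG-like setting, each candidate property's entity description text would play the role of `doc`, and the action phrase would be the query.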

• TLAB at the NTCIR-13 AKG Task
Md Mostafizur Rahman and Atsuhiro Takasu
[Pdf] [Table of Content]
In recent years, popular search engines have been using the power of Knowledge Graphs (KGs) to provide specific answers to queries and questions directly. Search engine result pages (SERPs) are expected to provide facts that satisfy the semantic meaning of queries, which encourages researchers to construct more powerful Knowledge Graphs. One of the major challenges is disambiguating and recognizing, in context, the entities and actions stored in a KG. To achieve and advance the technologies related to actionable knowledge graph presentation, Action Mining (AM) is an essential step and a relatively new research direction for nurturing research on generating KGs optimized for representing entities' actions; e.g., for the entity “Donald J. Trump”, the most likely actions could be “won the US Presidential Election” or “targeting US journalists”. This paper addresses the Action Mining (AM) task organized by NTCIR-13. We employ a probabilistic model to address the AM problem.

### [ECA]

• Overview of NTCIR-13 ECA Task
Qinghong Gao, Hu Jiannan, Xu Ruifeng, Gui Lin, Yulan He, Kam-Fai Wong and Qin Lu
[Pdf] [Table of Content]

• WUST CRF-Based System at NTCIR-13 ECA Task
Maofu Liu, Linxu Xia, Zhenlian Zhang and Yang Fu
[Pdf] [Table of Content]
This paper describes our work on the Emotion Cause Analysis (ECA) task at NTCIR-13. This task aims to detect the description of the cause when an emotion occurs. In this paper, we apply a Conditional Random Field (CRF) classification model to identify the cause description using a series of features, such as POS features, basic word features, distance features, and contextual features. The system includes three parts: preprocessing, feature extraction, and the CRF classifier. Experimental results demonstrate that the CRF model is superior to other classification models for emotion cause description detection.

• Decision Tree Method for the NTCIR-13 ECA Task
Xiangju Li, Shi Feng, Daling Wang and Yifei Zhang
[Pdf] [Table of Content]
This paper details our participation in Emotion Cause Analysis (ECA), one of the NTCIR-13 tasks. This task aims to identify the reasons behind a certain emotion expressed in text, which makes it more difficult than emotion analysis. We treat the task as a slight variation of a supervised machine learning classification problem. Inspired by rule-based systems for emotion cause detection, we derive several useful attributes for training models. Furthermore, we adopt the C4.5 method, which has been widely used in data mining and machine learning for comprehensible knowledge representation. The effectiveness of our method is evaluated on the official dataset.
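C4.5 chooses the attribute to split on by gain ratio, i.e., information gain divided by split information. The sketch below computes it for one discrete attribute. The attribute values in the usage test (e.g., whether a clause is "near" or "far" from the emotion keyword) are hypothetical, chosen only to illustrate rule-inspired features; continuous attributes, pruning, and missing values, which full C4.5 handles, are omitted.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    # Shannon entropy (in bits) of a non-empty list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of a discrete attribute with respect to the labels."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    # Expected entropy after splitting on the attribute.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute that cleanly separates cause clauses from non-cause clauses scores 1.0, while one unrelated to the label scores 0.0.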

• The GDUFS System in NTCIR-13 ECA Task
Han Ren, Yafeng Ren and Jing Wan
[Pdf] [Table of Content]
Our system participates in the emotion cause detection evaluation task, which is a subtask of the emotion cause analysis evaluation at NTCIR-13. The aim of the subtask is to find the clause that contains the emotion cause, which our system treats as a sequence labeling problem. We employ a structural SVM tool and build four types of features, namely lexical information, distance, context, and linguistic rule features, to construct the sequence labeling model. Official results show that the system achieves average performance among all participating systems.

### [NAILS]

• Overview of NTCIR-13 NAILS Task
Graham Healy, Tomas Ward, Cathal Gurrin and Alan Smeaton
[Pdf] [Table of Content]
In this work, we emphasize the need for, and describe, the first-of-its-kind RSVP (Rapid Serial Visual Presentation) - EEG (Electroencephalography) dataset released as part of the NTCIR-13 NAILS (Neurally Augmented Image Labelling Strategies) task at the NTCIR-13 participation conference. The dataset was used to support a collaborative evaluation task in which participating researchers benchmarked machine-learning strategies against each other. The experimental protocol used to capture the dataset was designed to encompass a broad range of image search activities and coincident neural signals. Here, we outline the experimental protocol used to capture the dataset, discuss the motivation behind its construction, and describe the results of the NAILS task at NTCIR-13.

• Ensemble Methods for the NTCIR-13 NAILS Task
Holly Hutson, Shlomo Geva and Philipp Cimiano
[Pdf] [Table of Content]
The QUT team participated in the NTCIR-13 Neurally Augmented Image Labeling Strategies (NAILS) task. This report describes our approach to developing machine learning models for classifying EEG data from an RSVP image search task. We explore successful methodologies commonly used in the P300 Speller paradigm, in particular ensembles of support vector machines, and evaluate whether these methods still apply to the potentially more complex image search task.

• Deep Learning Approaches for P300 Classification in Image Triage: Applications to the NAILS Task
Amelia Solon, Stephen Gordon, Brent Lance and Vernon Lawhern
[Pdf] [Table of Content]
This paper describes the rationale behind, and the results of, five evaluation submissions to the NAILS (Neurally Augmented Image Labelling Strategies) challenge at the NTCIR-13 conference. Image triage is a time- and resource-intensive process for human labelers. Researchers have identified a potential P300-based BCI solution to alleviate the strain of manual labeling. The NAILS dataset was designed to capture the P300 signal over various image search activities and to act as a benchmark dataset for P300 detection methods. Here, we describe approaches that use cross-subject and within-subject training with EEGNet, our in-house Convolutional Neural Network (CNN), and another state-of-the-art event-related-potential approach that uses xDAWN spatial filtering with Information Geometry on Riemannian manifolds. We show improved performance with within-subject training, more data, and modifications to the EEGNet model, and briefly discuss the implications of using certain training data over others.

### [WWW]

• Overview of the NTCIR-13 We Want Web Task
Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong and Jingfang Xu
[Pdf] [Table of Content]
In this paper, we provide an overview of the NTCIR We Want Web (WWW) task, which comprises Chinese and English subtasks. The WWW task is a classical ad hoc textual retrieval task. This round of WWW received 19 runs from 4 teams for the Chinese subtask and 13 runs from 3 teams for the English subtask. In this overview paper, we describe the task details, data, and evaluation methods, and report the official results.

• RMIT at the NTCIR-13 We Want Web Task
Luke Gallagher, Joel Mackenzie, Rodger Benham, Ruey-Cheng Chen, Falk Scholer and J. Shane Culpepper
[Pdf] [Table of Content]
Furthering the state of the art in ad hoc web search is one of the underlying goals of the NTCIR-13 We Want Web (WWW) task. Ad hoc search can be viewed as a bridge connecting many of the specialized subfields that arise from the way people connect to and use information access systems. Since this is the first year of the track, and no training data was provided for the English subtask, we focused on classic effectiveness-improving techniques other than supervised learning, such as Markov Random Field models (MRFs), static document features, field-based weighting, and query expansion. This round, we made extensive use of the Indri search system and its flexible query language to produce effective results.
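One common way to express an MRF model in the Indri query language is the sequential dependence model (SDM), which mixes unigram, exact-bigram (`#1`), and unordered-window (`#uwN`) evidence under `#weight`. The helper below builds such a query string from pre-tokenized terms. It is a sketch only: the 0.85/0.10/0.05 weights and window size 8 are the commonly cited SDM defaults, not settings reported by RMIT, and field weighting, static features, and query expansion are left out.

```python
def sdm_query(terms, w_t=0.85, w_o=0.10, w_u=0.05, window=8):
    """Build an Indri sequential-dependence query from pre-tokenized terms."""
    unigrams = " ".join(terms)
    if len(terms) < 2:
        # No adjacent pairs, so the dependence components would be empty.
        return f"#combine({unigrams})"
    pairs = list(zip(terms, terms[1:]))
    # Adjacent term pairs as exact ordered phrases.
    ordered = " ".join(f"#1({a} {b})" for a, b in pairs)
    # The same pairs within an unordered window of `window` terms.
    unordered = " ".join(f"#uw{window}({a} {b})" for a, b in pairs)
    return (f"#weight({w_t} #combine({unigrams}) "
            f"{w_o} #combine({ordered}) "
            f"{w_u} #combine({unordered}))")
```

For example, `sdm_query(["web", "search"])` yields `#weight(0.85 #combine(web search) 0.1 #combine(#1(web search)) 0.05 #combine(#uw8(web search)))`, which can be passed to Indri as-is.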

• RUCIR at NTCIR-13 WWW Task
Ming Yue and Zhicheng Dou
[Pdf] [Table of Content]
In this paper, we present our approach to the We Want Web (WWW) task of NTCIR-13 for both English and Chinese. We implement a ranking model for the traditional re-ranking problem based on learning to rank. We first process the raw data and extract various features for each query-document pair. We then use LambdaMART to train the ranking model and obtain a ranking score for each unlabeled sample. Finally, we obtain the document ranking list.

• THUIR at NTCIR-13 WWW Task
Yukun Zheng, Cheng Luo, Weixuan Wu, Jia Chen, Yiqun Liu, Huanbo Luan, Min Zhang and Shaoping Ma
[Pdf] [Table of Content]
This paper describes our approaches and results in the NTCIR-13 WWW task. In the English subtask, we adopt several advanced deep models, such as DSSM and DRMM. In the Chinese subtask, we additionally make a few changes to the models to ensure that they work well in the Chinese context, and we train the Duet model with weakly supervised relevance labels generated by various click models. Meanwhile, we extract three types of features from the data corpus to train a learning-to-rank model.

• An Evaluation of the Kernel Based Neural Ranking Model in NTCIR-13 WWW
Zhuyun Dai, Chenyan Xiong and Jamie Callan
[Pdf] [Table of Content]
This paper describes CMUIR's participation in the NTCIR-13 WWW task. In the Chinese subtask, we experimented with a neural network approach using the kernel-based neural ranking model (K-NRM). The model learns a word embedding that encodes IR-customized soft-match patterns from a Chinese search log. The learned model is then directly applied to re-rank the result lists of the Chinese subtask's baseline run. We extend K-NRM to incorporate multiple document fields for a richer text representation. We also experimented with different re-ranking cutoffs to reduce the influence of the gap between the training and testing domains. Evaluation results confirmed the effectiveness of K-NRM.

• SLWWW at the NTCIR-13 WWW Task
Peng Xiao, Lingtao Li, Yimeng Fan and Tetsuya Sakai
[Pdf] [Table of Content]
SLWWW participated in the Chinese Subtask of the NTCIR-13 WWW Task. We applied the query expansion methods based on word embeddings proposed by Kuzi, Shtok, and Kurland. However, according to our comparison with the baseline run, our runs were not successful. As the baseline run provided by the organisers was not included in the pools for constructing relevance assessments, we discuss condensed-list versions of the official evaluation measures in addition to the regular measures.
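A condensed-list measure is computed after removing unjudged documents from a ranked list, so a run that was not pooled is not penalized simply because its documents were never assessed. The sketch below applies this to nDCG@k as one example; the document IDs and grades in the test are made up, and the log2(rank + 1) discount is an assumption that may differ from the NTCIREVAL settings used for the official results.

```python
import math


def condense(ranked_ids, qrels):
    """Drop documents with no relevance judgment; return gains of the rest, in order."""
    return [qrels[d] for d in ranked_ids if d in qrels]


def regular(ranked_ids, qrels):
    """Regular evaluation: unjudged documents are treated as non-relevant (gain 0)."""
    return [qrels.get(d, 0) for d in ranked_ids]


def ndcg_at_k(run_gains, judged_gains, k):
    # nDCG with a log2(rank + 1) discount, normalized by the ideal ordering.
    def dcg(gains):
        return sum(g / math.log2(r + 2) for r, g in enumerate(gains[:k]))

    ideal = dcg(sorted(judged_gains, reverse=True))
    return dcg(run_gains) / ideal if ideal > 0 else 0.0
```

With an unjudged document at rank 2, the regular list keeps it as a zero-gain slot, while the condensed list promotes the judged relevant document below it; the condensed score is therefore never lower for such a run.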