The 17th NTCIR Conference
Evaluation of Information Access Technologies
December 12-15, 2023
National Institute of Informatics, Tokyo, Japan

    [Preface]


  • Makoto P. Kato, Noriko Kando, Charles L. A. Clarke and Yiqun Liu
    [Pdf] [Table of Content]


    [Overview]


  • Takehiro Yamamoto and Zhicheng Dou
    [Pdf] [Table of Content]
This is an overview of NTCIR-17, the seventeenth sesquiannual research project for evaluating information access technologies. NTCIR-17 comprised various evaluation tasks related to information retrieval, natural language processing, question answering, and related areas; nine tasks were organized in total. This paper describes an outline of NTCIR-17, including its organization, schedule, scope, and task designs. In addition, we present brief statistics on the NTCIR-17 participants. Readers should refer to the individual task overview papers for detailed descriptions and findings.


    [Keynote]


  • Paul Thomas
    [Pdf] [Table of Content]
    Relevance labels – annotations that say whether a result is relevant to a given search – are key to evaluating the quality of a search engine. Standard practice to date has been to ask in-house or crowd workers to label results, but recently-developed language models are able to produce labels at greatly reduced cost. At Bing we have been using GPT-4, with human oversight, for relevance labelling at web scale. We find that models produce better labels than third-party or even in-house workers, for a fraction of the cost, and these labels let us train notably better rankers. In this talk I'll report on our experiences with GPT-4, including experiments with in-house data and with TREC-Robust. We see accuracy as good as human labellers, and similar capability to pick "interesting" cases, as well as variation due to details of prompt wording. High accuracy makes it hard to improve, and I'll also discuss our work on high-quality "gold" labels and on metrics for the labels themselves.
  • Fernando Diaz
    [Pdf] [Table of Content]
    In this perspective presentation, I will review two core methods of improving the evaluation of information retrieval systems. Innovation in depth refers to deepening our empirical and theoretical understanding of effectiveness within a class of information retrieval problems. Innovation in breadth refers to understanding effectiveness in novel or underrepresented information access patterns. I will describe how different styles of evaluation research fit into these classes, connecting recent work in fairness and preference-based evaluation to a longer tradition of information retrieval research. I will close by reflecting on the value of seeing evaluation research as brittle and precarious.


    [Tutorial]


  • Tetsuya Sakai
    [Pdf] [Table of Content]
The NTCIRPOOL toolkit creates pool files. The NTCIREVAL toolkit computes evaluation measures based on relevance, diversity, group-fairness, and so on. The Discpower toolkit computes p-values. While demonstrating how I actually use these toolkits, I will provide some small but practically important tips for IR experimenters. The recommendations are based on a few decades of my experience as an "NTCIR researcher." This tutorial complements my NTCIR-16 tutorial (see http://sakailab.com/publications/ for slides and videos).


    [Panel]


  • Tetsuya Sakai, Haruka Maeda, Paul Thomas and Mark Sanderson
    [Pdf] [Table of Content]
    In this panel, we will discuss topics related to responsible information access (IA) - especially evaluating responsible IA. First, our panelists Haruka Maeda (Kyoto University), Paul Thomas (Microsoft), and Mark Sanderson (RMIT University) will each give a brief position talk. Next, we will have the panelists and audience discuss (a) Evaluating LLM-based IA responses; and (b) Evaluating societal impacts in the LLM-based IA era. Some relevant keywords: handling hallucinations and harmful responses, fairness, inequity, worker exploitation, environmental impact, bad research practices.


    [Pre-NTCIR Seminar]


  • Mark Sanderson
    [Pdf] [Table of Content]
This year marks the 30th anniversary of web search engines and the 25th anniversary of Google being launched as a company. I will use this moment to both reflect on aspects of the information retrieval landscape and also look forward at ideas that I think our community has not examined in sufficient detail. Looking back, while it's clear that web search has precipitated a revolution in information access, it is worth asking, what did web search revolt against? I will reflect on the way information access was facilitated in the past. Next, I will examine what I think are some of the issues in information seeking going forward, with a particular focus on the gaps in understanding. Here I will talk about some of the new research that we've been conducting at RMIT University, with a particular focus on the importance of conversational search and query variability.


    [Invited Talks]


  • Ian Soboroff
    [Pdf] [Table of Content]
  • Nicola Ferro
    [Pdf] [Table of Content]
  • Gareth J. F. Jones
    [Pdf] [Table of Content]



    Core Tasks


    [FinArg-1]


  • Chung-Chi Chen, Chin-Yi Lin, Chr-Jr Chiu, Hen-Hsen Huang, Alaa Alhamzeh, Yu-Lieh Huang, Hiroya Takamura and Hsin-Hsi Chen
    [Pdf] [Table of Content]
This paper provides an overview of the FinArg-1 shared task at NTCIR-17. We propose six subtasks with three different resources, including company manager presentations, professional analyst reports, and social media posts. 19 research teams registered for FinArg-1, and 11 teams submitted their system output for official evaluation.
  • Shaopeng Tang and Lin Li
    [Pdf] [Table of Content]
Although argument mining has been discussed for several years, financial argument mining is still in its early stage. The IDEA team participated in the Argument Unit Classification (for Earnings Conference Call) and Argument Relation Classification (for Earnings Conference Call) subtasks of the NTCIR-17 FinArg-1 Task. This paper presents our work on the two subtasks. For the Argument Unit Classification subtask, we construct models based on BERT and RoBERTa to classify a given argumentative sentence. To better extract semantic features, we combine the pre-trained model with a CNN. Micro-F1 and Macro-F1 reach 76.47% and 76.46%, respectively, in the official evaluation results of the first run (i.e., IDEA-1), outperforming most approaches of other teams. For the Argument Relation Classification subtask, we classify sentence pairs based on the pre-trained model and prompt-tuning, and Micro-F1 and Macro-F1 reach 81.74% and 51.85%, respectively, in the official evaluation results of the third run (i.e., IDEA-3).
  • Mengjie Wu, Maofu Liu and Tian Zhang
    [Pdf] [Table of Content]
This article introduces how we dealt with the FinArg-1 task of NTCIR-17. In the FinArg-1 task, we completed three subtasks: argument classification, argument relation identification, and identifying relations in the social media dataset. In our experiments, we use the BERT model for all three FinArg-1 subtasks.
  • Swagata Chakraborty, Anubhav Sarkar, Dhairya Suman, Sohom Ghosh and Sudip Kumar Naskar
    [Pdf] [Table of Content]
Comprehending arguments from financial texts helps investors make data-driven decisions. The FinArg tasks of NTCIR-17 deal with mining arguments related to finance from research reports, earnings conference calls, and social media. In this paper, we describe our team's approach to solving three such problems: Argument Unit Classification, Argument Relation Detection & Classification, and Identifying Attack and Support Argumentative Relations. We obtained the best performance using pre-trained language models (such as BERT-SEC and FinBERT) and a cross-encoder architecture.
  • Heng-Yu Lin, Eugene Sy, Tzu-Cheng Peng, Shih-Hsuan Huang and Yung-Chun Chang
    [Pdf] [Table of Content]
The TMUNLP team participated in the FinArg-1 Task of NTCIR-17, focusing on Argument Unit Identification and Argument Relation Identification in the finance domain using social media and earnings call datasets. Notably, the team ranked 1st and 3rd in these subtasks, respectively. This paper presents the team's methodologies, results, and conclusions. For Earnings Conference Call (ECC) Argument Unit Identification, an ensemble strategy combining diverse pre-trained models achieved a Macro F1 score of 0.766231, with significant contributions from models like ELECTRA, RoBERTa, BERT-base-uncased, and FinBERT. In ECC Argument Relation Identification, a combination of pre-trained models and sampling strategies, along with voting mechanisms, improved natural language inference tasks. Future research opportunities include optimizing integration methods for semantic inference efficiency. Finally, in Social Media (SM) Argument Relation Identification, ChatGPT's keyword features positively impacted model performance. Challenges of translation and data imbalance were addressed through category-weighted sampling methods and soft voting, showcasing adaptable strategies. This study highlights the efficacy of ensemble strategies and diverse models in NLP tasks and emphasizes potential advancements in the field.
  • Daichi Yamane, Fei Ding and Xin Kang
    [Pdf] [Table of Content]
This paper reports on the results produced by the TUA1 team in the Earnings Conference Call (ECC), Task 1 of FinArg-1 at NTCIR-17. The ECC is divided into two subtasks: Argument Unit Identification (AUI) and Argument Relation Identification (ARI). We propose two methods. The first is to tune a pre-trained model based on the transformer architecture using prompts; this method was applied to both Argument Unit Identification and Argument Relation Identification. The second approach employs cost-sensitive learning on pre-trained models that were previously tuned; this was used exclusively for Argument Relation Identification. In the provided training and validation data for Argument Relation Identification, the correct labels were markedly unbalanced, with some specific labels being notably scarce. Cost-sensitive learning proves effective for such unbalanced datasets, often yielding higher results than pure pre-trained models alone. In our experiments involving prompt tuning, we leveraged the weighted random sampler technique to further enhance accuracy on the unbalanced data. Experiments using the aforementioned methods revealed that we achieved the best results for Argument Relation Identification and secured third place for Argument Unit Identification.
  • Chia-Tung Tsai, Wen-Hsuan Liao, Hsiao-Chuan Liu, Vidhya Nataraj, Tzu-Yu Liu, Mike Tian-Jian Jiang and Min-Yuh Day
    [Pdf] [Table of Content]
    In recent years, there has been a surge of interest in argument-based sentiment analysis and the identification of argumentative relationships in social media. These tasks encompass sentiment analysis of premises and claims, as well as the classification of argumentative relationships. Within these tasks, we have developed a fine-tuning method for transformer models. To evaluate and showcase this concept, we established a comprehensive framework to test and display the performance of BERT, RoBERTa, FinBERT, ALBERT, and GPT 3.5-turbo models on financial data and social media texts. Ultimately, the experimental results of these sub-tasks validate the effectiveness of our strategies. The primary contribution of our research is our proposal of two key elements: fine-tuning predominantly with BERT models and employing GPT for generative classification, aiming to enhance the identification of argumentative classifications. Through fine-tuning techniques, the state-of-the-art models can achieve better performance than the baseline.
  • Zih An Lin, Hsiao Min Li, Adam Lin, Yun Ching Kao, Chia Shen Hsu and Yao Chung Fan
    [Pdf] [Table of Content]
    In the exploration of the task "Identifying Attack and Support Argumentative Relations in Social Media Discussion Threads" (SMDT) [1], we aim to discern differences between proponents and adversaries in financial discussions on the internet. For classification tasks such as these, fine-tuning Transformer models like BERT [2] is an intuitive approach. In this study, we build upon this foundation by incorporating the Masked Language Model technique to enrich the model’s domain knowledge within the financial field. Furthermore, we optimize the model’s performance by adjusting the weights in the loss function. Experimental results confirm that both methods effectively enhance the model’s performance. This research introduces three simple yet effective methods to improve the Transformer model’s ability for SMDT. The code and model for this study are available at https://github.com/leonardo-lin/NTCIR.
  • Shih-Hung Wu and Tsung Hsun Tsai
    [Pdf] [Table of Content]
This paper reports our prompt-engineering approach to the FinArg-1 task. In 2023, we focused on task 2. Our system adopts the GPT-3.5 generation model to evaluate the argumentative relations in social media discussion threads. We used three different prompts to guide the GPT-3.5 model in evaluating the degree of support or attack, which we refer to as a quantitative approach. Our system then collected the scores to make the final decision. The official results show a promising direction for using quantitative prompt engineering on argumentative relation identification.
  • Ya-Mien Cheng and Jheng-Long Wu
    [Pdf] [Table of Content]
This paper provides a comprehensive overview of our participation in two shared tasks of NTCIR-17 FinArg-1: Argument-based Sentiment Analysis (Earnings Calls), including Argument Unit Classification (ECCAC) and Argument Relation Detection and Classification (ECCAR). We submitted three rounds of predictions for each task during the final evaluation. The task was to determine whether a sentence acted as a premise and to predict its sentiment as none, support, or attack. Our proposed solution involves gathering insights through conversations with large language models. In both tasks, we categorized questions into two distinct types: those directly addressing classification categories and those assessing financial market relevance. These questions were approached from three perspectives: the general public, professional financial market investors, or without specifying a particular view. These insights are then integrated into the features for model prediction. The experimentation mainly consisted of two scenarios: using only the original data, and employing both the original data and ChatGPT's answers during training. Ultimately, we discovered that incorporating ChatGPT's responses alongside the original data yielded the highest scores in both tasks, surpassing other experimental configurations that relied solely on either the original data or ChatGPT alone. In the ECCAC task, a 71.07% Macro-F1 was attained, while ECCAR yielded a score of 54.60% in Macro-F1. Although the performance in the ECCAC task did not significantly surpass other versions, it remained the most successful among the submitted versions.
  • Han-Chiang Kao, Hsin-Yun Hsu and Jheng-Long Wu
    [Pdf] [Table of Content]
While argument mining has significantly advanced across various domains, its application to financial discussions remains relatively unexplored. Our motivation for this research is rooted in the understanding that sentiment analysis alone may be inadequate when evaluating financial discussions, as the financial world is influenced by many factors intricately intertwined with the sentiments and opinions expressed by investors, analysts, and policymakers. To enhance the analysis of financial arguments, we incorporate GPT into the field of financial argument mining and design custom prompts. This unique integration allows us to generate labels and summaries for the arguments extracted from social media discussions. Our research results indicate that adding the generated labels in the regular mode achieved the highest validation set Macro-F1 score (66.39%). These findings contribute to a deeper understanding of argument mining in financial and social media discussions.
  • Supawich Jiarakul, Hiroaki Yamada and Takenobu Tokunaga
    [Pdf] [Table of Content]
This paper reports MONETECH's participation in FinArg-1's Argument Unit Identification in Earnings Conference Call subtask. Our experiments are based on the BERT and FinBERT models, with additional experimentation on large language model-based data augmentation, data filtering, and layer freezing. Our best-performing submission, which is based on data filtering and layer freezing, scores 75.54% in micro-F1 in the official evaluation. Results from additional runs also show that layer freezing and data filtering could further improve model performance beyond our best submission.


    [Lifelog-5]


  • Liting Zhou, Graham Healy, Cathal Gurrin, Ly Duyen Tran, Naushad Alam, Hideo Joho, Longyue Wang, Tianbo Ji, Chenyang Lyu and Duc-Tien Dang-Nguyen
    [Pdf] [Table of Content]
NTCIR-17 witnessed the fifth iteration of the Lifelog task, which was designed to facilitate the comparative evaluation of various approaches for automatic and interactive information retrieval from multimodal lifelog archives. Within this paper, we elucidate the utilization of the test collection, delineate the specified tasks, provide an overview of the submissions, and present the findings derived from the NTCIR-17 Lifelog-5 LSAT subtask. Our conclusion includes recommendations for potential future developments in the realm of lifelog tasks.
  • Quang-Linh Tran, Binh Nguyen, Gareth J. F. Jones and Cathal Gurrin
    [Pdf] [Table of Content]
This paper presents the MemoriEase retrieval system that participated in the NTCIR Lifelog-5 Task. We report our method for solving the lifelog retrieval problem and discuss the official results of MemoriEase at the Lifelog-5 task. The MemoriEase system was originally introduced in the Lifelog Search Challenge (LSC) as an interactive lifelog retrieval system, and it has been modified into an automatic retrieval system to address the NTCIR Lifelog-5 Task. We propose the BLIP-2 model as the core embedding model to retrieve lifelog images from textual queries. The open-source Elasticsearch search engine serves as the main engine in the MemoriEase system. Some pre-processing and post-processing techniques are applied to adapt the system to an automatic version and improve the accuracy of retrieval results. Finally, we discuss the system's results on the task, some of its limitations, and lessons learned from participating in the Lifelog-5 task for further improvements in the future.
  • Gia Huy Vuong, Van-Son Ho, Tien-Thanh Nguyen-Dang, Xuan-Dang Thai, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Tu-Khiem Le, Van-Tu Ninh and Minh-Triet Tran
    [Pdf] [Table of Content]
The rise of digital storage technology and portable sensors has led to an increase in lifelogging, where individuals digitally record their personal experiences. This has opened up new research opportunities in lifelog data retrieval. However, the real-time and automatic recording of data by sensors presents unique challenges compared to traditional search engines, particularly in data organization and search. The highly personalized nature of the dataset also necessitates the consideration of user interactions and feedback in the search engine. In this paper, we present LifeInsight, a robust lifelog retrieval system designed specifically for the NTCIR-17 Lifelog-5 Task. Originally developed for the Lifelog Search Challenge (LSC), the system has been adapted and optimized to address the unique requirements of the Lifelog Semantic Access Task (LSAT). Of the two tasks within NTCIR-17 Lifelog-5, our primary focus is on the interactive sub-task, which involves evaluating LifeInsight's performance under different user interaction approaches employed by various users. Therefore, a comprehensive user study was conducted to evaluate the LifeInsight system, encompassing both expert and novice users across various settings, including ad-hoc and known-item-search scenarios.
  • Thang-Long Nguyen-Ho, Tien-Thanh Nguyen-Dang, Gia Huy Vuong, Van-Son Ho, Xuan-Dang Thai, Minh-Khoi Pham, Tu-Khiem Le, Van-Tu Ninh and Minh-Triet Tran
    [Pdf] [Table of Content]
As the demand for personalized data retrieval systems continues to grow, recent research has emphasized the development of lifelog retrieval mechanisms. Much recent research has focused on integrating user interactions and feedback into search engines. In this paper, we introduce the automation approach of LifeInsight, a retrieval system designed explicitly for the NTCIR-17 Lifelog-5 Automatic Task, facilitating a seamless search experience and efficient data mining. Our method entails a two-fold process, where we first enrich the metadata from the raw query, followed by the composition of the retrieval method from input entities. Our proposed system not only enhances the search process but also ensures a comprehensive and detailed analysis of lifelog data for diverse applications. By focusing primarily on the automatic sub-task, we demonstrate the efficacy of our LifeInsight retrieval algorithm, showcasing competitive results that rival those of an expert user.
  • Ricardo Ribeiro, Alexandre Gago, Bernardo Kaluza, Josefa Pandeirada and António Neves
    [Pdf] [Table of Content]
    In recent years, the practice of continuously recording and collecting information about several aspects of individuals’ lives has gained increased popularity. This practice, known as lifelogging, serves multiple purposes, including personal health monitoring and enhancement as well as recording day-to-day activities in hopes of preserving some memories. An essential aspect of this practice lies in the gathering and analysis of image data, offering valuable insights into an individual’s lifestyle, dietary patterns, and physical activities. The NTCIR Lifelog Challenge presents a unique opportunity to delve into the latest advancements in lifelogging research, particularly in the field of image retrieval and analysis. Researchers are encouraged to present their methodologies and participate in lifelog retrieval challenges. Consequently, these challenges allow research teams to assess the efficiency and accuracy of their developed systems using a multimodal dataset derived from an active lifelogger’s 18 months of continuous lifelogging data. This paper presents the current version of MEMORIA, a computational tool that provides an intuitive user interface with several options that allow the user to upload images, explore the segmented events, and perform image retrieval, namely images for the NTCIR Lifelog event. This version of MEMORIA incorporates natural language search capabilities for information retrieval, offering options to filter results based on keywords and time periods. The system integrates image analysis algorithms to process visual lifelogs. These algorithms range from pre-processing algorithms to feature extraction methods, to enrich the annotation of the lifelogs. The paper also includes experimental results of the image annotation methods used in MEMORIA, as well as some examples of user interaction.
  • Naushad Alam, Yvette Graham and Cathal Gurrin
    [Pdf] [Table of Content]
In this work, we present our system DCUMemento as part of our team 'DCU' participating in the NTCIR-17 Lifelog-5 task. Our system leverages a suite of CLIP models developed by OpenAI as well as larger models from the OpenCLIP model suite, which are trained on roughly 5x more data than the OpenAI models. We also discuss the query data structure for the task as well as the models and ensemble approaches used in our system. Finally, we present the results from our submitted runs, providing a comparative analysis of the approaches, and discuss future work in this direction.


    [MedNLP-SC]


  • Shoko Wakamiya, Lis Kanashiro Pereira, Lisa Raithel, Hui-Syuan Yeh, Peitao Han, Seiji Shimizu, Tomohiro Nishiyama, Gabriel Herman Bernardim Andrade, Noriki Nishida, Hiroki Teranishi, Narumi Tokunaga, Philippe Thomas, Roland Roller, Pierre Zweigenbaum, Yuji Matsumoto, Akiko Aizawa, Sebastian Möller, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Shuntaro Yada and Eiji Aramaki
    [Pdf] [Table of Content]
This paper presents the Social Media Adverse Drug Event Detection (SM-ADE) subtask as part of the shared task Medical Natural Language Processing for Social Media and Clinical Texts (MedNLP-SC) at NTCIR-17. The SM-ADE subtask aims to identify a set of symptoms caused by a drug, referred to as adverse drug event (ADE) detection, within social media texts in multiple languages, including Japanese, English, French, and German. The competition attracted 26 teams, of which eight submitted official runs for the SM-ADE subtask. We believe this task will be essential for developing core technologies of practical medical applications in the near future.
  • Yuta Nakamura, Shouhei Hanaoka, Shuntaro Yada, Shoko Wakamiya and Eiji Aramaki
    [Pdf] [Table of Content]
This paper describes the Radiology Report TNM staging (RR-TNM) subtask as a part of the NTCIR-17 Medical Natural Language Processing for Social Media and Clinical Texts (MedNLP-SC) shared task in 2023. This subtask focused on automated lung cancer staging based on radiology reports. We created a dataset of 243 Japanese radiology reports containing no personal health information. A total of three teams with 16 members participated and submitted seven solutions. The best accuracy scores for the T, N, and M categories reached 67%, 80%, and 93%, respectively. Through the RR-TNM subtask, we have provided a valuable open Japanese clinical corpus and useful insights for applying natural language processing to secondary usage of staging information.
  • Anubhav Gupta and Frédéric Rayar
    [Pdf] [Table of Content]
    The FRAG team participated in the Social Media (SM) subtask of the NTCIR-17 MedNLP-SC Task. Our approach involved fine-tuning a multilingual transformer-based model on the train set. The team ranked 3rd for English, German and Japanese based on Exact accuracy and Binary scores.
  • Mizuho Nishio, Hidetoshi Matsuo, Takaaki Matsunaga, Koji Fujimoto, Morteza Rohanian, Farhad Nooralahzadeh, Fabio Rinaldi and Michael Krauthammer
    [Pdf] [Table of Content]
    We describe our submission to the RR-TNM subtask of the NTCIR-17 MedNLP-SC shared task. In the RR-TNM subtask, we developed our system for automatic extraction and classification of the TNM staging from Japanese radiology reports of lung cancers. In our system, zero-shot classification and prompt engineering were performed using ChatGPT and LangChain, respectively. According to the accuracies calculated by the organizers of the RR-TNM subtask, the accuracies of N and M factors in the TNM staging were higher in our submission than in the other submissions. These results indicate that our system with ChatGPT and LangChain may be promising.
  • Takuya Fukushima, Yuka Otsuki, Shuntaro Yada, Shoko Wakamiya and Eiji Aramaki
    [Pdf] [Table of Content]
This paper describes how we tackled the Medical Natural Language Processing for Radiology Report TNM staging (RR-TNM) Subtask as participants of NTCIR-17. The RR-TNM Subtask is a MedNLP-SC original task to classify radiology reports under multiple criteria. We introduced three different methods based on pre-trained language models (PLMs), including a medical-specific model. Notably, our combination approach, utilizing JMedRoBERTa (manbyo-wordpiece) for label T, Tohoku-BERT-v3 for label N, and UTH-BERT for label M, achieved an accuracy of 0.3704 on the test data. This performance was the highest among all participants, emphasizing the effectiveness of our strategy.
  • Yong-Zhen Huang, Eugene Sy, Yi-Xuan Lin, Yu-Lun Hsieh and Yung-Chun Chang
    [Pdf] [Table of Content]
    The TMUNLP team participated in the adverse drug event (ADE) detection subtask, focusing on social media texts in English for NTCIR-17's MedNLP-SC. This paper outlines our approach to addressing the challenge. Within the ADE subtask, we primarily implemented two methods to tackle the long tail distribution issue: distribution balanced loss and data augmentation. Finally, we employed ensemble learning to enhance the performance of our model.
  • Hongyu Li, Yongwei Zhang, Yuming Zhang, Shanshan Jiang and Bin Dong
    [Pdf] [Table of Content]
Our team SRCB participated in the Social Media Adverse Drug Event Detection (SM-ADE) subtask of NTCIR-17 Medical Natural Language Processing for Social Media and Clinical Texts (MedNLP-SC). The task focuses on adverse drug event (ADE) detection for social media texts in Japanese, English, German, and French, a multi-labeling problem aimed at expressing the positive or negative status as an ADE for each of 22 symptom labels. In this paper, we report our approaches, which can be categorized into three types according to the task we cast the original task to: multi-label classification, binary classification, and joint entity and relation extraction. In addition, we conduct optimizations on the approaches that rely on pre-trained transformer language models, with the support of various techniques such as continual pretraining, gradient boosting methods, and transfer learning.
  • The-Quyen Ngo, Duy-Dao Do and Phuong Le-Hong
    [Pdf] [Table of Content]
The VLP team participated in the MedNLP-SC task of NTCIR-17. This paper reports our approach to solving the problem and discusses our experimental and official results. We present approaches that combine the training datasets using different methods, either vertically or horizontally across the languages. We use different text representation methods, either a continuous embedding vector generated by a large pre-trained language model or a discrete count vector generated by a simple bag-of-words method. Our proposed approaches achieve good performance: our system is ranked among the top two or three best-performing systems for the task.
  • Smilla Fox, Martin Preiß, Florian Borchert, Aadil Rasheed and Matthieu-P. Schapranow
    [Pdf] [Table of Content]
    The Social Media Adverse Drug Event Detection (SM-ADE) track of the NTCIR-17 MedNLP-SC shared task aims to identify adverse drug events (ADE) in Japanese, English, French, and German social media texts. In this paper, we describe selected details of our contribution addressing the shared task. As a base model, we fine-tune RoBERTa models for the different language subtasks. In addition, we apply ensemble learning and data augmentation techniques. By leveraging data augmentation, we successfully elevate the resulting micro-averaged F1 scores on the German dataset by 5pp compared to the baseline. The application of ensemble learning yields a remarkable improvement of 7pp. Through combining RoBERTa with these methods, we achieve promising results in the challenge. Our best runs accomplish exact accuracy scores between 0.84 and 0.87 and per-class F1 scores between 0.77 and 0.82, consistently achieving the second-best results across all languages.
  • Hsiao-Chuan Liu, Vidhya Nataraj, Chia-Tung Tsai, Wen-Hsuan Liao, Tzu-Yu Liu, Mike Tian-Jian Jiang and Min-Yuh Day
    [Pdf] [Table of Content]
The IMNTPU team engaged in the NTCIR-17 MedNLP-SC task, specifically focusing on Subtask 1: Adverse Drug Event (ADE) detection and the challenge of identifying related radiology reports. This task is centered on harnessing methodologies that offer significant aid in real-world medical services, especially when training resources are limited. In our approach, we harnessed the power of pre-trained language models (PLMs), particularly leveraging models like the BERT transformer, to understand both sentence and document structures. Our experimentation with diverse network designs based on PLMs paved the way for an enlightening comparative analysis. Notably, BioBERT-Base emerged as a superior contender, showcasing commendable accuracy relative to its peers. Furthermore, our investigation made strides in the realm of one-shot learning for multiclass labeling, specifically with the GPT framework. The insights gathered emphasized the necessity for more specialized strategies, suggesting avenues for future research in multiclass labeling tasks.
  • Koji Fujimoto, Mizuho Nishio, Chikako Tanaka, Morteza Rohanian, Farhad Nooralahzadeh, Michael Krauthammer and Fabio Rinaldi
    [Pdf] [Table of Content]
In this manuscript, we describe our submission to the RR-TNM subtask of the NTCIR-17 MedNLP-SC shared task. Our approach was to create extensive question-and-answer (Q&A) pairs related to TNM classification as a method of domain-specific augmentation. Compared to the result without data augmentation, we observed improved accuracy, especially for the M stage.
  • Lya Hulliyyatus Suadaa, Eko Putra Wahyuddin and Farid Ridho
    [Pdf] [Table of Content]
This paper presents the system and results of the STIS team for the Social Media (English) subtasks of the NTCIR-17 MedNLP-SC Task. We proposed incorporating the sentiment of social media texts into a pre-trained Transformer model for detecting adverse drug events on social media. VADER, a lexicon- and rule-based sentiment analysis model, was used to predict the sentiment of each tweet. Based on the experimental results of the ADE vs. non-ADE binary classification task, our proposed fine-tuned model slightly outperformed the baseline. Specifically, our model achieves a better F1 score for 9 of the 22 symptoms in the symptom detection task.
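    The sentiment-feature idea described above can be sketched in miniature. The snippet below is our own illustration, not the STIS system: it scores a tweet with a tiny made-up valence lexicon and a simple negation rule (in the spirit of VADER's lexicon- and rule-based scoring) and attaches the score as an extra feature that a downstream classifier could consume.

    ```python
    # Toy sketch (not the STIS team's code): a VADER-style sentiment score
    # computed from a tiny, made-up lexicon, attached as an extra feature.
    LEXICON = {"terrible": -2.5, "awful": -2.0, "sick": -1.5, "fine": 1.0, "great": 2.0}
    NEGATIONS = {"not", "no", "never"}

    def sentiment_score(text: str) -> float:
        """Average lexicon valence over matched tokens, flipping sign after a negation."""
        tokens = text.lower().split()
        total, hits = 0.0, 0
        for i, tok in enumerate(tokens):
            if tok in LEXICON:
                valence = LEXICON[tok]
                if i > 0 and tokens[i - 1] in NEGATIONS:
                    valence = -valence
                total += valence
                hits += 1
        return total / hits if hits else 0.0

    def featurize(text: str) -> dict:
        # The sentiment score would be concatenated with transformer features
        # before classification.
        return {"text": text, "sentiment": sentiment_score(text)}
    ```

    A real system would use the actual VADER lexicon and its full heuristics; the point here is only the shape of the extra feature.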
  • Beatrice Portelli, Alessandro Tremamunno, Simone Scaboro, Emmanuele Chersoni and Giuseppe Serra
    [Pdf] [Table of Content]
    The AILAB team participated in the Social Media subtask of the NTCIR-17 MedNLP-SC Task. This paper reports our approach to solving the problem and discusses the official results. The presented model performs binary classification of the tweets and, given an UMLS term, determines whether it is present as an ADE in the tweet. Due to this design, it does not need an intermediate ADE extraction step, and it can be extended to new UMLS terms currently not present in the text. The base model used in the experiments is multilingual SapBERT, which was fine-tuned in a monolingual and multilingual setting. The best results were achieved by training the model on multilingual data.
  • Return to Top


    [QA Lab-PoliInfo-4]


  • Yasuhiro Ogawa, Yasutomo Kimura, Hideyuki Shibuki, Hokuto Ototake, Yuzu Uchida, Keiichi Takamaru, Kazuma Kadowaki, Tomoyoshi Akiba, Minoru Sasaki, Akio Kobayashi, Masaharu Yoshioka, Tatsunori Mori, Kenji Araki and Teruko Mitamura
    [Pdf] [Table of Content]
    The goal of the NTCIR-17 QA Lab-PoliInfo-4 task is to develop real-world complex question answering (QA) techniques using Japanese political information such as local assembly minutes and newsletters. QA Lab-PoliInfo-4 consists of four subtasks: Question Answering-2, Answer Verification, Stance Classification-2, and Minutes-to-Budget Linking. In this paper, we present the data used and the results of the formal run.
  • Akira Nakada and Yoshinobu Kano
    [Pdf] [Table of Content]
We participated in the Stance Classification 2 (SC2) subtask of NTCIR-17 QA Lab-PoliInfo-4 as Team KIS. In this paper, we describe our stance (agreement or disagreement) classification model for utterances of Japanese politicians with domain-adaptive training in the political domain. We additionally trained the Japanese pretrained LUKE model with a Masked Language Model (MLM) objective on the Diet minutes dataset. We also preprocessed the input using the head-tail method to truncate utterances longer than the maximum input length. We found that these methods were effective, achieving the highest accuracy of 97.41% in the formal run of the subtask.
  • Daigo Nishihara, Hokuto Ototake and Kenji Yoshimura
    [Pdf] [Table of Content]
This paper reports on the fuys team's NTCIR-17 QA Lab-PoliInfo-4 Minutes-to-Budget Linking (MBLink) results. We hypothesized that related tables could be found by focusing on the cells of the table. We trained a model by combining the text of a <p> tag with an ID and the text of a table cell: the two were encoded and combined to perform a binary classification. We considered a table relevant if there was at least one related word in the table's cells. We also tried joining the text of the table cells column by column before combining it with the text of a <p> tag with an ID. The best accuracy was obtained when the text in the table cells was joined column by column.
  • Yuuki Tachioka
    [Pdf] [Table of Content]
The ditlab team participated in the Question Answering 2 subtask of QA Lab-PoliInfo-4. First, we modified a QA Alignment system that had been developed for the PoliInfo-3 QA Alignment subtask in order to form paragraphs composed of related answer sentences. BM25 vectors were constructed for each paragraph of all answers, and the target answers were selected using the question summaries and subtopics based on cosine similarity. Second, a Text-to-Text Transfer Transformer (T5) was used to summarize the associated answer. To create fine-tuning data for T5, we used both the full dataset and a subset selected based on ROUGE scores.
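    The BM25-vector selection step described above can be sketched as follows. This is our own minimal illustration, not the ditlab system: each paragraph is mapped to a sparse vector of BM25 term weights (parameters k1 and b are assumed defaults), and the paragraph closest to the query summary under cosine similarity is selected.

    ```python
    import math
    from collections import Counter

    # Hedged sketch (not the ditlab code): BM25 term-weight vectors per
    # paragraph, then cosine similarity against the question summary.
    K1, B = 1.5, 0.75  # assumed default BM25 parameters

    def bm25_vectors(paragraphs):
        docs = [Counter(p.lower().split()) for p in paragraphs]
        n = len(docs)
        avgdl = sum(sum(d.values()) for d in docs) / n
        df = Counter(t for d in docs for t in d)  # document frequencies
        vecs = []
        for d in docs:
            dl = sum(d.values())
            vec = {}
            for t, tf in d.items():
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                vec[t] = idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * dl / avgdl))
            vecs.append(vec)
        return vecs

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def select_paragraph(query, paragraphs):
        """Return the index of the paragraph most similar to the query."""
        vecs = bm25_vectors(paragraphs)
        qvec = Counter(query.lower().split())
        scores = [cosine(qvec, v) for v in vecs]
        return max(range(len(paragraphs)), key=scores.__getitem__)
    ```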
  • Daiki Iwayama, Hideyuki Shibuki and Tatsunori Mori
    [Pdf] [Table of Content]
    In this paper, we describe our work on Answer Verification. We submitted one result to Answer Verification. The method used for the submitted data is to input the "AnswerSummary," "AnswerOriginal," and "QuestionSummary" items together and have ChatGPT classify them. As a result, an Accuracy of 0.5800 for the Answer Verification was obtained.
  • Guan-Yu Chen, Yu-Cheng Liu, Tao-Hsing Chang and Fu-Yuan Hsu
    [Pdf] [Table of Content]
This paper aims to design a model that can determine whether a politician approves or disapproves of a bill based on the politician's utterances on that bill in the parliament. This study proposed two frameworks for determining the stance in utterances. The first framework concatenates a BERT model with a Bi-LSTM model to form a comprehensive decision-making model, while the second framework concatenates a Curie model with a ChatGPT model. This paper used the dataset provided by the Stance Classification 2 task in NTCIR-17 for model training and testing, and the GPT-based model proposed in this paper achieved an accuracy of 0.932.
  • Hidenori Yamato, Takaaki Fukunaga, Makoto Okada and Naoki Mori
    [Pdf] [Table of Content]
The omuokdlb team participated in two subtasks of NTCIR-17 QA Lab-PoliInfo-4: Question Answering-2 and Answer Verification. In Question Answering-2, we used Bidirectional Encoder Representations from Transformers (BERT) to match the question summary with the answer utterances. Then, we generated a summary of the answer to the question by using a Text-to-Text Transfer Transformer (T5). In Answer Verification, we created binary classifiers using BERT to judge the answers, and we confirmed the effectiveness of the combination of the training data.
  • Tasuku Shin, Haruki Ishikawa, Yuki Gato, Eiji Kuramoto and Tomoyoshi Akiba
    [Pdf] [Table of Content]
The AKBL team participated in the Question Answering-2, Answer Verification, Stance Classification-2, and Minutes-to-Budget Linking subtasks. For the Question Answering-2 subtask, our system extracts relevant transcripts from question metadata and summarizes them using a T5 model pre-trained in Japanese. For the Answer Verification subtask, our method first generates pseudo-fake data automatically by round-trip translation, and then fine-tunes a pre-trained BERT with the training data and the pseudo-fake data. For the Stance Classification-2 subtask, our best system is a binary classifier using RoBERTa. For the Minutes-to-Budget Linking subtask, we used a ranking method based on Okapi BM25.
  • Koki Horikawa and Masaharu Yoshioka
    [Pdf] [Table of Content]
The HUKB team participated in the Question Answering-2 subtask of the NTCIR-17 QA Lab-PoliInfo-4 task. Our proposed method is divided into three steps. First, we found the sentence at the beginning of the same topic as the input question among the respondent's utterances and extracted the candidate sentences. Next, we found the sentences where the respondent seemed to answer the input question directly, using BERT. Finally, we fed the selected sentences together with the input question into the T5-based summarizer and generated the answer summary. We evaluated the whole method and each step with the dataset distributed by the task organizers.
  • Return to Top


    [SS-2]


  • Haitao Li, Jia Chen, Jiannan Wang, Weihang Su, Qingyao Ai, Xinyan Han, Beining Wang and Yiqun Liu
    [Pdf] [Table of Content]
This is an overview of the NTCIR-17 Session Search (SS-2) task. The task features the Fully Observed Session Search (FOSS) subtask, the Partially Observed Session Search (POSS) subtask, and the Session-level Search Effectiveness Estimation (SSEE) subtask. This year, we received 16 runs from 2 teams in total. This paper describes the task background, data, subtasks, evaluation measures, and evaluation results.
  • Dongshuo Liu, YiDong Liang and Zhijing Wu
    [Pdf] [Table of Content]
The BITIR team participated in the IR subtask of the NTCIR-17 Session Search (SS-2) Task. This paper reports our approach to solving the problem and discusses the official results. More specifically, for the FOSS and POSS tasks, we made two submissions, using the classical retrieval model BM25 and the graph-based context-aware document ranking model HEXA. Results show that our runs perform well on the test dataset with relevance labels, but poorly on the official test dataset provided. This may be due to noise and a small candidate set. For the SSEE task, we used two traditional metrics: sDCG and sRBP. The results indicate that sRBP has higher consistency with gold-standard user satisfaction under our settings.
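    As a reference point for the session-level metrics mentioned above, here is a minimal sketch of sDCG (our own illustration; the log bases b and bq are assumed defaults, not necessarily the BITIR settings). Each relevance gain is discounted by both its rank within the query's result list and the query's position within the session; sRBP is analogous but uses geometric persistence parameters instead of logarithmic discounts.

    ```python
    import math

    # Illustrative sketch of sDCG; parameter choices are assumptions, not
    # the BITIR team's settings.
    def sdcg(session, b=2, bq=4):
        """session: list of per-query ranked relevance-label lists.

        Gain at rank i of query j is discounted by
        (1 + log_bq j) * (1 + log_b i); log(1) = 0, so the first query and
        first rank are undiscounted.
        """
        score = 0.0
        for j, ranking in enumerate(session, start=1):
            query_discount = 1 + math.log(j, bq)
            for i, rel in enumerate(ranking, start=1):
                rank_discount = 1 + math.log(i, b)
                score += rel / (query_discount * rank_discount)
        return score
    ```

    For example, a session of two single-result queries, both relevant, scores 1 for the first query plus 1/1.5 for the second.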
  • Xinyan Han, Yiteng Tu, Haitao Li, Qingyao Ai and Yiqun Liu
    [Pdf] [Table of Content]
Session search holds significant importance in the field of information retrieval and user experience. In this paper, we detail the approach of the THUIR_SS team in the NTCIR-17 Session Search (SS-2) task. Specifically, we submitted five runs each for the FOSS and POSS tasks. We tried different approaches to feature fusion, including learning to rank and linear combination. The final report of the SS-2 Task demonstrates the effectiveness of our method, which significantly outperforms other competitors.
  • Return to Top



    Pilot Tasks


    [FairWeb-1]


  • Sijie Tao, Nuo Chen, Tetsuya Sakai, Zhumin Chu, Hiromi Arai, Ian Soboroff, Nicola Ferro and Maria Maistro
    [Pdf] [Table of Content]
This paper provides an overview of the NTCIR-17 FairWeb-1 Task. FairWeb-1 is an English web search task that goes beyond ad hoc web search: it considers not only document relevance but also group fairness. We designed three types of search topics for this task: researchers (R), movies (M), and YouTube content (Y). For each topic type, attribute sets are defined for considering group fairness. We utilise a deduplicated version of the Chuweb21 corpus as the target corpus. We received 28 runs from six teams, including six runs from the organisers' team. In this paper, we describe the task, the test collection construction, and the official evaluation results of the submitted runs.
  • Fumian Chen and Hui Fang
    [Pdf] [Table of Content]
Providing relevant, diverse, and fair results is crucial for information retrieval systems, and has attracted increasing attention because of issues caused by traditional relevance-centric retrieval systems, such as echo chambers and increasingly polarized online communities. Therefore, we participated in the NTCIR-17 FairWeb-1 Task to provide group fairness for researchers, movies, and YouTube content, and submitted five runs. The runs are based on a recently proposed fair ranking framework, DLF. The experimental results demonstrate that, in many cases, DLF can improve fairness while maintaining relevance, but it still needs more exploration for ordinal fairness groups and documents with longer text. This paper reports how the runs were constructed and discusses their performance and future work.
  • Fan Li, Kaize Shi, Kenta Inaba, Sijie Tao, Nuo Chen and Tetsuya Sakai
    [Pdf] [Table of Content]
The RSLFW team participated in the NTCIR-17 FairWeb-1 Task. This paper reports our approach to solving the problem and discusses the official results. We applied several different methods to generate five runs, including the PM-1, PM-2, and DetGreedy algorithms, all of which are post-processing approaches. We also utilized COIL (COntextualized Inverted List) as the RSLFW baseline. By combining the official baseline and the COIL baseline with different fairness-related algorithms, we analyzed the results of those methods. Our reranked run outperforms the baseline, resulting in an improved GFR score.
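    As background for the post-processing approaches named above, the following is a simplified sketch of PM-2-style re-ranking (our own illustration, not the RSLFW runs; the relevance term of the full algorithm is omitted). Each group holds a fractional seat count; at every position the group with the largest quotient claims the slot, and the document that best serves that group, with a small contribution from the other groups, is selected.

    ```python
    # Simplified PM-2 re-ranking sketch (illustration only, not the RSLFW
    # code). docs: list of dicts mapping group name -> membership score.
    def pm2(docs, group_weights, k, lam=0.5):
        seats = {g: 0.0 for g in group_weights}
        remaining = list(range(len(docs)))
        ranking = []
        for _ in range(min(k, len(docs))):
            # Sainte-Lague-style quotient: desired share over seats held.
            quotient = {g: w / (2 * seats[g] + 1) for g, w in group_weights.items()}
            g_star = max(quotient, key=quotient.get)

            def score(i):
                d = docs[i]
                return (lam * quotient[g_star] * d.get(g_star, 0.0)
                        + (1 - lam) * sum(quotient[g] * d.get(g, 0.0)
                                          for g in group_weights if g != g_star))

            best = max(remaining, key=score)
            remaining.remove(best)
            ranking.append(best)
            # Charge the selected document's (normalized) memberships as seats.
            total = sum(docs[best].get(g, 0.0) for g in group_weights) or 1.0
            for g in group_weights:
                seats[g] += docs[best].get(g, 0.0) / total
        return ranking
    ```

    With two "a"-group documents and one "b"-group document under equal target shares, the sketch alternates groups rather than exhausting "a" first.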
  • Sachin Pathiyan Cherumanal, Kaixin Ji, Danula Hettiachchi, Johanne R. Trippas, Falk Scholer and Damiano Spina
    [Pdf] [Table of Content]
    This report describes the participation of the RMIT IR group at the NTCIR-17 FairWeb-1 task. We submitted five runs with the aim of exploring the role of explicit search result diversification (SRD) and ranking fusion to generate fair rankings considering multiple fairness attributes. We also explored the use of a linear combination-based technique (LC) to take into consideration the relevance while re-ranking. In this report, we compared results from all our submitted runs against each other and the retrieval baselines along each topic type separately (i.e., Researcher, Movie, YouTube). Overall, our results show that neither the SRD-based runs nor the linear combination-based runs show any statistically significant improvement over the retrieval baselines. The source code of the framework for generating group memberships is made available at https://github.com/rmit-ir/fairweb-1.
  • Yiteng Tu, Haitao Li, Zhumin Chu, Qingyao Ai and Yiqun Liu
    [Pdf] [Table of Content]
The fairness of search systems has become an important research topic for the IR community. This paper presents and discusses the efforts of the THUIR team in developing effective and fair retrieval models and ranking algorithms for the NTCIR-17 FairWeb-1 Task. Specifically, we utilized several different methods across all five submitted runs, including reranking, learning-to-rank, and search result diversification algorithms, to deal with the group fairness problem in web search. The final report of the FairWeb-1 Task indicates that our methods outperformed other competitors on both result relevance and fairness. In terms of the GFR (Group Fairness Relevance) metric, our methods outperform the second-ranked team by 9.74%, 17.8%, and 19.8% on the three topic types, respectively.
  • Return to Top


    [Transfer]


  • Hideo Joho, Atsushi Keyaki and Yuki Oba
    [Pdf] [Table of Content]
    This paper provides an overview of the NTCIR-17 Transfer task, a pilot task that aims to bring together researchers from Information Retrieval, Machine Learning, and Natural Language Processing to develop a suite of technology for transferring resources generated for one purpose to another in the context of dense retrieval on Japanese texts. Two subtasks were proposed for this round: the Dense First Stage Retrieval subtask and the Dense Reranking subtask. We received 29 runs for the First Stage Retrieval and 25 runs for the Reranking subtask from three research groups. The evaluation results of these runs are presented and discussed in this paper.
  • Yuuki Tachioka
    [Pdf] [Table of Content]
The ditlab team participated in the Transfer task, which is composed of the dense retrieval and dense reranking subtasks. We trained Sentence-BERT using a Japanese version of the mMARCO dataset and used the model for both subtasks. We compared three types of models trained with three different losses: softmax, triplet, and multiple negatives ranking losses. The results show that the multiple negatives ranking loss was the best for both subtasks. In addition, system fusion significantly improved the performance, especially for the retrieval subtask.
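    The multiple negatives ranking loss mentioned above can be sketched as follows (our own plain-Python illustration, not the ditlab training code; the scale factor is an assumed hyperparameter). For a batch of (query, positive passage) embedding pairs, every other passage in the batch serves as an in-batch negative, and the loss is the cross-entropy of the scaled similarity scores against the matching index.

    ```python
    import math

    # Minimal sketch of the multiple negatives ranking loss (illustration
    # only): passage j is an in-batch negative for query i whenever i != j.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def mnr_loss(query_embs, passage_embs, scale=20.0):
        """Mean cross-entropy of scaled similarities, target = matching index."""
        n = len(query_embs)
        loss = 0.0
        for i in range(n):
            logits = [scale * dot(query_embs[i], p) for p in passage_embs]
            log_z = math.log(sum(math.exp(l) for l in logits))
            loss += -(logits[i] - log_z)  # -log softmax at the true pair
        return loss / n
    ```

    In practice this is computed over embedding matrices on GPU (e.g. the equivalent loss in the sentence-transformers library), but the objective is the same.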
  • Tomoya Hashiguchi, Ryota Mibayashi, Huu-Long Pham, Wakana Kuwata, Yuka Kawada, Yuya Tsuda, Takehiro Yamamoto and Hiroaki Ohshima
    [Pdf] [Table of Content]
The KANDUH team participated in Transfer subtasks 1 and 2 of NTCIR-17. In this paper, we report our approach to solving the problem and the results. Both subtasks address dense vector retrieval. In both subtasks, we used BM25 to filter documents, followed by dense vector retrieval. The method with the highest nDCG@20 (0.4339) was the one that first fine-tuned DeBERTa-v2 on MS MARCO and then additionally fine-tuned on NTCIR-1 data. On the other hand, the method with the lowest nDCG@20 (0.0751) was the one fine-tuned only on MS MARCO data.
  • Kenya Abe, Kota Usuha and Makoto P. Kato
    [Pdf] [Table of Content]
    This paper describes the KASYS team's participation in the NTCIR-17 Transfer Task. To generate our runs, we used neural IR models such as Contriever, ColBERT, and SPLADE with different fine-tuning strategies.
  • Return to Top


    [UFO]


  • Yasutomo Kimura, Hokuto Ototake, Kazuma Kadowaki, Takahito Kondo and Makoto P. Kato
    [Pdf] [Table of Content]
    The goal of the NTCIR-17 UFO task is to develop techniques for extracting structured information from tabular data and documents, focusing on annual securities reports. The Non-Financial Objects in Financial Reports (UFO) task consists of two subtasks: table data extraction (TDE) and text-to-table relationship extraction (TTRE). The TDE subtask, for understanding the structure of tables in annual securities reports, classifies each cell into one of four classes. The TTRE subtask is for linking the values of the tables with a relevant sentence in the text. We present the data used for and the results of the formal run for these subtasks.
  • Yuki Okumura and Masato Fujitake
    [Pdf] [Table of Content]
The FA team participated in the Table Data Extraction (TDE) and Text-to-Table Relationship Extraction (TTRE) tasks of NTCIR-17 Understanding of Non-Financial Objects in Financial Reports (UFO). This paper reports our approach to solving the problem and discusses the official results. We successfully utilized various enhancement techniques based on the ELECTRA language model to extract valuable data from tables. Our efforts resulted in a TDE accuracy of 93.43%, positioning us in second place in the Leaderboard rankings, a testament to the effectiveness of our proposed approach. In the TTRE task, we proposed a rule-based method to extract meaningful relationships between the text and tables and confirmed its performance.
  • Eisaku Sato, Keiyu Nagafuchi, Yuma Kasahara, Kazuma Kadowaki and Yasutomo Kimura
    [Pdf] [Table of Content]
    The OUC team participated in the Table Data Extraction (TDE) subtask and the Text-to-Table Relationship Extraction (TTRE) of NTCIR-17 Understanding of Non-Financial Objects in Financial Reports (UFO). In this paper, we report our methodology in this task and discuss the official results.
  • Daigo Nishihara, Hokuto Ototake and Kenji Yoshimura
    [Pdf] [Table of Content]
This paper reports the results of the fuys team's participation in NTCIR-17 UFO Text-to-Table Relationship Extraction (TTRE). Since we thought that Value cells depend on Name cells, we devised a method that uses the result of extracting Name cells to connect them together. The text of an HTML <mark> tag and the texts of the cells were used to find Name cells: the two were encoded and combined to perform a binary classification. We tried several combinations of mark-tag text and cell text. The best results were obtained using mark tags and tables in the same section of the same company. We tried two different rules for binding Value cells. The rule of finding a cell by the row and column combination of the cell that became the Name yielded good results.
  • Tomokazu Hayashi and Hisashi Miyamori
    [Pdf] [Table of Content]
    This paper describes the methods and results of Team KSU for the UFO task at NTCIR-17. In the TDE subtask, we designed methods for cell type classification using exhaustive tree structures based on the spanning sizes of the merged cells in the table. In the TTRE subtask, we designed methods for cell retrieval based on the cell class. Scores on the F-measure in the formal run were 95.37% for ID81 in TDE and 9.18%, 4.08%, and 6.63% for ID99 on the Name, Value, and Total, respectively, in TTRE. Scores on the F-measure including the formal run and late submission run were 95.37% for ID81 in TDE and 32.21%, 27.19%, and 29.70% for ID127 on the Name, Value, and Total, respectively, in TTRE.
  • Nobushige Doi and Mayuri Tanaka
    [Pdf] [Table of Content]
The JPXI team participated in the table data extraction subtask of the NTCIR-17 UFO Task. This study outlines our methodology to address this challenge and analyzes the official results. Our approach to solving this subtask involved few-shot text classification using ChatGPT. This paper discusses the implications of these results, highlighting the contributions of this study in advancing table structure recognition.
  • Hiroyuki Higa, Yuuki Maeyama, Keisuke Nanjo and Kazuhiro Takeuchi
    [Pdf] [Table of Content]
Text-to-Table Relationship Extraction (TTRE) [3] has emerged as a significant research topic. Although tables enable humans to comprehend complex data structures quickly, machines often struggle with such interpretations. The primary challenge of this paper lies in understanding the myriad intentions behind a table's creation and the possible ambiguity when it is viewed without context. We propose an approach that addresses these issues by embedding a table in a textual context. Specifically, we convert tables contained in HTML-formatted documents to the Markdown format and create training data that combine the tables with information about the associated question text and elements. Then, we use the training data to train a QLoRA model based on llama2-13b-chat-hf. This approach promotes a holistic interpretation of tables and their associated texts within a single vector space.
  • Return to Top


    [ULTRE-2]


  • Zechun Niu, Jiaxin Mao, Qingyao Ai, Lixin Zou, Shuaiqiang Wang and Dawei Yin
    [Pdf] [Table of Content]
    In this paper, we present an overview of the Unbiased Learning to Rank Evaluation 2 (ULTRE-2) task, a pilot task at the NTCIR-17. The ULTRE-2 task aims to evaluate the effectiveness of unbiased learning to rank (ULTR) models with a large-scale user behavior log collected from Baidu.com, a commercial Web search engine. In this paper, we describe the task specification, dataset construction, implemented baselines, and official evaluation results of the submitted runs.
  • Lulu Yu, Keping Bi, Jiafeng Guo and Xueqi Cheng
    [Pdf] [Table of Content]
The Chinese Academy of Sciences Information Retrieval team (CIR) participated in the NTCIR-17 ULTRE-2 task. This paper describes our approaches and reports our results on the ULTRE-2 task. We recognize that the issue of false negatives in the Baidu search data in this competition is very severe, much more severe than position bias. Hence, we adopt the Dual Learning Algorithm (DLA) to address the position bias and use it as an auxiliary model to study how to alleviate the false-negative issue. We approach the problem from two perspectives: 1) correcting the labels of non-clicked items with a relevance judgment model trained from DLA, and learning a new ranker initialized from DLA; 2) including random documents as true negatives and documents with partial matches as hard negatives. Both methods enhance model performance, and our best method achieved an nDCG@10 of 0.5355, which is 2.66% better than the best score from the organizers.
  • Return to Top