Registration Deadline: until a week before each Formal Run [detailed schedule]
What's New
2015-08-25 The following subsections have been updated:
4.1 Phase-1,
5.1.3 Evaluation Result Format for National Center Test,
5.1.4 Question XML Format for Second-stage Examination,
5.1.5 Gold Standard XML Format for Second-stage Examination,
5.2.3 Answer Sheet XML Format for Second-stage Examination,
5.2.4 Question Analysis Format,
5.2.5 Query Format,
5.2.6 IR Format,
5.2.7 System Description,
5.3.2 Format Checker for Second-stage Examination
2015-08-05 Registration deadline was extended.
2015-08-04 The number of questions in Table 2 was updated.
2015-07-21 Phase-2 schedule was updated.
2015-06-30 Registration deadline was extended, and the Phase-1 schedule was revised.
2015-05-27 Phase-2's Evaluation Results Publication date (Nov. 14, 2015) was updated.
2015-05-12 We've set up two official Twitter accounts: @NTCIRQALab (English) and @NTCIRQALabJA (Japanese).
2015-05-11 New Website Release.
Update History
2015-08-25 The following subsections have been updated:
4.1 Phase-1,
5.1.3 Evaluation Result Format for National Center Test,
5.1.4 Question XML Format for Second-stage Examination,
5.1.5 Gold Standard XML Format for Second-stage Examination,
5.2.3 Answer Sheet XML Format for Second-stage Examination,
5.2.4 Question Analysis Format,
5.2.5 Query Format,
5.2.6 IR Format,
5.2.7 System Description,
5.3.2 Format Checker for Second-stage Examination
2015-08-05 Registration deadline was extended.
2015-08-04 The number of questions in Table 2 was updated.
2015-07-21 Phase-2 schedule was updated.
2015-06-30 Registration deadline was extended, and the Phase-1 schedule was revised.
2015-06-16 Japanese translations for 4. Submission and 5. Format, were added.
2015-06-01 Japanese translations for 3. Collection and Tools, were added.
2015-05-27 Phase-2's Evaluation Results Publication date (Nov. 14, 2015) was updated.
2015-05-26 2.1.3 Phase-2 was added.
2015-05-26 The number of High School Textbooks, Named Entity annotated was revised.
2015-05-13 2.2.1 Training Set was revised. A description about Yoyogi seminar was removed.
2015-05-13 Table 2: Test Collection was revised. Yosemi was moved from Training to Test (Phase-1 and -3).
2015-05-12 A "Twitter Hashtag Button" and 2 "Twitter Follow Button"s were added.
2015-05-12 A "What's New" and an "Update History" were added.
2015-05-11 New Website Release.
1. Overview
The goal is to investigate real-world, complex Question Answering (QA) technologies using Japanese university entrance examinations, and their English translations, on the subject of "World History". The questions are drawn from two different stages: the National Center Test for University Admissions (multiple-choice questions) and the second-stage examinations of multiple universities (complex questions, including essays). All questions are provided in XML format.
Some of the highlights are:
- Solving real-world problems.
- Many questions require an understanding of the surrounding context.
- Some questions require inference.
- Encourages investigation of each question type, including complex essay, simple essay, factoid, slot-filling, true-or-false, etc.
- A good venue to investigate specific answer types (e.g., person-politician, person-religious), advanced entity-focused passage retrieval, enhanced knowledge resources, semantic representation, and sophisticated learning.
As knowledge resources, four sets of high school textbooks and Wikipedia will be provided. Participants can use any other resources (these must be reported). Two open-source baseline QA systems and one passage retrieval system are also provided. The English Subtask is run in two phases (Phase-1 and -3); the Japanese Subtask is run in three phases (Phase-1, -2 and -3). In the first phase, the question types are explicitly provided, and participants are allowed to work on specific question type(s) only. The evaluation results are analyzed according to the types.
- Open Advancement: We encourage each participant to work toward their own goal(s): on an end-to-end system, on particular question types, and/or on component(s) of either the provided QA platform or their own system, or to build any resources or tools that help improve QA systems for entrance exams.
- Evaluating continuous progress and enhancing the knowledge resources: The organizers periodically run all the components contributed by participants to track progress.
- Forum: We place emphasis on building a community by bridging different communities.
In the NTCIR-12 QA Lab task, the goal is to answer university entrance examination questions as a first step toward real-world question answering. The subject is limited to World History, and the targets are the National Center Test and the second-stage examinations of several universities. The original questions are in Japanese, but all of them have been translated into English, so participants can take part in either Japanese or English. The questions are provided in XML format.
The examination questions have the following characteristics:
- They are real-world problems.
- Many questions require an understanding of the surrounding context.
- Some questions require inference.
- There are many question formats (factoid, slot-filling, true-or-false, long essays covering multiple points, short essays, etc.).
As knowledge sources, World History textbooks from two publishers (Tokyo Shoseki and Yamakawa Shuppansha) and Wikipedia data are provided. Participants may freely use any other resources (reporting is required). In addition, an event ontology that supports truth judgments on natural-language sentences, open-source baseline QA systems (Japanese and English), and a passage retrieval system are provided.
Three runs (Phase-1 through Phase-3) are planned for the Japanese Subtask, and two runs (Phase-1 and Phase-3) for the English Subtask. In the first run, the question types are provided explicitly; participants may submit answers for specific question types only, and evaluation is carried out per question type. In the second run, we plan to take part in a commercial university entrance mock examination jointly with the affiliated Todai Robot Project (http://21robot.org/). Participants are free to choose which runs to take part in.
2. Task Description
2.1 English Subtask
The questions are:
- 1) National Center Tests: English translations of Japan's National Center Test for University Admissions (multiple-choice questions), and
- 2) Second-stage Examinations: English translations of the second-stage examinations of some Japanese universities (various question types, including factoid, slot-filling, true-or-false, essays, and others).
The subject is “World History”.
All the questions are provided in XML format.
The English Subtask has two formal runs: Phase-1 and -3. Phase-2 is held in the Japanese Subtask only and takes part in the mock examinations. Participants can choose which phase(s) to participate in: Phase-1 only, Phase-3 only, or both Phase-1 and -3.
2.1.1 Training Set
The training set will be delivered on July 1st to the participants who have submitted the signed user agreement forms. It consists of the training and test data sets used in the NTCIR-11 QA Lab task and contains: i) three sets of National Center Tests, ii) two sets of Second-stage Examinations, iii) knowledge sources (a snapshot of a Wikipedia subset related to world history), and iv) right answers. Please note that the right answers and nuggets for essays are provided in Japanese only.
2.1.2 Phase-1
Targets are both National Center Tests and Second-stage Examinations. To support deeper analysis and investigation of each question type, the task organizers have defined the following set of question types and provide a Question Type Table describing the question type of each question in the test set. Each participant can choose whether or not to use this table, and can decide to run on every question type or on particular question type(s) only. The evaluation results will be reported by question type.
The task organizers defined the following six question types. Please consult the example of each question type here.
- A1: Complex Essay (CE)
- A2: Simple Essay (SE)
- B1: Factoid (F)
- B2: Slot-Filling (SF)
- C: True-or-False (TF)
- D: Unique (U)
2.1.3 Phase-2
Phase-2 will be done in Japanese Subtask only.
2.1.4 Phase-3
Targets are both National Center Tests and Second-stage Examinations. The Question Type Table is not provided in this phase. Each participant can decide to run on every question type or on particular question type(s) only. An overall evaluation will be provided.
2.1.5 Evaluation
For “Factoid”, “Slot-Filling”, “True-or-False”, and “Unique” questions, the evaluation will be based on the scores defined by the National Center for University Admissions and by each university, as well as on accuracy.
For “Complex Essay” and “Simple Essay” questions, the evaluation will use various versions of ROUGE and, in the Japanese Subtask, the pyramid method with nuggets. In the Japanese Subtask, three reference essays for each Complex Essay question and one reference essay for each Simple Essay question are provided, together with nuggets constructed by the reference essay writers and weighted (0-3) by the votes of three assessors; these are used for the pyramid method. We are sorry that these reference essays and nuggets are available in Japanese only, due to resource limitations. Any suggestions and proposals for the evaluation of English essays are welcome!
The evaluation methodologies for essays are still under discussion and consideration. Please join the discussion about the evaluation methodologies.
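To make the pyramid-style scoring concrete, the sketch below shows one simplified way a weighted-nugget score could be computed for an essay answer. It is only an illustration under our own assumptions (naive substring matching, weights summed and normalized); it is not the official QA Lab scorer, and the actual assessment procedure may differ.

```python
# Illustrative sketch only: a simplified pyramid-style nugget score for an essay
# answer, assuming each nugget carries an assessor-voted weight in [0, 3].
# This is NOT the official QA Lab scorer; matching here is naive substring
# matching, whereas the actual assessment may be done manually.

def pyramid_score(answer_text, weighted_nuggets):
    """weighted_nuggets: list of (nugget_string, weight) pairs."""
    matched = sum(w for nugget, w in weighted_nuggets if nugget in answer_text)
    total = sum(w for _, w in weighted_nuggets)
    return matched / total if total > 0 else 0.0

# Hypothetical example (nuggets and weights are made up for illustration)
nuggets = [("Peace of Westphalia", 3), ("Thirty Years' War", 2), ("1648", 1)]
print(pyramid_score("The Thirty Years' War ended in 1648 with ...", nuggets))  # 0.5
```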
2.2 Japanese Subtask
The questions are:
- 1) National Center Tests: Japan's National Center Test for University Admissions (multiple-choice questions), and
- 2) Second-stage Examinations: the second-stage examinations of some Japanese universities (various question types, including factoid, slot-filling, true-or-false, essays, and others).
The subject is “World History”.
All the questions are provided in XML format.
The Japanese Subtask has three formal runs: Phase-1, -2 and -3. Phase-2 is held in the Japanese Subtask only and takes part in the mock examination organized by the NII Todai Robot Project. Participants can choose which phase(s) to participate in.
2.2.1 Training Set
- a) The training and test data sets used in the NTCIR-11 QA Lab task, containing: i) three sets of National Center Test questions, ii) two sets of Second-stage Examination questions, iii) knowledge sources (a snapshot of Wikipedia and 4 sets of high school textbooks on world history), and iv) right answers. The right answers for the essay questions are the reference essays and weighted nuggets, voted on by three assessors with scores of 0-3.
- b) A new knowledge source: 3 sets of high school textbooks annotated with named entities.
2.2.2 Phase-1
Targets are both National Center Tests and Second-stage Examinations. To support deeper analysis and investigation of each question type, the task organizers have defined the following set of question types and provide a Question Type Table describing the question type of each question in the test set. Each participant can choose whether or not to use this table, and can decide to run on every question type or on particular question type(s) only. In Phase-1, the evaluation results will be reported by question type.
The task organizers defined the following six question types. Please consult the example of each question type here.
- A1: Complex Essay (CE)
- A2: Simple Essay (SE)
- B1: Factoid (F)
- B2: Slot-Filling (SF)
- C: True-or-False (TF)
- D: Unique (U)
2.2.3 Phase-2
Participants take part in the mock exams organized by the NII Todai Robot Project, which are designed as preliminary trials for the National Center Test and the Second-stage Examination of the University of Tokyo. For details about the Todai Robot Project, please see here.
Note that, for Phase-2 only, the evaluation results will be released to the public earlier than in the other phases. Please keep this in mind when participating.
2.2.4 Phase-3
Targets are both National Center Tests and Second-stage Examinations. The Question Type Table is not provided in this phase. Each participant can decide to run on every question type or on particular question type(s) only; note, however, that Phase-3 focuses on the overall evaluation.
2.2.5 Evaluation
For “Factoid”, “Slot-Filling”, “True-or-False”, and “Unique” questions, the evaluation will be based on the scores defined by the National Center for University Admissions and by each university, as well as on accuracy.
For “Complex Essay” and “Simple Essay” questions, the evaluation will use various versions of ROUGE and the pyramid method with nuggets. In the Japanese Subtask, three reference essays for each Complex Essay question and one reference essay for each Simple Essay question are provided, together with nuggets constructed by the reference essay writers; each nugget is weighted (0-3) by the votes of three assessors and used in the pyramid method.
The evaluation methodology for essays is still under discussion, and more suitable methods are being considered. Proposals for evaluation methodologies from participants are also welcome.
2.3 Sample Questions
3. Collection and Tools
3.1 Collection
Participants are free to use any available resources, with the exception of the answer sets (readily available online in Japanese). In addition, the following resources are provided, but their use is not required: i) three sets of National Center Tests, ii) two sets of Second-stage Examinations, iii) knowledge sources (a snapshot of a Wikipedia subset related to world history, 4 sets of high school textbooks on world history, and 3 sets of high school textbooks annotated with named entities), and iv) right answers. The right answers for essay questions consist of reference essays and weighted nuggets (voted on by three assessors with scores of 0-3).
Table 2: Test Collection

 | Type (Language) | Training | Test: Phase-1 | Test: Phase-2 | Test: Phase-3 |
---|---|---|---|---|---|
Questions | National Center Test (E, J) | NTCIR-11 Set * [225 Qs], Q-Types known | NTCIR-12 (1) * [41 Qs], Q-Types known | | NTCIR-12 (2) * [36 Qs], Q-Types unknown |
 | Second-stage Examinations (J) | NTCIR-11 Set *, Essay [74 Qs], Non-Essay [608 Qs], Q-Types known | NTCIR-12 (1) *, Essay [25 Qs], Non-Essay [218 Qs], Q-Types known | | NTCIR-12 (2) *, Essay [31 Qs], Non-Essay [195 Qs], Q-Types unknown |
 | Second-stage Examinations (E) | NTCIR-11 Set *, Essay [73 Qs], Non-Essay [608 Qs], Q-Types known | NTCIR-12 (1) *, Essay [18 Qs], Non-Essay [218 Qs], Q-Types known | | NTCIR-12 (2) *, Essay [24 Qs], Non-Essay [195 Qs], Q-Types unknown |
 | Yoyogi Seminar Trial Exams (J), Multiple Choice | | Yosemi * [72 Qs] | Yosemi * [72 Qs] | Yosemi * [72 Qs] |
 | Mock Exam by Todai Robot Project (J) | | | Torobo Trial *, TBA | |
Right Answers | National Center Test (E, J) | NTCIR-11 Set * | NTCIR-12 (1) * | | NTCIR-12 (2) * |
 | Second-stage Examinations: Reference Essays and Nuggets (J) | NTCIR-11 Set * | NTCIR-12 (1) * | | NTCIR-12 (2) * |
 | Yoyogi Seminar Trial Exams (J), Multiple Choice | | Yosemi * | | Yosemi * |
 | Mock Exam by Todai Robot Project (J) | | | Torobo Trial * | |
Knowledge Source | Textbooks (J) | 4 High School Textbooks, xml tagged *; 3 High School Textbooks, Named Entity annotated * | | | |
 | Wikipedia (J, E) | Wikipedia snapshot (J): http://warehouse.ntcir.nii.ac.jp/openaccess/qalab/11QALab-ja-wikipediadata.html; Wikipedia snapshot (E): https://github.com/oaqa/ntcir-qalab-cmu-baseline/wiki/Solr-Instance-with-Indexed-Wikipedia-Subset | | | |
 | Event Ontology (J) | World History Ontology (J): http://researchmap.jp/zoeai/event-ontology-EVT/ | | | |
Data Release Date | Questions | Jul. 1, 2015 at 11:59 p.m. SST | Aug. 25, 2015 at 11:59 p.m. SST | Oct. 2015 | Dec. 1, 2015 at 11:59 p.m. SST |
 | Right answers | Jul. 1, 2015 at 11:59 p.m. SST | Sep. 21, 2015 at 11:59 p.m. SST | Nov. 14, 2015 | Dec. 28, 2015 at 11:59 p.m. SST |
 | Knowledge sources | Jul. 1, 2015 at 11:59 p.m. SST | | | |
Evaluation Results Publication | In QA Lab task | | Sep. 21, 2015 at 11:59 p.m. SST | Nov. 14, 2015 | Dec. 28, 2015 at 11:59 p.m. SST |
 | For public | | Jun. 7, 2016 | Nov. 14, 2015 | Jun. 7, 2016 |

Note: "*" indicates that submission of a pair of signed user agreement forms is required before the data release.
3.1.1 Sets of National Center Tests
- Sets of National Center Tests, available in Japanese and English.
3.1.2 Sets of Second-stage Examinations
- Sets of Second-stage Examinations, available in Japanese and English.
3.1.3 Knowledge Sources
- Japanese high school textbooks on world history, available in Japanese.
- A snapshot of Wikipedia, available in Japanese and in English. (Participants may also use the current up-to-date version.)
- NTCIR-11 QA Lab Japanese subtask: Wikipedia Data Set (available in Japanese)
http://warehouse.ntcir.nii.ac.jp/openaccess/qalab/11QALab-ja-wikipediadata.html
- Solr Instance with Indexed Wikipedia Subset (available in English)
https://github.com/oaqa/ntcir-qalab-cmu-baseline/wiki/Solr-Instance-with-Indexed-Wikipedia-Subset
- World history ontology (Event Ontology EVT), which supports truth judgments on natural-language sentences, available in Japanese.
3.1.4 Right Answers
- Right answers for National Center Tests, available in Japanese and English.
- Right answers for Second-stage Examinations, available in Japanese.
- Reference essays and weighted nuggets for essay questions, available in Japanese.
3.2 Tools
- A baseline QA system for English, based on UIMA, developed by CMU
https://github.com/oaqa/ntcir-qalab-cmu-baseline
- A baseline QA system for Japanese, based on YNU's MinerVA and CMU's Javelin, with a question analysis module for the National Center Test by Madoka Ishioroshi, reconstructed and implemented as UIMA components by Yoshinobu Kano
https://bitbucket.org/ntcirqalab/factoidqa-centerexam/
- Scorer and Format Checker for the National Center Test
https://bitbucket.org/ntcirqalab/qalabsimplescorer
- Passage retrieval engine passache
https://code.google.com/p/passache/
4. Submission
4.1 Phase-1
QALab-2 participants will submit the following for Phase-1:
File | Who should submit | Submission restriction | Format | Phase-1 Due (11:59 p.m. SST) |
---|---|---|---|---|
Question Analysis results (QA) | QA participants | up to 3 runs will be evaluated | XML | Aug. 31, 2015 |
Queries for IR | QA participants | up to 3 runs will be evaluated | TBA | Aug. 31, 2015 |
End-to-End QA run results (FA) | QA participants | up to 3 runs will be evaluated | XML | Sep. 7, 2015 |
IR run results (RS) | IR participants | up to 3 runs will be evaluated | XML | Sep. 7, 2015 |
Combination run results | QA participants | up to 3 runs will be evaluated | XML | Sep. 14, 2015 |
System Description (SD) | ALL participants (required) | | TXT | Sep. 14, 2015 |
An equal number of runs from each team will be evaluated. The runs to be evaluated will be selected in descending order of priority. The number of runs evaluated will be decided according to the total number of submitted runs and the available resources.
4.2 Phase-2
TBA
4.3 Phase-3
TBA
5. Format
5.1 Distribution Format Specification
5.1.1 Question XML Format for National Center Test
Overview
The questions will be distributed in the format shown below. The formal run questions will be available at 0:00 (JST) on the first day of the formal runs in Phase-1 and Phase-3, at the download website for each participating team. This is the same URL used to download the training questions and the document collections.
DTD
"torobo.dtd" is available at the Download Website for each participating team.
センター試験問題に対応した「torobo.dtd」は参加チーム用のダウンロードサイトから入手できます.
二次試験のdtdは準備中です.
Sample XML Format
A full sample XML file is included in the "sample_questions_for_center_test.zip" file, which is available at the download website for each participating team. Below is an abridged sample. The tag structure is the same for the English questions.
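As a convenience, the following sketch shows one way a participating team might check a downloaded question file against torobo.dtd before processing it. It is not part of the official tooling; the lxml dependency and the file names are our own choices for illustration.

```python
# Illustrative sketch, not official tooling: validate a distributed question file
# against torobo.dtd before processing it. File names are placeholders.
from lxml import etree

dtd = etree.DTD(open("torobo.dtd", "rb"))        # DTD from the download site
tree = etree.parse("sample_question.xml")        # a distributed question file

if dtd.validate(tree):
    print("valid against torobo.dtd")
else:
    print(dtd.error_log.filter_from_errors())
```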
5.1.2 Gold Standard XML Format for National Center Test
The gold standard for the training data is available from the download website, whose URL was delivered to each participating team after submission of the signed user agreement forms. The gold standard for the test (formal run) data will be delivered after each phase of the formal runs.
DTD
"answerTable.dtd" is available from the Download website for each participating team although it is refered its URL in the XML file.
「answerTable.dtd」は(XMLファイル内でURLが参照されていますが)参加チーム用のダウンロードサイトから入手できます.
Sample XML Format
The full XML files are included in the training data file "answer-0625training-center_test(1997,2001,2005,2009).zip", which is available at the download website for each participating team. Below is an abridged sample of the gold standard (right answers). The submission format for the final results on the National Center Test is the same.
5.1.3 Evaluation Result Format for National Center Test
Sample Format
5.1.4 Question XML Format for Second-stage Examination
Sample XML Format
5.1.5 Gold Standard XML Format for Second-stage Examination
Sample XML Format
5.1.6 Evaluation Result Format for Second-stage Examination
TBA
5.2 Submission Format Specification
5.2.1 Run ID Format
Each run has to be associated with a RunID, which identifies the run and is used as the filename of the run results. Based on the NTCIR CLIR and ACLIA formats, the RunID is defined as follows:
[Topics' (Questions') File Name without the Extension (.xml)]_[Team ID]_[Language]_[RunType]_[Priority].[FileType]
The two-character language codes are as follows:
- EN (English)
- JA (Japanese)
The two-character RunType codes are as follows:
- QA (Question analysis module output) (XMI and/or XML)
- RS (Information Retrieval result) (XMI and/or XML)
- FA (Final Answer of the End-to-End QA and Combination Run output) (XML only)
- SD (System Description Form) (Text file)
Priority Parameter:
The "Priority" is two digits used to represent the priority of the run, taking 01 as the highest. It will be used as a parameter for pooling and priority to be evaluated and analysed the results. The number of the runs included in the evaluation may vary according to the total number of submissions.
Priorityはランの優先度を決める二桁の数字列で,最高順位は「01」です.結果の評価や分析の優先順位を表す値として使用します. 提出されたランの総数に応じて評価されるランの数が変わる可能性があります.
File Type:
Please indicate the file type using "xmi" or "xml". For the System Description Form, please use "txt".
For combination runs, please list all the Team IDs used in the run:
[Topics' (Questions') File Name without the Extension (.xml)]_[IR Team ID]_[Question Answering Team ID]_[Language]_[RunType]_[Priority].[FileType]
For example, suppose TEAM1 submitted an information retrieval result with the RunID:
Center-2009--Main-WorldHistoryB_TEAM1_EN_RS_01.xml
If question answering participant TEAM2 uses this result as an input to its system, TEAM2 submits the combination run with the following RunID:
Center-2009--Main-WorldHistoryB_TEAM1_TEAM2_FA_01.xml
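The sketch below illustrates how a RunID-based filename following the pattern above could be assembled programmatically. The helper function and its sanity checks are our own illustration, not part of the task tooling.

```python
# Illustrative sketch only: assemble a RunID-based filename following the pattern
# above. The helper function and its checks are ours, not part of the task tooling.

def make_run_filename(topics_file, team_ids, language, run_type, priority, file_type):
    assert language in {"EN", "JA"}
    assert run_type in {"QA", "RS", "FA", "SD"}
    assert 1 <= priority <= 99
    topics = topics_file[:-4] if topics_file.endswith(".xml") else topics_file
    parts = [topics] + list(team_ids) + [language, run_type, f"{priority:02d}"]
    return "_".join(parts) + "." + file_type

# e.g. the IR result submitted by TEAM1 in the example above:
print(make_run_filename("Center-2009--Main-WorldHistoryB.xml",
                        ["TEAM1"], "EN", "RS", 1, "xml"))
# -> Center-2009--Main-WorldHistoryB_TEAM1_EN_RS_01.xml
```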
5.2.2 Answer Sheet XML Format for National Center Test
End-to-End QA participants will submit their output (answers) in this format. The same format is used for the combination run. It is basically the same as the gold standard format.
DTD
"answerTable.dtd" is available from the Download website for each participating Team although it is refered its URL in the XML file.
「answerTable.dtd」は(XMLファイル内でURLが参照されていますが)参加チーム用のダウンロードサイトから入手できます.
Sample XML Format
5.2.3 Answer Sheet XML Format for Second-stage Examination
End-to-End QA participants will submit their output (answers) in this format. The same format is used for the combination run.
XML Schema
"second_stage_exam.xsd" is available from the Download website for each participating Team although it is refered its URL in the XML file.
「second_stage_exam.xsd」は(XMLファイル内でURLが参照されていますが)参加チーム用のダウンロードサイトから入手できます.
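Before submitting, a team might want to confirm that an answer sheet conforms to second_stage_exam.xsd. The sketch below shows one way to do that with lxml; it is an illustration only, and the file names (including the RunID-style name) are hypothetical.

```python
# Illustrative sketch, not official tooling: check an answer sheet against the
# distributed XML Schema before submission. The file names (including the
# RunID-style name) are hypothetical placeholders.
from lxml import etree

schema = etree.XMLSchema(etree.parse("second_stage_exam.xsd"))
answer_sheet = etree.parse("Secondary-2015_TEAM1_JA_FA_01.xml")

if schema.validate(answer_sheet):
    print("valid against second_stage_exam.xsd")
else:
    print(schema.error_log)
```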
Sample XML Format
5.2.4 Question Analysis Format
Please see here.
5.2.5 Query Format
Please see here.
5.2.6 IR Format
Please see here.
5.2.7 System Description
System Description is available at the Download Website for each participating team.
5.3 Scorer and Format Checker
5.3.1 Scorer and Format Checker for National Center Test
The National Center Test scorer is available from the participants' download page and at the repository below. Currently the README is available in Japanese only; an English version will be provided later.
https://bitbucket.org/ntcirqalab/qalabsimplescorer
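For orientation, the sketch below shows the kind of computation such a scorer performs for multiple-choice answers: summing the points of correctly answered questions and computing accuracy. It is a minimal illustration under our own assumptions about the data structures, not a substitute for qalabsimplescorer.

```python
# Minimal sketch (ours, not the official qalabsimplescorer): score multiple-choice
# answers against gold answers, assuming per-question points are known.

def center_test_score(system, gold, points):
    """system, gold: {question_id: chosen_option}; points: {question_id: score}."""
    earned = sum(points[q] for q, ans in system.items() if gold.get(q) == ans)
    correct = sum(1 for q, ans in system.items() if gold.get(q) == ans)
    accuracy = correct / len(gold) if gold else 0.0
    return earned, accuracy

# Hypothetical example: two questions worth 3 points each, one answered correctly
print(center_test_score({"Q1": "2", "Q2": "1"}, {"Q1": "2", "Q2": "4"}, {"Q1": 3, "Q2": 3}))
# -> (3, 0.5)
```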
5.3.2 Format Checker for Second-stage Examination
The Second-stage Examination format checker is available. Currently the README is available in Japanese only; an English version will be provided later.
https://bitbucket.org/ntcirqalab/qalabsimplescorer
5.4 Misc
5.4.1 Question Analysis Restrictions
You can submit up to 99 runs, but we plan to evaluate only the first three runs (PRIORITY = 01, 02, 03). The runs to be evaluated and analyzed will be selected according to the "priority" specified in the RunID, from the highest priority down.
5.4.2 IR4QA Restrictions
We will accept up to 1,000 document IDs. IR effectiveness will be evaluated only in terms of the effectiveness of the final QA results; we do not assess the relevance of the IR results themselves. The relevance of the IR results may be assessed at a later time.
5.4.3 End-to-End and Combination QA Restrictions
You can submit up to 99 runs, but only the first three runs (PRIORITY = 01, 02, 03) are planned to be evaluated. For questions other than multiple choice, you can submit up to the top 30 answers for each topic; however, due to resource constraints, we may not be able to evaluate all the answers for each topic.
See the RunID Format for more details.
5.4.4 Encoding
Distribution files are encoded in UTF-8. Please encode submission files in UTF-8 as well.
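A minimal check like the one below can confirm that a submission file decodes as UTF-8 before upload; the function and the example filename are ours, for illustration only.

```python
# Minimal illustration (ours, not official): confirm a submission file decodes as UTF-8.
from pathlib import Path

def is_utf8(path):
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_utf8("Center-2009--Main-WorldHistoryB_TEAM1_EN_RS_01.xml"))
```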
6. Results
The evaluation results and right answers for Phase-1 and Phase-3 will be delivered one month after each phase to all registered participants of the QA Lab task, and will be published in the task overview paper at the NTCIR-12 Conference, to be held in June 2016.
The evaluation results of Phase-2 (Japanese Subtask only) will be published on November 14, 2015, at the symposium of the Todai Robot Project, and will be released to the public at that time.
7. Important Dates
Registration Deadline
until a week before each Formal Run
Register here.
Phase-1 (English and Japanese)
Question Analysis Module: | Tuesday, August 25, 2015 at 11:59 p.m. SST | - | Monday, August 31, 2015 at 11:59 p.m. SST |
Formal Run: | Tuesday, September 1, 2015 at 11:59 p.m. SST | - | Monday, September 7, 2015 at 11:59 p.m. SST |
Combination Run: | Tuesday, September 8, 2015 at 11:59 p.m. SST | - | Monday, September 14, 2015 at 11:59 p.m. SST |
Phase-2 (Japanese Subtask only)
Question Analysis Module: | Not held in Phase-2; the Question Analysis Module run takes place in Phase-1 and -3 only. ||
Formal Run for Mock Examination of Second-stage Examination: | Thursday, October 1, 2015 at 11:59 p.m. JST | - | Thursday, October 8, 2015 at 11:59 p.m. JST |
Formal Run for Mock Examination of National Center Test: | Tuesday, October 13, 2015 at 11:59 p.m. JST | - | Tuesday, October 20, 2015 at 11:59 p.m. JST |
Combination Run: | Not held in Phase-2; the Combination Run takes place in Phase-1 and -3 only. |
Phase-3 (English and Japanese)
Question Analysis Module: | Tuesday, December 1, 2015 at 11:59 p.m. SST | - | Monday, December 7, 2015 at 11:59 p.m. SST |
Formal Run: | Tuesday, December 8, 2015 at 11:59 p.m. SST | - | Monday, December 14, 2015 at 11:59 p.m. SST |
Combination Run: | Tuesday, December 15, 2015 at 11:59 p.m. SST | - | Monday, December 21, 2015 at 11:59 p.m. SST |
Time Zone:
SST (Samoa Standard Time, UTC-11)
JST (Japan Standard Time, UTC+9)
8. Organizers
- Noriko Kando, National Institute of Informatics, Japan
- Madoka Ishioroshi, National Institute of Informatics, Japan
- Teruko Mitamura, Carnegie Mellon University, USA
- Yoshinobu Kano [website], Shizuoka University, Japan
- Hideyuki Shibuki, Yokohama National University, Japan
- Kotaro Sakamoto, Yokohama National University, Japan
- Akira Fujita [website], National Institute of Informatics, Japan
9. Publications
NTCIR 12 Kick-Off Event QALab-2 Slide
http://research.nii.ac.jp/ntcir/ntcir-12/pdf/NTCIR-12-Kickoff-QALab.pdf
NTCIR 12 Kick-Off Event QALab-2 Video Recording
NTCIR 11 QALab-1 Papers
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/NTCIR/toc_ntcir.html#QALab
NTCIR 11 QALab-1 Evaluation Results
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/NTCIR/Evaluations/QALab/ntc11-QALAB-eval.htm
NTCIR 11 QALab-1
http://ntcir.nii.ac.jp/QA-Lab/1--Overview/