NTCIR-12 Test Collection: List of Datasets for NTCIR-12 Task Participants
Task | Subtask | Data | ||||||||||
Data type | Genre/Task | Language | File name | No. of documents/topics (size) |
Scheduled release date | Year created | ||||||
core | IMine | Document data | Web | Cs | SogouT **a | ca. 130M pages (ca. 5TB) |
ready to use | crawled and released in Nov 2008 | ||||
SogouQ **a | ca. 4GB Chinese query logs | collected in 2008/2011 | ||||||||||
E | ClueWeb12-B13 **b | 52M English Web pages | ready to use | crawled during 2012 | ||||||||
Task data | Query Understanding | CsEJ | NTCIR-12 IMine-2 Task Data | 100-150 queries | June 2015 | - | ||||||
Vertical Incorporating | CsE | |||||||||||
Task data (system training data) | Query Understanding | CsEJ | NTCIR-9 INTENT Task Data, NTCIR-10 INTENT-2 Task Data, NTCIR-11 IMine Task Data | 100 queries (INTENT, INTENT-2) and 50 queries (IMine-1) for each language | ready to use | - | ||||||
Vertical Incorporating | ||||||||||||
MedNLPDoc | Document data | Health Record | J | Training data: mednlpdoc-train.xml | 200 documents | Jan 15, 2016 | 2015 | |||||
Test data for task 1: mednlpdoc-test.xml | 100 documents | 2016 | ||||||||||
MobileClick | Documents**c and queries | Information Retrieval | E | i: NTCIR-12 MobileClick document sets (English)**c | 100 queries | Aug 2015 | 2015 | |||||
ii: NTCIR-12 MobileClick query sets (English) | ||||||||||||
J | iii: NTCIR-12 MobileClick document sets (Japanese)**c | |||||||||||
iv: NTCIR-12 MobileClick query sets (Japanese) | ||||||||||||
Documents**c, queries, and iUnits | Summarization | E | i, ii and NTCIR-12 MobileClick iUnit sets (English)**c | |||||||||
J | iii, iv and NTCIR-12 MobileClick iUnit sets (Japanese)**c | |||||||||||
SpokenQuery&Doc |
Document data | Spokenquery&SpokenDocument retrieval documents | Spokenquery&SpokenDocument retrieval documents (SDPWS data set) | |||||||||
Task data | SQ-SCR/ Document retrieval | J | NTCIR-12 Spoken Query & Spoken Document Retrieval Dataset | |||||||||
SQ-STC/ Term retrieval | ||||||||||||
Task data (system training data) | J | NTCIR-11 Spoken Query & Spoken Document Retrieval Dataset/ NTCIR-10 Spoken Document Retrieval Dataset/ NTCIR-9 Spoken Document Retrieval Dataset |
||||||||||
Temporalia | Document data | Web(News) | C | SogouCA **a | ready to use | 2012 | ||||||
SogouT **a | ||||||||||||
E | LivingKnowledge news and blogs annotated subcollection**d | ca. 3.8M docs (ca. 20GB) | ready to use | 2011-2013 | ||||||||
Task data (Formal Run) | Temporal Intent Disambiguation | NTCIR-12 Temporal Information Access TID Dataset | ||||||||||
Temporally Diversified Retrieval | NTCIR-12 Temporal Information Access TDR Dataset | |||||||||||
Task data (system training data) | NTCIR-11 Temporal Information Access TQIC Dataset/ NTCIR-11 Temporal Information Access TIR Dataset |
|||||||||||
pilot | Lifelog | Document data | Images, Visual Concepts, Semantic Content | TBA | ||||||||
Task data | Lifelog data | NTCIR-12 Lifelog Dataset (Dry Run) | ready to use | |||||||||
NTCIR-12 Lifelog Dataset (Formal Run) | ||||||||||||
QA Lab | Document data | English Subtask | E | Wikipedia Corpus: | Solr Instance with Indexed Wikipedia Subset | ready to use (open access) |
||||||
Japanese Subtask | J | Wikipedia Corpus: | NTCIR-11 QA Lab for Entrance Exam Japanese Wikipedia Data Set | |||||||||
Textbook Data: | Japanese Textbook Corpus1 - World History Subset (Tokyo Shoseki Text Data/ Tokyo Shoseki Annotation Data/ Tokyo Shoseki Index Data) |
txt/xml/ index data |
572KB(txt data) | ready to use | 2007,2008 | |||||||
Textbook Data: | Japanese Textbook Corpus2 - World History Subset (Yamakawa Shuppansha Text Data/Yamakawa Shuppansha Annotation Data/Yamakawa Shuppansha Index Data) |
txt/xml/ index data |
252KB(Text Data) | ready to use | 2010 | |||||||
Task data for system training | English/Japanese Subtask |
E/J | Sample Questions | National Center Test Sample Questions | HTML,xml | 28KB(html) | ready to use | |||||
Second-stage Examination Sample Questions | 78KB(html) | |||||||||||
Training data | National Center Test Training Data (Question) |
xml | 230 Topics | ready to use | 1997,2001,2003, 2005,2007,2009 |
|||||||
National Center Test Training Data (Question Format) |
TBA | |||||||||||
National Center Test Training Data (Right Answer) |
ready to use | |||||||||||
Second-stage Examination Training Data (Question and Answer Sheet) |
661 Topics | TBA | 2005,2007,2009 | |||||||||
Second-stage Examination Training Data (Question Format) |
TBA | |||||||||||
J | Second-stage Examination Training Data (Right Answer and Answer Nugget) |
TBA | ||||||||||
Task data | English Subtask | E | Phase1 | Phase1 Test Data | Jul.14,2015 | |||||||
Phase1 Answer Data | Aug.13,2015 | |||||||||||
Phase3 | Phase3 Test Data | Dec,2015 | ||||||||||
Phase3 Answer Data | TBA | |||||||||||
Japanese Subtask | J | Phase1 | Phase1 Test Data | Jul.14,2015 | ||||||||
Phase1 Answer Data | Aug.13,2015 | |||||||||||
Phase2 | Phase2 Test Data | Oct,2015 | ||||||||||
Phase2 Answer Data | Nov.14,2015 | |||||||||||
Phase3 | Phase3 Test Data | Dec,2015 | ||||||||||
Phase3 Answer Data | TBA | |||||||||||
Tools | English/Japanese Subtask |
E/J | Scorer | Scorer and Format Checker | 64.3MB | ready to use | ||||||
English Subtask | E | Baseline System | NTCIR QALab CMU Baseline | 14.7MB | ||||||||
Japanese Subtask | J | Baseline System | Kachako factoidQA-センター試験解答器 (National Center Test solver) | 179MB | ||||||||
Ontology | Event Ontology | xml | 16.2MB | |||||||||
STC | Document data | Web | C/J | |||||||||
Task data | Chinese Subtask/Japanese Subtask | C/J | NTCIR-12 Short Text Conversation Task Data |
Last Modified: 2015-06-25