NTCIR-11 Test Collections: data sets for NTCIR-11 Workshop Participants
task | subtask | data type | genre/task | language | file name | distribution date | number of documents/topics (size) | year |
core | IMine | Document Data | Web | Cs | SogouT | ready to use **a | ca. 130M pages (ca. 5TB) | crawled and released in Nov 2008 |
SogouQ | ready to use **a | About 4GB | collected in 2008/2011 |
E | ClueWeb12-B3 | ready to use **b |
crawled during 2012 | |||||||||
Task Data | Subtopic Mining | CsEJ | NTCIR-11 IMine Task Data | Topics and non-diversified baseline DR runs released: Jan, 2014 | 50 Queries for each language | - | ||||||
Document Ranking | CsE | |||||||||||
Search Task Mining | J | NTCIR-11 IMine TaskMine Task Data | Topics released: Mar, 2014 | 50 Queries for each language | - | |||||||
Task Data for system training purposes | Subtopic Mining | CsEJ | NTCIR-9/10 INTENT Task Data | Jan 31, 2014 | 100 Queries for each language | - | ||||||
Document Ranking | ||||||||||||
MATH | Document Data | Scientific Articles | E | NTCIR-11 Math Retrieval Document Data (Full dataset) | Apr 15, 2014 | 100,000 docs | 2013 | |||||
Task Data | Math Retrieval | E | NTCIR-11 Math Task Data (Topic) | Jun 2, 2014 | 50 Topics | - | ||||||
Task Data for system training purposes | Math Retrieval | E | NTCIR-11 Math Task Data (Initial dataset) | Mar 10, 2014 | Several Topics | |||||||
MedNLP | Document Data | Health Record | J | Training data: mednlp-2-train.txt | Mar 10, 2014 | 100 documents | 2013 | |||||
Test data for task 1 (NER): test.xml | July 11, 2014 | 49 documents | 2014 | |||||||||
Test data for task 2 (Normalization/Coding) | July 25, 2014 | |||||||||||
MobileClick | Documents and Queries | iUnit Retrieval Subtask | Information Retrieval | E | i: NTCIR-11 MobileClick document sets (English) | Mar, 2014 | 60 queries | 2014 | ||||
ii: NTCIR-11 MobileClick query sets (English) | ||||||||||||
J | iii: NTCIR-11 MobileClick document sets (Japanese) | |||||||||||
iv: NTCIR-11 MobileClick query sets (Japanese) | ||||||||||||
Documents, Queries, and iUnits | iUnit Summarization Subtask | Summarization | E | i, ii and NTCIR-11 MobileClick iUnit sets (English) | ||||||||
J | iii, iv and NTCIR-11 MobileClick iUnit sets (Japanese) | |||||||||||
RITE-VAL | Document Data | Fact Validation | JA | J | Wikipedia | Apr 30, 2014 | 1.4GB | 2011-2012 | ||||
Textbooks | Apr 30, 2014 (Textbooks1); July 17, 2014 (Textbooks2) | 1MB | 2011-2012 | |||||||||
Task Data for System Training Purposes | NTCIR-10 RITE2 ExamSearch Task Data JA | Apr 30, 2014 | 1,000 sentences | 2011-2012 | ||||||||
Task Data | NTCIR-11 RITE-VAL Fact Validation Task Data JA | Aug 4, 2014 |
1,000 sentences | 2013-2014 | ||||||||
Document Data | EN | E | Wikipedia | Apr 30, 2014 | 18GB | 2013-2014 | ||||||
Task Data for System Training Purposes | NTCIR-10 RITE2 ExamSearch Task Data EN | Apr 30, 2014 | 600 sentences | 2011-2012 | ||||||||
Task Data | NTCIR-11 RITE-VAL Fact Validation Task Data EN | Aug 4, 2014 |
600 sentences | 2013-2014 | ||||||||
Document Data | CS | Cs | Wikipedia | Apr 30, 2014 | 1GB | 2014 | ||||||
Task Data for System Training Purposes | NTCIR-11 RITE-VAL Fact Validation CS Training data | Apr 30, 2014 | 45KB | 2014 | ||||||||
Task Data | NTCIR-11 RITE-VAL Fact Validation CS Test data | July 25, 2014 | 2014 | |||||||||
Document Data | CT | Ct | Wikipedia | Apr 30, 2014 | 1GB | 2014 | ||||||
Task Data for System Training Purposes | NTCIR-11 RITE-VAL Fact Validation CT Training data | Apr 30, 2014 | 50KB | 2014 | ||||||||
Task Data | NTCIR-11 RITE-VAL Fact Validation CT Test data | July 25, 2014 | 2014 | |||||||||
Task Data for System Training Purposes | System Validation | JA | J | NTCIR-10 RITE2 BC, MC, ExamBC, UnitTest Task Data JA | Apr 30, 2014 | 3,788 sentence pairs |
2011-2012 | |||||
Task Data | NTCIR-11 RITE-VAL System Validation Task Data JA | Aug 4, 2014 |
100,000 sentence pairs | 2013-2014 | ||||||||
Task Data for System Training Purposes | CS | Cs | NTCIR-10 RITE2 BC, MC Task Data CS | Apr 30, 2014 | 2011-2012 | |||||||
Task Data | NTCIR-11 RITE-VAL Fact Validation, CS Test Data | July 25, 2014 | 2014 | |||||||||
Task Data for System Training Purposes | CT | Ct | NTCIR-10 RITE2 BC, MC Task Data CT | Apr 30, 2014 | 2011-2012 | |||||||
Task Data | NTCIR-11 RITE-VAL Fact Validation, CT Test Data | July 25, 2014 | 2014 | |||||||||
SpokenQuery&Doc | Document Data | Spokenquery&SpokenDocument retrieval documents | J | NTCIR-10 SpokenDoc documents | ready to use |
114 lectures; total 32 hours (2280 slides) | From 2007 To 2013 | |||||
NTCIR-11 Spokenquery&SpokenDocument retrieval documents | Dec, 2013 | 114 lectures; total 32 hours (2280 slides) | ||||||||||
Document Data for system training purposes | Spokenquery&SpokenDocument retrieval documents | NTCIR-10 SpokenDoc documents | ready to use | 114 lectures; total 32 hours (2280 slides) | From 2007 To 2013 | |||||||
NTCIR-11 Spokenquery&SpokenDocument retrieval documents | Dec, 2013 | 114 lectures; total 32 hours (2280 slides) | ||||||||||
Task Data | SQ-SCR task | Document retrieval | J | After Mar. 2014 (during formal run) |
fewer than 120 topics | - | ||||||
SQ-STD subtask | Term retrieval | |||||||||||
STD-SCR subtask | Document retrieval | |||||||||||
Task Data for system training purposes | SQ-SCR task | Document retrieval | J | After Jan. 2014 (at dry run) |
- | - | ||||||
SQ-STD subtask | Term retrieval | |||||||||||
STD-SCR subtask | Document retrieval | |||||||||||
Pilot | QA Lab | Task Data | English Subtask | E | Center Shiken Exam Data (world_history_B)*: a. questions: Center Shiken Exam Data Set 1; b. answers: Center Shiken Exam Data Set 2 (* Translation of Japanese Subtask Center Shiken Exam Data Set.) |
ready to use | Topic: 36(2007),41(2003) |
2003,2007 | ||||
Second-stage University Entrance Exam Data*: a. questions: Second-stage University Entrance Exam Data Set 1; b. answers: Second-stage University Entrance Exam Data Set 2 (* Translation of Japanese Subtask Center Shiken Exam Data Set.) |
ready to use* (* To be announced: Second-stage University Entrance Exam Data Set 2) |
To be announced | 2007 | |||||||||
Task Data for system training purposes | Center Shiken Exam Data (world_history_B)*: a. Sample Questions (* Translation of Japanese Subtask Center Shiken Exam Data Set.) |
ready to use | Topic: 40(1997),41(2001), 36(2005),36(2009) |
1997,2001,2005,2009 | ||||||||
Second-stage University Entrance Exam Data*: a. Sample Questions (* Translation of Japanese Subtask Center Shiken Exam Data Set.) |
ready to use | To be announced | 2005,2009 | |||||||||
Document Data | Japanese Subtask | J | Wikipedia Corpus: a. Wikipedia Data: Wikipedia Indri indexed Dataset1; b. Indexed Data: Wikipedia Indri indexed Dataset2,3 |
ready to use: Open Access |
1.17 GB | - | ||||||
Japanese Textbook Corpus1 - World History Subset: a. Textbook Data: Tokyo Shoseki Textbook Data Set 0; b. annotations: Tokyo Shoseki Textbook Data Set 1; c. Indexed Data: Tokyo Shoseki Textbook Data Set 2, 3 |
ready to use | 570 KB | 2007,2008 | |||||||||
Japanese Textbook Corpus2 - World History Subset: a. Textbook Data: Yamakawa Shuppansha Textbook Data Set 0; b. annotations: Yamakawa Shuppansha Textbook Data Set 1; c. Indexed Data: Yamakawa Shuppansha Textbook Data Set 2, 3 |
ready to use* (* To be announced: Yamakawa Shuppansha Textbook Data Set 0) |
2010 | ||||||||||
Task Data | Center Shiken Exam Data (world_history_B): a. questions: Center Shiken Exam Data Set 1; b. answers: Center Shiken Exam Data Set 2 |
ready to use | Topic: 36(2007),41(2003) |
2003,2007 | ||||||||
Second-stage University Entrance Exam Data: a. questions: Second-stage University Entrance Exam Data Set 1 (questions); b. answers: Second-stage University Entrance Exam Data Set 2 (answers) |
ready to use* (* To be announced: Second-stage University Entrance Exam Data Set 2) |
To be announced | 2007 | |||||||||
Task Data for system training purposes | Center Shiken Exam Data (world_history_B): a. Sample Questions |
ready to use | Topic: 40(1997),41(2001), 36(2005),36(2009) |
1997,2001,2005,2009 | ||||||||
Second-stage University Entrance Exam Data: a. Sample Questions |
ready to use | To be announced | 2005,2009 | |||||||||
Temporalia | Document Data | Web (News) | E | LivingKnowledge news and blogs annotated subcollection | ready to use **c |
ca. 3.8M docs (ca. 20GB) | 2011-2013 | |||||
Task Data | TQIC Subtask / Classification | NTCIR-11 Temporalia Task Data | May 9, 2014 | 300 queries | 2014 | |||||||
TIR Subtask / Retrieval | 50 Topics | |||||||||||
Task Data for system training purposes | TQIC Subtask / Classification | NTCIR-11 Temporalia Task Data | Jan 25, 2014 | 100 queries | ||||||||
TIR Subtask / Retrieval | 15 Topics | |||||||||||
RecipeSearch | Document Data | Cooking Recipe | E | Yummly Recipe Data **f | ready to use **g | recipe information for 100,000 recipes (33,605,459 bytes) | - | |||||
Cooking Recipe | J | Rakuten Recipe **d | ready to use **e | recipe information for 440,000 recipes (158,321,432 bytes) | - | |||||||
Task Data | * To be announced soon. * | E | ||||||||||
J |
Last Modified: 2014-07-28