SHINRA is a resource creation project that aims to structure the knowledge in Wikipedia. SHINRA2020-ML, conducted as one of the NTCIR-15 tasks, is the first text classification shared task in the SHINRA project. It tackles the challenge of classifying Wikipedia entities in 30 languages into fine-grained categories.
Participants are expected to select one or more target languages. For each language, they use the Wikipedia pages linked from the categorized Japanese pages as training data, and run their system to classify the remaining pages, which are not linked from the Japanese pages.
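The training/target split described above can be illustrated with a minimal sketch. This is not the script distributed with the test collection: the function name `split_training_and_target`, the dictionary-based data layout, the page IDs, and the category labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def split_training_and_target(
    ja_categories: Dict[str, List[str]],
    language_links: Dict[str, str],
    target_pages: Set[str],
) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Split the pages of one target language into training and target sets.

    ja_categories:  Japanese page ID -> category labels (assumed layout).
    language_links: Japanese page ID -> linked page ID in the target language.
    target_pages:   all page IDs in the target-language Wikipedia.
    """
    training: Dict[str, List[str]] = {}
    for ja_id, linked_id in language_links.items():
        # A page becomes training data only if its Japanese counterpart
        # has been categorized and the linked page actually exists.
        if ja_id in ja_categories and linked_id in target_pages:
            training[linked_id] = ja_categories[ja_id]

    # Every page without a categorized Japanese counterpart remains to be classified.
    remaining = target_pages - set(training)
    return training, remaining


# Hypothetical toy data: "ja3" has a language link but no category.
ja_categories = {"ja1": ["Person"], "ja2": ["City"]}
language_links = {"ja1": "en10", "ja2": "en20", "ja3": "en30"}
target_pages = {"en10", "en20", "en30", "en40"}

training, remaining = split_training_and_target(
    ja_categories, language_links, target_pages
)
print(training)
print(sorted(remaining))  # ['en30', 'en40']
```

In this toy example, `en30` stays in the target set because its Japanese counterpart carries no category, and `en40` stays because it has no Japanese counterpart at all.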
Please see the following for further details of the task.
- (SHINRA2020-ML website): SHINRA2020-ML
- (SHINRA2020-ML Overview Paper): [PDF]
Satoshi Sekine, Masako Nomoto, Kouta Nakayama, Asuka Sumida, Koji Matsuda, and Maya Ando. 2020.
Overview of SHINRA2020-ML Task. In Proceedings of the NTCIR-15 Conference.
The NTCIR-15 SHINRA2020-ML test collection consists of the following:
- Minimal datasets
  - Training Data
  - Target Data
- Additional datasets
  - (a) Japanese Wikipedia articles classified into Extended Named Entity Categories
  - (b) Language Link information between Wikipedias in different languages
  - (c) Script to build the training data using (a) and (b)
  - (d) Wikipedia dump data in 31 languages
  - (e) Extended Named Entity Definition
- Test data is not included in the test collection.
- Evaluation results and participant runs are available from NTCIR-15: SHINRA2020-ML System Data Download
You can download the test collection from the SHINRA2020-ML: Data Download site.
Contact us: ntc-secretariat