#!/usr/bin/perl

NTCIR-3 QAC Task: Scoring Tool
Ver. 1.50 (2002. 8. 30.)
Copyright 2002: QAC Task Committee, R. Nagata, F. Masui.

SUMMARY OF SCORING TOOL

'scoring.pl' is a program for scoring answers to the questions used in
the NTCIR-3 QAC Task. The program can be used for TASK1, TASK2, and
TASK3. It checks whether the answers that the user's system output are
correct, scores them, and shows the results and statistics.

USAGE

scoring [OPTIONS] FILENAME

FILENAME: the filename of the user's system output

OPTIONS:

--answer|-a filename
    Specifies the filename of the correct answer set. The character
    code in the file must be the same as the one used in the user's
    system output.

--help|-h
    Shows help.

--version|-v
    Shows the version of the program.

--task|-t number
    Selects the task. A number (1, 2, or 3) follows this option,
    selecting TASK1, TASK2, or TASK3, respectively.

--extract|-e number
    Shows the inner data. A number (1, 2, 3, 4, or 5) follows this
    option.

    1 shows information on each question, including the question ID,
      the total number of answers, the number of different answers,
      and, for each answer, the answer number, the answer, and the
      article ID.

      ex.
      QAC1-1034-01 3 2
      1 3500 metre 991208045
      1 3500 metre 980717035
      2 1200 metre 990904183

    2 shows the answers that the user's system output, including the
      question ID, the number of answers, and, for each answer, the
      answer number, the answer, and the article ID.

      ex.
      QAC1-1019-01 5
      0 March 990807065
      1 April 990807065
      2 May 990807065
      3 June 980219267
      4 July 990807065

    3 shows detailed information on the correct answers that the
      user's system output. The information includes the correct
      answers and the answer numbers that correspond to the answer
      numbers in the correct answer set. The symbol '-' preceding an
      answer number means that the answer is correct, but the article
      ID might not be correct.

      ex.
      March | -1
      March | -1
      April | 2

    4 shows the score given to each question in TASK2. The option is
      valid only for TASK2.
      The output consists of the question ID, the number of correct
      answers, the number of answers that the user's system output,
      the number of correct answers that the user's system output,
      and the F-measure score.

      ex.
      QAC1-2146-01: 1 5 1 0.333333

    5 shows the result of answer checking: the question ID, the
      question, the list of correct answers, and all answers that the
      user's system output. The correct answers that the user's system
      output are marked with an asterisk. The option is valid only for
      TASK1.

      ex.
      QAC1-1100-01 "Where is Mie University?"
      CORRECT ANSWER: Mie Tsu Kamihama-cho Edobashi
        Mie *
        Ise-Shima
        Tsu station
        Department of Information Engineering
        Tsu *

DESCRIPTION

INPUT:

'scoring.pl' accepts text files that agree with the format regulated
by the QAC committee. Any data starting with '#' in the file are
ignored as comments.

OUTPUT:

'scoring.pl' outputs the results of scoring to the standard output.
The results include the marks and the average score.

ex.
Task1 Results:
35.0 marks out of 200.0 in TASK1
Average score: 0.175

The first number in the first line is the score that the user's system
got; the second number in the first line is the full marks. The number
in the second line is the average score, i.e., the score divided by
the number of questions.

The scores for each task are computed as follows:

Task1: Check whether the answers to each question are correct.
       Calculate the reciprocal (RR) of the rank of the highest-ranked
       correct answer. Sum up the reciprocals.

Task2: Check whether the answers to each question are correct.
       Calculate the F-measure for each question. Sum up the
       F-measures.

Task3: Check whether the answers to the branch-questions only are
       correct. Calculate the F-measure for each branch-question. Sum
       up the F-measures.

'scoring.pl' also outputs a brief summary of scoring to the standard
output.

ex.
----------------------------------------------------------
 Question     Answer     Output    Correct
----------------------------------------------------------
      200        272        729         38
----------------------------------------------------------

Question: the total number of questions in the task
Answer:   the number of different answers in the task
Output:   the number of answers that the user's system output
Correct:  the number of correct answers that the user's system output

'scoring.pl' also outputs statistics to the standard output.

ex.
----------------------------------------------------------
 Recall    Precision    F-measure    MRR/AFM
----------------------------------------------------------
  0.139        0.052        0.076      0.175
----------------------------------------------------------

Each value is computed as follows:

Recall    = (the number of correct answers that the user's system
             output) / (the number of correct answers)
Precision = (the number of correct answers that the user's system
             output) / (the number of answers that the user's system
             output)
F-measure = (2 * Recall * Precision) / (Recall + Precision)
Mean Reciprocal Rank (MRR) = (sum of RR) / (the number of questions)

RR: see the computation for TASK1 under OUTPUT above.
MRR is valid only for TASK1.

'scoring.pl' also writes the result of answer checking to the file
'res.dat', which is generated after scoring. 'res.dat' shows whether
each answer is correct: if an answer is correct, '○' (maru) goes with
it; otherwise, '×' (batsu).

ex.
QAC1-1020-01: India ○(maru), Indonesia ○(maru), Thai ×(batsu),
US ×(batsu), Franc ×(batsu)

A 'Φ' (phi) in the file means that the system output no answer for the
question. In this case, '○' (maru) is given if and only if there is no
answer in the correct answer set.

ex.
QAC1-1021-01: Φ(phi) ○(maru)
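APPENDIX: The per-question scoring rules described under OUTPUT can be
sketched as follows. This is an illustrative sketch only, not code from
'scoring.pl' (which is Perl); the function names are hypothetical.

```python
def reciprocal_rank(output, correct):
    """TASK1: 1 / (1-based rank of the first correct answer); 0 if none."""
    for rank, answer in enumerate(output, start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0

def f_measure(output, correct):
    """TASK2/TASK3: per-question F-measure from recall and precision."""
    hits = sum(1 for a in output if a in correct)
    if hits == 0:
        return 0.0
    recall = hits / len(correct)
    precision = hits / len(output)
    return 2 * recall * precision / (recall + precision)

# The TASK2 example "QAC1-2146-01: 1 5 1 0.333333": one correct answer,
# five system answers, one of them correct.
score = round(f_measure(["a", "b", "c", "d", "e"], {"a"}), 6)  # 0.333333
```

The task score is the sum of these per-question values over all
questions (or, for TASK3, over the branch-questions only).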
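The overall statistics can likewise be sketched from their definitions,
using the counts in the example summary table (200 questions, 272
answers, 729 system answers, 38 correct) and the example TASK1 score of
35.0 marks. Again, a minimal illustrative sketch, not 'scoring.pl'
itself:

```python
def statistics(correct_output, correct_total, output_total, rr_sum, n_questions):
    """Recall, precision, F-measure, and MRR as defined above."""
    recall = correct_output / correct_total      # e.g. 38 / 272
    precision = correct_output / output_total    # e.g. 38 / 729
    denom = recall + precision
    f = 2 * recall * precision / denom if denom else 0.0
    mrr = rr_sum / n_questions                   # valid for TASK1 only
    return recall, precision, f, mrr

recall, precision, f, mrr = statistics(38, 272, 729, 35.0, 200)
```

Note that MRR here equals the "Average score" shown in the Task1
results (35.0 / 200 = 0.175).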