#!/usr/bin/perl

NTCIR-3 QAC Task: Scoring Tool
Ver. 1.50 (2002. 8. 30.)
Copyright 2002: QAC Task Committee, R. Nagata, F. Masui.

SUMMARY OF SCORING TOOL

'scoring.pl' is a program for scoring answers to the questions used in
the NTCIR-3 QAC Task. The program can be used for TASK1, TASK2, and
TASK3. It checks whether the answers that the user's system output are
correct, scores them, and shows the results and statistics.

USAGE

scoring [OPTIONS] FILENAME

FILENAME: the filename of the user's system output

OPTIONS:

--answer|-a filename
    Specifies the filename of the correct answer set. The character
    code in the file must be the same as the one used in the user's
    system output.

--help|-h
    Shows help.

--version|-v
    Shows the version of the program.

--task|-t number
    Selects the task. A number (1, 2, or 3) follows this option,
    selecting TASK1, TASK2, or TASK3, respectively.

--extract|-e number
    Shows the inner data. A number (1, 2, 3, 4, or 5) follows this
    option.

    1 shows information on each question, including the question ID,
      the total number of answers, the number of different answers,
      and, for each answer, the answer number, the answer, and the
      article ID.

      ex.
      QAC1-1034-01 3 2
      1 3500 metre 991208045
      1 3500 metre 980717035
      2 1200 metre 990904183

    2 shows the answers that the user's system output, including the
      question ID, the number of answers, and, for each answer, the
      answer number, the answer, and the article ID.

      ex.
      QAC1-1019-01 5
      0 March 990807065
      1 April 990807065
      2 May 990807065
      3 June 980219267
      4 July 990807065

    3 shows detailed information on the correct answers that the
      user's system output. The information includes the correct
      answers and the answer numbers that correspond to the answer
      numbers in the correct answer set. The symbol '-' preceding an
      answer number means that the answer is correct, but the article
      ID might not be correct.

      ex.
      March | -1
      March | -1
      April | 2

    4 shows the score given to each question in TASK2. The option is
      valid only for TASK2.
      The output consists of the question ID, the number of correct
      answers, the number of answers that the user's system output,
      the number of correct answers that the user's system output,
      and the F-measure score.

      ex.
      QAC1-2146-01: 1 5 1 0.333333

    5 shows the result of answer checking: the question ID, the
      question, the list of correct answers, and all answers that the
      user's system output. The correct answers that the user's system
      output are marked with an asterisk. The option is valid only for
      TASK1.

      ex.
      QAC1-1100-01 "Where is Mie University?"
      CORRECT ANSWER: Mie Tsu Kamihama-cho Edobashi
        Mie *
        Ise-Shima
        Tsu station
        Department of Information Engineering
        Tsu *

DESCRIPTION

INPUT:

'scoring.pl' accepts text files that agree with the format regulated
by the QAC committee. Any data starting with '#' in the file are
ignored as comments.

OUTPUT:

'scoring.pl' outputs the results of scoring to the standard output.
The results include the marks and the average score.

ex.
Task1 Results:
35.0 marks out of 200.0 in TASK1
Average score: 0.175

The first number in the first line is the score that the user's system
got; the second number in the first line is the full marks. The number
in the second line is the average score, i.e., the score divided by
the number of questions.

The scores for each task are computed as follows:

Task1: Check whether the answers to each question are correct.
       Calculate the reciprocal (RR) of the rank of the highest-ranked
       correct answer. Sum up the reciprocals.

Task2: Check whether the answers to each question are correct.
       Calculate the F-measure for each question. Sum up the
       F-measures.

Task3: Check whether the answers to the branch-questions only are
       correct. Calculate the F-measure for each branch-question. Sum
       up the F-measures.

'scoring.pl' also outputs a brief summary of scoring to the standard
output.

ex.
----------------------------------------------------------
 Question     Answer     Output    Correct
----------------------------------------------------------
      200        272        729         38
----------------------------------------------------------

Question: the total number of questions in the task
Answer:   the number of different answers in the task
Output:   the number of answers that the user's system output
Correct:  the number of correct answers that the user's system output

'scoring.pl' also outputs statistics to the standard output.

ex.
----------------------------------------------------------
 Recall    Precision    F-measure    MRR/AFM
----------------------------------------------------------
  0.139        0.052        0.076      0.175
----------------------------------------------------------

Each value is computed as follows:

Recall    = (the number of correct answers that the user's system
             output) / (the number of correct answers)
Precision = (the number of correct answers that the user's system
             output) / (the number of answers that the user's system
             output)
F-measure = (2 * Recall * Precision) / (Recall + Precision)
Mean Reciprocal Rank (MRR) = (sum of RR) / (the number of questions)

RR: see the computation for TASK1 under OUTPUT above.
MRR is valid only for TASK1.

'scoring.pl' also writes the result of answer checking to the file
'res.dat', which is generated after scoring. 'res.dat' shows whether
each answer is correct: if an answer is correct, '○' (maru) goes with
it; otherwise, '×' (batsu).

ex.
QAC1-1020-01: India ○(maru), Indonesia ○(maru), Thai ×(batsu),
US ×(batsu), Franc ×(batsu)

A 'Φ' (phi) in the file means that the system output no answer for the
question. In this case, '○' (maru) is given if and only if there is no
answer in the correct answer set.

ex.
QAC1-1021-01: Φ(phi) ○(maru)
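APPENDIX: The per-question scoring rules described under OUTPUT can be
sketched as follows. This is an illustrative sketch only, not code from
'scoring.pl' (which is Perl); the function names are hypothetical.

```python
def reciprocal_rank(output, correct):
    """TASK1: 1 / (1-based rank of the first correct answer); 0 if none."""
    for rank, answer in enumerate(output, start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0

def f_measure(output, correct):
    """TASK2/TASK3: per-question F-measure from recall and precision."""
    hits = sum(1 for a in output if a in correct)
    if hits == 0:
        return 0.0
    recall = hits / len(correct)
    precision = hits / len(output)
    return 2 * recall * precision / (recall + precision)

# The TASK2 example "QAC1-2146-01: 1 5 1 0.333333": one correct answer,
# five system answers, one of them correct.
score = round(f_measure(["a", "b", "c", "d", "e"], {"a"}), 6)  # 0.333333
```

The task score is the sum of these per-question values over all
questions (or, for TASK3, over the branch-questions only).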
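The overall statistics can likewise be sketched from their definitions,
using the counts in the example summary table (200 questions, 272
answers, 729 system answers, 38 correct) and the example TASK1 score of
35.0 marks. Again, a minimal illustrative sketch, not 'scoring.pl'
itself:

```python
def statistics(correct_output, correct_total, output_total, rr_sum, n_questions):
    """Recall, precision, F-measure, and MRR as defined above."""
    recall = correct_output / correct_total      # e.g. 38 / 272
    precision = correct_output / output_total    # e.g. 38 / 729
    denom = recall + precision
    f = 2 * recall * precision / denom if denom else 0.0
    mrr = rr_sum / n_questions                   # valid for TASK1 only
    return recall, precision, f, mrr

recall, precision, f, mrr = statistics(38, 272, 729, 35.0, 200)
```

Note that MRR here equals the "Average score" shown in the Task1
results (35.0 / 200 = 0.175).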