BOOTS is a toolkit for conducting pairwise bootstrap hypothesis tests over a given set of systems and for computing the discriminative power of evaluation metrics. For details, please refer to the README file included in the tar archive and to the papers listed under References below.
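The referenced papers outline the paired bootstrap test roughly as follows. Take the per-topic score differences between two systems and shift them so that their mean is zero, which makes the null hypothesis true. Resample them with replacement many times, and count how often the resampled studentised statistic is at least as extreme as the observed one; that fraction is the achieved significance level (ASL). Discriminative power is then the fraction of system pairs whose ASL falls below a threshold alpha. The Python below is a minimal sketch under assumptions made here for illustration: scores are given as a dict mapping a hypothetical run name to a list of per-topic scores in a shared topic order, B = 1000 trials, and the test is two-sided. It is not the BOOTS implementation and does not read its input format.

```python
import itertools
import math
import random
from statistics import mean, stdev


def t_stat(diffs):
    """Studentised mean of paired per-topic differences (needs >= 2 topics)."""
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        # Degenerate sample: every value is identical.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(diffs)))


def paired_bootstrap_asl(x, y, B=1000, seed=0):
    """Two-sided achieved significance level (ASL) for runs x and y (sketch)."""
    rng = random.Random(seed)
    z = [a - b for a, b in zip(x, y)]
    t_obs = t_stat(z)
    z_bar = mean(z)
    w = [zi - z_bar for zi in z]  # shifted so that the null hypothesis holds
    extreme = 0
    for _ in range(B):
        sample = rng.choices(w, k=len(w))  # resample topics with replacement
        if abs(t_stat(sample)) >= abs(t_obs):
            extreme += 1
    return extreme / B


def discriminative_power(runs, alpha=0.05, B=1000, seed=0):
    """Fraction of run pairs whose ASL is below alpha (hypothetical helper)."""
    pairs = list(itertools.combinations(sorted(runs), 2))
    hits = sum(
        paired_bootstrap_asl(runs[a], runs[b], B, seed) < alpha for a, b in pairs
    )
    return hits / len(pairs)
```

Fixing `seed` makes each ASL reproducible across runs of the sketch; the number of bootstrap samples and the random-number handling used by BOOTS itself may differ.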
Note that while BOOTS tests each system pair independently, my Discpower toolkit conducts a randomised Tukey HSD test over all system pairs at once, which is more appropriate because it controls the familywise error rate across the whole set of pairwise comparisons.
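For comparison, a randomised Tukey HSD test in its commonly described form permutes the scores across systems independently within each topic, records the largest difference between any two system means in each trial, and reports for each pair the fraction of trials whose largest difference is at least that pair's observed difference. The sketch below assumes scores are given as a dict mapping a hypothetical run name to a list of per-topic scores in a shared topic order, with B = 1000 trials; it illustrates the technique and is not the Discpower code.

```python
import itertools
import random
from statistics import mean


def randomised_tukey_hsd(runs, B=1000, seed=0):
    """Return a p-value for every run pair; runs maps name -> per-topic scores."""
    rng = random.Random(seed)
    names = sorted(runs)
    n_topics = len(runs[names[0]])
    # rows[t] holds the scores of every run for topic t.
    rows = [[runs[name][t] for name in names] for t in range(n_topics)]
    observed = {
        (a, b): abs(mean(runs[a]) - mean(runs[b]))
        for a, b in itertools.combinations(names, 2)
    }
    hits = dict.fromkeys(observed, 0)
    for _ in range(B):
        # Shuffle run labels independently within each topic.
        permuted = [rng.sample(row, len(row)) for row in rows]
        run_means = [mean(column) for column in zip(*permuted)]
        spread = max(run_means) - min(run_means)  # largest pairwise difference
        for pair, diff in observed.items():
            if spread >= diff:
                hits[pair] += 1
    return {pair: count / B for pair, count in hits.items()}
```

Because every pair is judged against the same distribution of the maximum spread, the resulting p-values are more conservative than those from independent pairwise tests.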
http://research.nii.ac.jp/ntcir/tools/Boots160507.tar.gz (README, 2016-05-07)
References
- Metrics, Statistics, Tests, Sakai, T., PROMISE Winter School 2013: Bridging between Information Retrieval and Databases (LNCS 8173), 2014.
- Evaluating Information Retrieval Metrics based on Bootstrap Hypothesis Tests, Sakai, T., IPSJ TOD, Vol.48, No.SIG 9 (TOD35), pp.11-28, 2007.
- Evaluating Evaluation Metrics based on the Bootstrap, Sakai, T., ACM SIGIR 2006 Proceedings, pp.525-532, August 2006.
Updated on: 2016-05-07