Evaluation results for the teams who submitted predictions for the NAILS pilot task are below.
BA = Balanced Accuracy

QUT (2 submissions):
BA: 0.82810 	Timestamp: 1506039026.829094
BA: 0.85281 	Timestamp: 1506039013.06217

Best scoring approach for QUT had a BA=0.8528


ARL17 (5 submissions):
BA: 0.88386 	Timestamp: 1505857403.6826353
BA: 0.87243 	Timestamp: 1505314319.1925228
BA: 0.85257 	Timestamp: 1505222934.571628
BA: 0.84591 	Timestamp: 1505154204.2565637
BA: 0.77234 	Timestamp: 1503679734.9772093

Best scoring approach for ARL17 had a BA=0.8839


Performance in the NAILS pilot task was measured using balanced accuracy (BA) on the withheld test set.
Teams were ranked by the highest BA achieved across their submitted prediction runs. The test set
comprised 18000 feature vectors, each to be labelled as either a target (900 entries where label=1)
or a standard (17100 entries where label=-1). Details of how these accuracies break down across
experimental participants (and similar breakdowns) are available in the overview paper.
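
The sketch below illustrates how balanced accuracy behaves on a test set with this class split. It
assumes the common definition of BA as the mean of the per-class recalls (target recall and standard
recall); the exact scoring script used by the task organisers is not reproduced here, and the
function name balanced_accuracy and the all_standard baseline are hypothetical, introduced only for
illustration.

```python
def balanced_accuracy(y_true, y_pred, target=1, standard=-1):
    """Mean of per-class recall (assumed definition of BA, not the official scorer)."""
    n_target = sum(1 for t in y_true if t == target)
    n_standard = sum(1 for t in y_true if t == standard)
    if n_target == 0 or n_standard == 0:
        raise ValueError("both classes must be present in y_true")

    # Correctly labelled targets and standards.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == standard and p == standard)

    return 0.5 * (tp / n_target + tn / n_standard)


# Same class split as the test set: 900 targets (1), 17100 standards (-1).
y_true = [1] * 900 + [-1] * 17100

# Hypothetical baseline that labels every entry as a standard.
all_standard = [-1] * len(y_true)

plain_accuracy = sum(t == p for t, p in zip(y_true, all_standard)) / len(y_true)
ba = balanced_accuracy(y_true, all_standard)

print(f"plain accuracy: {plain_accuracy:.2f}")  # 0.95
print(f"balanced accuracy: {ba:.2f}")           # 0.50
```

With only 5% of entries being targets, a trivial all-standard labelling reaches 0.95 plain accuracy
but only 0.50 BA, which is why BA is the more informative measure for this test set.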