Best-Worst Scaling aka Maximum Difference Scaling (MaxDiff)


Saif M. Mohammad
Svetlana Kiritchenko

Obtaining real-valued annotations poses several challenges. Respondents face a higher cognitive load when asked for real-valued scores than when simply classifying terms into pre-chosen discrete classes. It is also difficult for an annotator to remain consistent across his or her own annotations. Further, the same score may correspond to different degrees of sentiment in the minds of different annotators. One could overcome these problems by providing annotators with pairs of terms and asking which is stronger in terms of association with the property of interest (a comparative approach); however, that requires a much larger set of annotations (on the order of N×N, where N is the number of instances to be annotated).

Best–Worst Scaling (BWS), also sometimes referred to as Maximum Difference Scaling (MaxDiff), is an annotation scheme that exploits the comparative approach to annotation (Louviere and Woodworth, 1990; Cohen, 2003; Louviere et al., 2015). Annotators are given four items (a 4-tuple) and asked which item is the Best (highest in terms of the property of interest) and which is the Worst (lowest in terms of the property of interest). These annotations can then be easily converted into real-valued scores of association between the items and the property, which in turn allow the items to be ranked by their association with the property of interest.
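The conversion from best-worst annotations to real-valued scores can be sketched with a simple counting procedure: an item's score is the fraction of times it was chosen as Best minus the fraction of times it was chosen as Worst, which gives a value between -1 and 1. This is a minimal illustration of one common conversion, not necessarily the exact method in the released scripts; the function names `bws_scores` and `rank_items` and the input format are assumptions made for this example.

```python
from collections import Counter


def bws_scores(annotations):
    """Score items from best-worst annotations by simple counting.

    `annotations` is an iterable of (items, best, worst) triples, where
    `items` is a tuple of the items shown together (e.g. a 4-tuple).
    This input format is an assumption made for this sketch.
    """
    appearances = Counter()
    best_counts = Counter()
    worst_counts = Counter()

    for items, best, worst in annotations:
        if best not in items or worst not in items or best == worst:
            raise ValueError(f"invalid annotation for tuple {items!r}")
        appearances.update(items)   # each shown item is counted once
        best_counts[best] += 1
        worst_counts[worst] += 1

    # score = (#times best - #times worst) / #times shown, in [-1, 1]
    return {
        item: (best_counts[item] - worst_counts[item]) / appearances[item]
        for item in appearances
    }


def rank_items(scores):
    """Return items ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: three annotators judge the same hypothetical 4-tuple.
    shown = ("good", "okay", "bad", "awful")
    toy_annotations = [
        (shown, "good", "awful"),
        (shown, "good", "awful"),
        (shown, "okay", "bad"),
    ]
    toy_scores = bws_scores(toy_annotations)
    for term in rank_items(toy_scores):
        print(f"{term}\t{toy_scores[term]:+.3f}")
```

In practice each item would appear in several different tuples, each judged by multiple annotators, so that the counts are spread over many comparisons.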


We used best-worst scaling to construct the following datasets:

We show that the ranking of terms by sentiment remains remarkably consistent even when the annotation process is repeated with a different set of annotators. We also, for the first time, determine the minimum difference in sentiment association that is perceptible to native speakers of a language.

Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. June 2016. San Diego, CA.

Code (last updated September 30, 2016): scripts to assist with best-worst scaling annotations are available for download. They include scripts to produce 4-tuples with desired term distributions and to produce real-valued scores from best-worst annotations.
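As a rough illustration of the tuple-generation step, the sketch below builds random 4-tuples so that no term repeats within a tuple and every term appears at least once per round. It is not the released script: the function name `make_tuples`, its parameters, and the notion of "rounds" are assumptions for this example, and it does not check for duplicate tuples or enforce any particular term distribution.

```python
import random


def make_tuples(terms, rounds=2, tuple_size=4, seed=None):
    """Generate random tuples of distinct terms (hypothetical helper).

    In each round the terms are shuffled and cut into groups of
    `tuple_size`, so every term appears at least `rounds` times overall.
    """
    unique_terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    if len(unique_terms) < tuple_size:
        raise ValueError("need at least `tuple_size` distinct terms")

    rng = random.Random(seed)
    tuples = []
    for _ in range(rounds):
        shuffled = unique_terms[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), tuple_size):
            group = shuffled[start:start + tuple_size]
            if len(group) < tuple_size:
                # Pad a short final group with other randomly chosen terms.
                pool = [t for t in unique_terms if t not in group]
                group += rng.sample(pool, tuple_size - len(group))
            tuples.append(tuple(group))
    return tuples


if __name__ == "__main__":
    # Hypothetical term list, for illustration only.
    demo_terms = ["great", "good", "fine", "okay", "dull",
                  "poor", "bad", "awful", "superb", "terrible"]
    for four_tuple in make_tuples(demo_terms, rounds=2, seed=13):
        print(four_tuple)
```

Raising `rounds` increases how many tuples each term appears in, which gives the scoring step more comparisons per term at the cost of more annotation work.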


Last updated: September 2016.