Inter-Rater Agreement for Social Computing Studies

Joni O. Salminen, Hind A. Al-Merekhi, Partha Dey, Bernard Jansen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Different agreement scores are widely used in social computing studies to evaluate the reliability of crowdsourced ratings. In this research, we argue that the concept of agreement is problematic for many rating tasks in computational social science because they are characterized by subjectivity. We demonstrate this claim by analyzing four social computing datasets that are rated by crowd workers, showing that the agreement ratings are low despite deploying proper instructions and platform settings. Findings indicate that the more subjective the rating task, the lower the agreement, suggesting that tasks differ by their inherent subjectivity and that measuring the agreement of social computing tasks might not be the optimal way to ensure data quality. When creating subjective tasks, the use of agreement metrics potentially gives a false picture of the consistency of crowd workers, as they over-simplify the reality of obtaining quality labels. We also provide empirical evidence on the stability of crowd ratings with different numbers of raters, items, and categories, finding that the reliability scores are most sensitive to the number of categories, somewhat less sensitive to the number of raters, and the least sensitive to the number of items. Our findings have implications for computational social scientists using crowdsourcing for data collection.
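To make the kind of analysis described in the abstract concrete, the sketch below computes Fleiss' kappa, one common inter-rater agreement score, for a matrix of crowdsourced category labels, and then re-computes it on random subsets of raters to probe stability. This is an illustrative example only: the synthetic data, the choice of Fleiss' kappa, and all sizes (items, raters, categories) are assumptions made for demonstration, not the datasets or exact metrics used in the paper.

```python
# Minimal sketch (not the authors' code): Fleiss' kappa on synthetic crowd labels,
# plus a simple rater-subsampling check of how stable the score is.
import numpy as np

rng = np.random.default_rng(0)


def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for a (n_items, n_categories) matrix of category counts,
    assuming every item was rated by the same number of raters."""
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Proportion of all assignments that fall into each category.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # Observed agreement per item.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()       # mean observed agreement
    p_e = np.sum(p_j ** 2)   # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


def to_counts(labels: np.ndarray, n_categories: int) -> np.ndarray:
    """Turn a (n_items, n_raters) label matrix into per-item category counts."""
    return np.stack([np.bincount(row, minlength=n_categories) for row in labels])


# Synthetic ratings: 200 items, 10 raters, 4 categories, with partial shared
# signal so agreement is above chance (illustrative assumption, not real data).
n_items, n_raters, n_categories = 200, 10, 4
true_label = rng.integers(0, n_categories, size=n_items)
noise = rng.integers(0, n_categories, size=(n_items, n_raters))
keep_true = rng.random((n_items, n_raters)) < 0.6  # 60% follow the "true" label
labels = np.where(keep_true, true_label[:, None], noise)

print("kappa, all raters:", round(fleiss_kappa(to_counts(labels, n_categories)), 3))

# Stability check: recompute kappa with random subsets of raters.
for k in (3, 5, 8):
    subset = rng.choice(n_raters, size=k, replace=False)
    kappa_k = fleiss_kappa(to_counts(labels[:, subset], n_categories))
    print(f"kappa with {k} raters: {kappa_k:.3f}")
```

Repeating the same subsampling over items (rows) or over the number of categories (by merging labels) would give the kind of sensitivity comparison the abstract refers to; the specific procedure and metrics used in the paper may differ.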

Original language: English
Title of host publication: 2018 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 80-87
Number of pages: 8
ISBN (Electronic): 9781538695883
DOI: 10.1109/SNAMS.2018.8554744
Publication status: Published - 30 Nov 2018
Event: 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018 - Valencia, Spain
Duration: 15 Oct 2018 - 18 Oct 2018

Other

Other: 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018
Country: Spain
City: Valencia
Period: 15/10/18 - 18/10/18

Keywords

  • crowd evaluations
  • crowd ratings
  • crowdsourcing
  • inter-rater reliability
  • social computing

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Communication

Cite this

Salminen, J. O., Al-Merekhi, H. A., Dey, P., & Jansen, B. (2018). Inter-Rater Agreement for Social Computing Studies. In 2018 5th International Conference on Social Networks Analysis, Management and Security, SNAMS 2018 (pp. 80-87). [8554744] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SNAMS.2018.8554744
