Detecting demographic bias in automatically generated personas

Joni Salminen, Bernard Jansen, Soon-gyo Jung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest when doing an exact match comparison, and the bias decreases when comparing at age or gender level. The bias also decreases when increasing the number of generated personas. For example, the smaller number of personas resulted in underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population and a smaller number increases biases. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.
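The comparison the abstract describes — checking the demographic distribution of a generated persona set against the underlying analytics data, at exact-match (gender + age) or coarser (gender-only or age-only) levels — can be sketched as a distance between two distributions. The sketch below uses total variation distance and synthetic data; the function name, the data, and the choice of metric are illustrative assumptions, not the authors' actual pipeline.

```python
from collections import Counter

def demographic_bias(personas, audience, key=lambda p: (p["gender"], p["age"])):
    """Total variation distance between the demographic distribution of a
    persona set and the underlying audience data (0 = perfectly matched)."""
    def dist(items):
        counts = Counter(key(p) for p in items)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}
    p, a = dist(personas), dist(audience)
    return 0.5 * sum(abs(p.get(k, 0) - a.get(k, 0)) for k in set(p) | set(a))

# Hypothetical audience skewed 60/40 female/male; a small persona set that
# underrepresents female users scores higher bias than a larger, balanced one.
audience = [{"gender": "F", "age": "25-34"}] * 60 + [{"gender": "M", "age": "25-34"}] * 40
small = [{"gender": "M", "age": "25-34"}] * 4 + [{"gender": "F", "age": "25-34"}] * 1
large = [{"gender": "F", "age": "25-34"}] * 6 + [{"gender": "M", "age": "25-34"}] * 4
```

Passing a coarser key such as `key=lambda p: p["gender"]` reproduces the gender-level comparison, which the abstract reports yields lower bias than exact matching.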

Original language: English
Title of host publication: CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450359719
DOI: 10.1145/3290607.3313034
Publication status: Published - 2 May 2019
Event: 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019 - Glasgow, United Kingdom
Duration: 4 May 2019 - 9 May 2019

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019
Country: United Kingdom
City: Glasgow
Period: 4/5/19 - 9/5/19

Keywords

  • Algorithmic Bias
  • Automatic Persona Generation
  • Data-Driven Personas

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Salminen, J., Jansen, B., & Jung, S. (2019). Detecting demographic bias in automatically generated personas. In CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems [3313034] (Conference on Human Factors in Computing Systems - Proceedings). Association for Computing Machinery. https://doi.org/10.1145/3290607.3313034

@inproceedings{f873e5a7afc54786aac38258d2c35ed0,
title = "Detecting demographic bias in automatically generated personas",
abstract = "We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest when doing an exact match comparison, and the bias decreases when comparing at age or gender level. The bias also decreases when increasing the number of generated personas. For example, the smaller number of personas resulted in underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population and a smaller number increases biases. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.",
keywords = "Algorithmic Bias, Automatic Persona Generation, Data-Driven Personas",
author = "Joni Salminen and Bernard Jansen and Jung Soongyo",
year = "2019",
month = "5",
day = "2",
doi = "10.1145/3290607.3313034",
language = "English",
series = "Conference on Human Factors in Computing Systems - Proceedings",
publisher = "Association for Computing Machinery",
booktitle = "CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems",

}

TY - GEN

T1 - Detecting demographic bias in automatically generated personas

AU - Salminen, Joni

AU - Jansen, Bernard

AU - Jung, Soon-gyo

PY - 2019/5/2

Y1 - 2019/5/2

N2 - We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest when doing an exact match comparison, and the bias decreases when comparing at age or gender level. The bias also decreases when increasing the number of generated personas. For example, the smaller number of personas resulted in underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population and a smaller number increases biases. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.

AB - We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest when doing an exact match comparison, and the bias decreases when comparing at age or gender level. The bias also decreases when increasing the number of generated personas. For example, the smaller number of personas resulted in underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population and a smaller number increases biases. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.

KW - Algorithmic Bias

KW - Automatic Persona Generation

KW - Data-Driven Personas

UR - http://www.scopus.com/inward/record.url?scp=85067280102&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067280102&partnerID=8YFLogxK

U2 - 10.1145/3290607.3313034

DO - 10.1145/3290607.3313034

M3 - Conference contribution

T3 - Conference on Human Factors in Computing Systems - Proceedings

BT - CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems

PB - Association for Computing Machinery

ER -