Assessing the bias in samples of large online networks

Sandra González-Bailón, Ning Wang, Alejandro Rivero, Javier Borge-Holthoefer, Yamir Moreno

Research output: Contribution to journal › Article

79 Citations (Scopus)

Abstract

We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication.
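The abstract's central comparison asks how much of a network reconstructed from a fuller collection survives in a smaller sample, and whether the losses concentrate in the periphery. A short sketch can illustrate that question. It is a minimal illustration under stated assumptions and not the authors' procedure. Interactions are assumed to arrive as (source, target) user pairs, and the sample is assumed to be a subset of the full collection. A plain degree cut-off, the hypothetical parameter `threshold`, stands in for separating core from peripheral users. The function names are likewise illustrative.

```python
from collections import Counter


def build_network(interactions):
    """Return a directed edge set from (source, target) pairs such as mentions or retweets.

    Repeated interactions collapse to one edge; self-loops are dropped.
    """
    return {(src, dst) for src, dst in interactions if src != dst}


def total_degree(edges):
    """Map each node to its in-degree plus out-degree in the edge set."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return deg


def coverage_by_group(full_edges, sample_edges, threshold):
    """Fraction of core and peripheral nodes of the full network that appear in the sample.

    Hypothetical assumption: nodes with total degree >= threshold in the full
    network count as "core"; all others count as "periphery".
    """
    full_deg = total_degree(full_edges)
    sample_nodes = set(total_degree(sample_edges))
    core = {node for node, d in full_deg.items() if d >= threshold}
    periphery = set(full_deg) - core

    def fraction(group):
        # NaN signals an empty group rather than silently reporting 0 or 1.
        return len(group & sample_nodes) / len(group) if group else float("nan")

    return fraction(core), fraction(periphery)


# Toy example: one hub ("a") mentioned by four users, plus an isolated pair.
full = build_network([("b", "a"), ("c", "a"), ("d", "a"), ("e", "a"), ("f", "g")])
sample = build_network([("b", "a"), ("c", "a")])
core_cov, periphery_cov = coverage_by_group(full, sample, threshold=2)
print(f"core: {core_cov:.2f}, periphery: {periphery_cov:.2f}")  # prints core: 1.00, periphery: 0.33
```

In this toy case the sample keeps the hub but recovers only two of six peripheral users, the kind of asymmetry the abstract describes. Running the same function separately on mention and retweet networks would allow the two to be compared; a different centrality measure could replace the degree cut-off without changing the structure of the comparison.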

Original language: English
Pages (from-to): 16-27
Number of pages: 12
Journal: Social Networks
Volume: 38
Issue number: 1
DOIs: https://doi.org/10.1016/j.socnet.2014.01.004
Publication status: Published - 1 Jul 2014

Fingerprint

Social Media
Selection Bias
protest
trend
programming
political communication
communication
twitter
social media

Keywords

  • Graph comparison
  • Measurement error
  • Political communication
  • Social media
  • Social protests
  • Twitter

ASJC Scopus subject areas

  • Sociology and Political Science
  • Social Sciences(all)
  • Anthropology
  • Psychology(all)

Cite this

González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. (2014). Assessing the bias in samples of large online networks. Social Networks, 38(1), 16-27. https://doi.org/10.1016/j.socnet.2014.01.004

Assessing the bias in samples of large online networks. / González-Bailón, Sandra; Wang, Ning; Rivero, Alejandro; Borge-Holthoefer, Javier; Moreno, Yamir.

In: Social Networks, Vol. 38, No. 1, 01.07.2014, p. 16-27.

González-Bailón, S, Wang, N, Rivero, A, Borge-Holthoefer, J & Moreno, Y 2014, 'Assessing the bias in samples of large online networks', Social Networks, vol. 38, no. 1, pp. 16-27. https://doi.org/10.1016/j.socnet.2014.01.004
González-Bailón S, Wang N, Rivero A, Borge-Holthoefer J, Moreno Y. Assessing the bias in samples of large online networks. Social Networks. 2014 Jul 1;38(1):16-27. https://doi.org/10.1016/j.socnet.2014.01.004
González-Bailón, Sandra ; Wang, Ning ; Rivero, Alejandro ; Borge-Holthoefer, Javier ; Moreno, Yamir. / Assessing the bias in samples of large online networks. In: Social Networks. 2014 ; Vol. 38, No. 1. pp. 16-27.
@article{a95415d156dc4f94be65218a991793c9,
title = "Assessing the bias in samples of large online networks",
abstract = "We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication.",
keywords = "Graph comparison, Measurement error, Political communication, Social media, Social protests, Twitter",
author = "Sandra Gonz{\'a}lez-Bail{\'o}n and Ning Wang and Alejandro Rivero and Javier Borge-Holthoefer and Yamir Moreno",
year = "2014",
month = "7",
day = "1",
doi = "10.1016/j.socnet.2014.01.004",
language = "English",
volume = "38",
pages = "16--27",
journal = "Social Networks",
issn = "0378-8733",
publisher = "Elsevier BV",
number = "1",

}

TY - JOUR

T1 - Assessing the bias in samples of large online networks

AU - González-Bailón, Sandra

AU - Wang, Ning

AU - Rivero, Alejandro

AU - Borge-Holthoefer, Javier

AU - Moreno, Yamir

PY - 2014/7/1

Y1 - 2014/7/1

AB - We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication.

KW - Graph comparison

KW - Measurement error

KW - Political communication

KW - Social media

KW - Social protests

KW - Twitter

UR - http://www.scopus.com/inward/record.url?scp=84897705730&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897705730&partnerID=8YFLogxK

U2 - 10.1016/j.socnet.2014.01.004

DO - 10.1016/j.socnet.2014.01.004

M3 - Article

VL - 38

SP - 16

EP - 27

JO - Social Networks

JF - Social Networks

SN - 0378-8733

IS - 1

ER -