Estimating group properties in online social networks with a classifier

George Berry, Antonio Sirianni, Nathan High, Agrippa Kellum, Ingmar Weber, Michael Macy

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We consider the problem of obtaining unbiased estimates of group properties in social networks when using a classifier for node labels. Inference for this problem is complicated by two factors: the network is not known and must be crawled, and even high-performance classifiers provide biased estimates of group proportions. We propose and evaluate AdjustedWalk for addressing this problem. This is a three-step procedure that entails: (1) walking the graph starting from an arbitrary node; (2) learning a classifier on the nodes in the walk; and (3) applying a post-hoc adjustment to classification labels. The walk step provides the information necessary to make inferences over the nodes and edges, while the adjustment step corrects for classifier bias in estimating group proportions. This process provides de-biased estimates at the cost of additional variance. We evaluate AdjustedWalk on four tasks: the proportion of nodes belonging to a minority group, the proportion of the minority group among high-degree nodes, the proportion of within-group edges, and Coleman’s homophily index. Simulated and empirical graphs show that this procedure performs well compared to optimal baselines in a variety of circumstances, while indicating that variance increases can be large for low-recall classifiers.
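The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' AdjustedWalk implementation: it swaps in two standard techniques as assumptions, namely inverse-degree reweighting of random-walk samples and the "adjusted classify and count" correction from the quantification literature. The function names, the toy graph, and the classifier rates (tpr, fpr) are all hypothetical; in practice the rates would have to be estimated from a labeled subset of the walk.

```python
import random


def random_walk(adj, start, steps, rng):
    """Simple random walk over an adjacency dict; returns visited nodes (repeats kept)."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
        visited.append(node)
    return visited


def reweighted_rate(nodes, adj, flags):
    """Share of sampled nodes with flag == 1, weighting each visit by 1/degree.

    A random walk visits nodes roughly in proportion to their degree, so
    inverse-degree weights are a common (assumed here) way to offset that.
    """
    weights = [1.0 / len(adj[n]) for n in nodes]
    total = sum(w * flags[n] for w, n in zip(weights, nodes))
    return total / sum(weights)


def adjusted_proportion(observed_rate, tpr, fpr):
    """Adjusted classify-and-count: invert observed = p * tpr + (1 - p) * fpr."""
    if tpr <= fpr:
        raise ValueError("classifier must have tpr > fpr for the adjustment")
    p = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion


# Toy usage on a hypothetical 4-node graph with hypothetical predicted labels.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
predicted = {0: 1, 1: 0, 2: 0, 3: 1}
walk = random_walk(adj, start=0, steps=200, rng=random.Random(0))
raw = reweighted_rate(walk, adj, predicted)
print(adjusted_proportion(raw, tpr=0.8, fpr=0.1))
```

The division by (tpr - fpr) is where the extra variance enters in this sketch: when the classifier's true-positive rate is low, the denominator is small and sampling noise in the observed rate is amplified, which is in line with the abstract's remark about low-recall classifiers.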

Original language: English
Title of host publication: Social Informatics - 10th International Conference, SocInfo 2018, Proceedings
Editors: Olessia Koltsova, Dmitry I. Ignatov, Steffen Staab
Publisher: Springer Verlag
Pages: 67-85
Number of pages: 19
ISBN (Print): 9783030011284
DOIs: https://doi.org/10.1007/978-3-030-01129-1_5
Publication status: Published - 1 Jan 2018
Event: 10th Conference on Social Informatics, SocInfo 2018 - Saint-Petersburg, Russian Federation
Duration: 25 Sep 2018 to 28 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11185 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th Conference on Social Informatics, SocInfo 2018
Country: Russian Federation
City: Saint-Petersburg
Period: 25/9/18 to 28/9/18

Keywords

  • Classification error
  • Digital demography
  • Network sampling
  • Quantification learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Berry, G., Sirianni, A., High, N., Kellum, A., Weber, I., & Macy, M. (2018). Estimating group properties in online social networks with a classifier. In O. Koltsova, D. I. Ignatov, & S. Staab (Eds.), Social Informatics - 10th International Conference, SocInfo 2018, Proceedings (pp. 67-85). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11185 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01129-1_5

@inproceedings{76725ca8c400456388bf458bc3b64949,
title = "Estimating group properties in online social networks with a classifier",
abstract = "We consider the problem of obtaining unbiased estimates of group properties in social networks when using a classifier for node labels. Inference for this problem is complicated by two factors: The network is not known and must be crawled, and even high-performance classifiers provide biased estimates of group proportions. We propose and evaluate AdjustedWalk for addressing this problem. This is a three step procedure which entails: (1) walking the graph starting from an arbitrary node; (2) learning a classifier on the nodes in the walk; and (3) applying a post-hoc adjustment to classification labels. The walk step provides the information necessary to make inferences over the nodes and edges, while the adjustment step corrects for classifier bias in estimating group proportions. This process provides de-biased estimates at the cost of additional variance. We evaluate AdjustedWalk on four tasks: The proportion of nodes belonging to a minority group, the proportion of the minority group among high degree nodes, the proportion of within-group edges, and Coleman’s homophily index. Simulated and empirical graphs show that this procedure performs well compared to optimal baselines in a variety of circumstances, while indicating that variance increases can be large for low-recall classifiers.",
keywords = "Classification error, Digital demography, Network sampling, Quantification learning",
author = "George Berry and Antonio Sirianni and Nathan High and Agrippa Kellum and Ingmar Weber and Michael Macy",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-01129-1_5",
language = "English",
isbn = "9783030011284",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "67--85",
editor = "Olessia Koltsova and Ignatov, {Dmitry I.} and Steffen Staab",
booktitle = "Social Informatics - 10th International Conference, SocInfo 2018, Proceedings",

}

TY - GEN
T1 - Estimating group properties in online social networks with a classifier
AU - Berry, George
AU - Sirianni, Antonio
AU - High, Nathan
AU - Kellum, Agrippa
AU - Weber, Ingmar
AU - Macy, Michael
PY - 2018/1/1
Y1 - 2018/1/1
AB - We consider the problem of obtaining unbiased estimates of group properties in social networks when using a classifier for node labels. Inference for this problem is complicated by two factors: The network is not known and must be crawled, and even high-performance classifiers provide biased estimates of group proportions. We propose and evaluate AdjustedWalk for addressing this problem. This is a three step procedure which entails: (1) walking the graph starting from an arbitrary node; (2) learning a classifier on the nodes in the walk; and (3) applying a post-hoc adjustment to classification labels. The walk step provides the information necessary to make inferences over the nodes and edges, while the adjustment step corrects for classifier bias in estimating group proportions. This process provides de-biased estimates at the cost of additional variance. We evaluate AdjustedWalk on four tasks: The proportion of nodes belonging to a minority group, the proportion of the minority group among high degree nodes, the proportion of within-group edges, and Coleman’s homophily index. Simulated and empirical graphs show that this procedure performs well compared to optimal baselines in a variety of circumstances, while indicating that variance increases can be large for low-recall classifiers.
KW - Classification error
KW - Digital demography
KW - Network sampling
KW - Quantification learning
UR - http://www.scopus.com/inward/record.url?scp=85057321263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057321263&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01129-1_5
DO - 10.1007/978-3-030-01129-1_5
M3 - Conference contribution
SN - 9783030011284
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 67
EP - 85
BT - Social Informatics - 10th International Conference, SocInfo 2018, Proceedings
A2 - Koltsova, Olessia
A2 - Ignatov, Dmitry I.
A2 - Staab, Steffen
PB - Springer Verlag
ER -