Tweet sentiment: From classification to quantification

Wei Gao, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

31 Citations (Scopus)

Abstract

Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. "prevalence") of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper we show, on a multiplicity of TSC datasets, that using a quantification-specific algorithm produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.
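
To make the distinction drawn in the abstract concrete, the following minimal sketch (not the paper's method) shows that per-tweet accuracy and prevalence accuracy can rank two classifiers differently. Everything in it is an illustrative assumption: the gold and predicted labels are made up, prevalence is estimated by simply counting predicted labels (an approach commonly called "classify and count"), and the estimate is scored by the mean absolute error between true and estimated class frequencies.

```python
from collections import Counter

CLASSES = ("Positive", "Negative", "Neutral")


def prevalence(labels, classes=CLASSES):
    """Relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in classes}


def accuracy(gold, predicted):
    """Fraction of individual items labelled correctly (a classification measure)."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_absolute_error(true_prev, est_prev):
    """Average absolute gap between true and estimated class frequencies."""
    return sum(abs(true_prev[c] - est_prev[c]) for c in true_prev) / len(true_prev)


# Hypothetical gold labels: 5 Positive, 3 Negative, 2 Neutral.
gold = ["Positive"] * 5 + ["Negative"] * 3 + ["Neutral"] * 2

# Hypothetical classifier A: two errors that cancel out
# (one Positive predicted Negative, one Negative predicted Positive).
pred_a = ["Negative"] + ["Positive"] * 4 + ["Positive"] + ["Negative"] * 2 + ["Neutral"] * 2

# Hypothetical classifier B: a single error (one Positive predicted Negative).
pred_b = ["Negative"] + ["Positive"] * 4 + ["Negative"] * 3 + ["Neutral"] * 2

true_prev = prevalence(gold)
acc_a, acc_b = accuracy(gold, pred_a), accuracy(gold, pred_b)
mae_a = mean_absolute_error(true_prev, prevalence(pred_a))
mae_b = mean_absolute_error(true_prev, prevalence(pred_b))

print(f"A: accuracy={acc_a:.2f}, prevalence MAE={mae_a:.4f}")
print(f"B: accuracy={acc_b:.2f}, prevalence MAE={mae_b:.4f}")
```

In this toy setup classifier B labels more tweets correctly, yet classifier A yields the better class frequency estimates, which is the kind of mismatch that motivates treating quantification as its own task.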

Original language: English
Title of host publication: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
Publisher: Association for Computing Machinery, Inc
Pages: 97-104
Number of pages: 8
ISBN (Print): 9781450338547
DOI: https://doi.org/10.1145/2808797.2809327
Publication status: Published - 25 Aug 2015
Event: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015 - Paris, France
Duration: 25 Aug 2015 to 28 Aug 2015

Other

Other: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
Country: France
City: Paris
Period: 25/8/15 to 28/8/15

Fingerprint

Learning algorithms
Social sciences
Labels
Switches

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Gao, W., & Sebastiani, F. (2015). Tweet sentiment: From classification to quantification. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015 (pp. 97-104). Association for Computing Machinery, Inc. https://doi.org/10.1145/2808797.2809327

Tweet sentiment: From classification to quantification. / Gao, Wei; Sebastiani, Fabrizio.

Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015. Association for Computing Machinery, Inc, 2015. p. 97-104.

Gao, W & Sebastiani, F 2015, Tweet sentiment: From classification to quantification. in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015. Association for Computing Machinery, Inc, pp. 97-104, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015, Paris, France, 25/8/15. https://doi.org/10.1145/2808797.2809327
Gao W, Sebastiani F. Tweet sentiment: From classification to quantification. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015. Association for Computing Machinery, Inc. 2015. p. 97-104 https://doi.org/10.1145/2808797.2809327
Gao, Wei; Sebastiani, Fabrizio. / Tweet sentiment: From classification to quantification. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015. Association for Computing Machinery, Inc, 2015. pp. 97-104.
@inproceedings{cc36c80b55164186970547997ae94e92,
title = "Tweet sentiment: From classification to quantification",
abstract = "Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. {"}prevalence{"}) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper we show, on a multiplicity of TSC datasets, that using a quantification-specific algorithm produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.",
author = "Wei Gao and Fabrizio Sebastiani",
year = "2015",
month = "8",
day = "25",
doi = "10.1145/2808797.2809327",
language = "English",
isbn = "9781450338547",
pages = "97--104",
booktitle = "Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Tweet sentiment: From classification to quantification

AU - Gao, Wei

AU - Sebastiani, Fabrizio

PY - 2015/8/25

Y1 - 2015/8/25

AB - Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. "prevalence") of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper we show, on a multiplicity of TSC datasets, that using a quantification-specific algorithm produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.

UR - http://www.scopus.com/inward/record.url?scp=84962580356&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962580356&partnerID=8YFLogxK

U2 - 10.1145/2808797.2809327

DO - 10.1145/2808797.2809327

M3 - Conference contribution

AN - SCOPUS:84962580356

SN - 9781450338547

SP - 97

EP - 104

BT - Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015

PB - Association for Computing Machinery, Inc

ER -