From classification to quantification in tweet sentiment analysis

Wei Gao, Fabrizio Sebastiani

Research output: Contribution to journalArticle

14 Citations (Scopus)

Abstract

Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.

Original languageEnglish
Article number19
JournalSocial Network Analysis and Mining
Volume6
Issue number1
DOIs
Publication statusPublished - 1 Dec 2016

Fingerprint

quantification
Learning algorithms
market research
Social sciences
evaluation
political science
learning
Labels
social science
Switches
human being
experiment
Experiments

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Information Systems
  • Communication
  • Media Technology

Cite this

From classification to quantification in tweet sentiment analysis. / Gao, Wei; Sebastiani, Fabrizio.

In: Social Network Analysis and Mining, Vol. 6, No. 1, 19, 01.12.2016.

Research output: Contribution to journalArticle

@article{1b1d49362f474e5e93c2995a38fa3eea,
title = "From classification to quantification in tweet sentiment analysis",
abstract = "Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.",
author = "Wei Gao and Fabrizio Sebastiani",
year = "2016",
month = "12",
day = "1",
doi = "10.1007/s13278-016-0327-z",
language = "English",
volume = "6",
journal = "Social Network Analysis and Mining",
issn = "1869-5450",
publisher = "Springer Wien",
number = "1",

}

TY - JOUR

T1 - From classification to quantification in tweet sentiment analysis

AU - Gao, Wei

AU - Sebastiani, Fabrizio

PY - 2016/12/1

Y1 - 2016/12/1

N2 - Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.

AB - Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.

UR - http://www.scopus.com/inward/record.url?scp=84963722105&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963722105&partnerID=8YFLogxK

U2 - 10.1007/s13278-016-0327-z

DO - 10.1007/s13278-016-0327-z

M3 - Article

VL - 6

JO - Social Network Analysis and Mining

JF - Social Network Analysis and Mining

SN - 1869-5450

IS - 1

M1 - 19

ER -