Predicting information credibility in time-sensitive social media

Carlos Castillo, Marcelo Mendoza, Barbara Poblete

Research output: Contribution to journal, Article

143 Citations (Scopus)

Abstract

Purpose: Twitter is a popular microblogging service that has proven, in recent years, its potential for propagating news and information about developing events. This paper analyzes information credibility on Twitter, with the aim of establishing whether relevant and credible news events can be discovered automatically.

Design/methodology/approach: The paper follows a supervised learning approach to the automatic classification of credible news events. A first classifier decides whether an information cascade corresponds to a newsworthy event; a second classifier then decides whether that cascade can be considered credible. Both classifiers are trained on a significant amount of labeled data obtained using crowdsourcing tools. They are validated under two settings: first, on a sample of automatically detected Twitter "trends" in English; second, on Twitter topics in Spanish, automatically detected during a natural disaster, to test how well the model transfers.

Findings: There are measurable differences in the way microblog messages propagate. The paper shows that these differences are related to the newsworthiness and credibility of the information conveyed, and describes features that are effective for automatically classifying information as credible or not credible.

Originality/value: The paper first tests the approach under normal conditions and then extends the findings to a disaster management situation, in which many news items and rumors arise. Additionally, by analyzing the transfer of the classifiers across languages, the paper is able to look more deeply into which topic features are most relevant for credibility assessment. To the best of the authors' knowledge, this is the first paper to study the predictive power of social media for information credibility while considering model transfer into time-sensitive and language-sensitive contexts.
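
The two-stage design described in the abstract (a newsworthiness classifier followed by a credibility classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract names neither the learning algorithm nor the features, so the nearest-centroid classifier, the three feature columns, the label names, and the toy training rows below are all hypothetical stand-ins rather than the authors' method or data.

```python
from statistics import mean


class NearestCentroid:
    """Toy stand-in classifier (not the paper's): predicts the label
    whose mean feature vector is closest to the input."""

    def fit(self, rows, labels):
        self.centroids = {}
        for label in sorted(set(labels)):
            members = [r for r, l in zip(rows, labels) if l == label]
            # Column-wise mean of all training rows carrying this label.
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))
        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


def classify_cascade(features, news_clf, cred_clf):
    """Stage 1 filters out non-news cascades; stage 2 rates the rest."""
    if news_clf.predict(features) != "news":
        return "not_news"
    return cred_clf.predict(features)


# Hypothetical per-cascade features, invented for this sketch:
# [share of messages with a URL, share with question marks, mean re-share depth]
news_rows = [[0.8, 0.1, 3.0], [0.7, 0.2, 2.5], [0.1, 0.1, 0.5], [0.2, 0.0, 0.8]]
news_labels = ["news", "news", "chat", "chat"]

# The second stage is trained only on cascades labeled as news.
cred_rows = [[0.9, 0.05, 3.2], [0.8, 0.1, 2.8], [0.5, 0.6, 2.9], [0.6, 0.5, 2.6]]
cred_labels = ["credible", "credible", "not_credible", "not_credible"]

news_clf = NearestCentroid().fit(news_rows, news_labels)
cred_clf = NearestCentroid().fit(cred_rows, cred_labels)

result = classify_cascade([0.85, 0.08, 3.1], news_clf, cred_clf)
```

Chaining the classifiers means that the credibility model only ever sees cascades the first stage accepted as news. A transfer test of the kind the abstract mentions would apply the same two fitted models, unchanged, to cascades collected in a different setting.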

Original language: English
Pages (from-to): 560-588
Number of pages: 29
Journal: Internet Research
Volume: 23
Issue number: 5
DOI: 10.1108/IntR-05-2012-0095
Publication status: Published - 18 Oct 2013

Fingerprint

  • Social media
  • Credibility
  • Classifiers
  • Twitter
  • News
  • Disasters
  • Natural disaster
  • Context sensitive languages
  • Event
  • Supervised learning
  • Rumor
  • Language
  • Time
  • Methodology

Keywords

  • Information credibility
  • Model transfer
  • Online social networks
  • Social media prediction
  • Time sensitiveness

ASJC Scopus subject areas

  • Communication
  • Sociology and Political Science
  • Economics and Econometrics

Cite this

Predicting information credibility in time-sensitive social media. / Castillo, Carlos; Mendoza, Marcelo; Poblete, Barbara. In: Internet Research, Vol. 23, No. 5, 18.10.2013, p. 560-588.

Castillo, Carlos; Mendoza, Marcelo; Poblete, Barbara. / Predicting information credibility in time-sensitive social media. In: Internet Research. 2013; Vol. 23, No. 5. pp. 560-588.
@article{9dde8e7c5e154af9b72f22a15543aa72,
title = "Predicting information credibility in time-sensitive social media",
keywords = "Information credibility, Model transfer, Online social networks, Social media prediction, Time sensitiveness",
author = "Carlos Castillo and Marcelo Mendoza and Barbara Poblete",
year = "2013",
month = "10",
day = "18",
doi = "10.1108/IntR-05-2012-0095",
language = "English",
volume = "23",
pages = "560--588",
journal = "Internet Research",
issn = "1066-2243",
publisher = "Emerald Group Publishing Ltd.",
number = "5",

}

TY - JOUR

T1 - Predicting information credibility in time-sensitive social media

AU - Castillo, Carlos

AU - Mendoza, Marcelo

AU - Poblete, Barbara

PY - 2013/10/18

Y1 - 2013/10/18

KW - Information credibility

KW - Model transfer

KW - Online social networks

KW - Social media prediction

KW - Time sensitiveness

UR - http://www.scopus.com/inward/record.url?scp=84885445783&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885445783&partnerID=8YFLogxK

U2 - 10.1108/IntR-05-2012-0095

DO - 10.1108/IntR-05-2012-0095

M3 - Article

AN - SCOPUS:84885445783

VL - 23

SP - 560

EP - 588

JO - Internet Research

JF - Internet Research

SN - 1066-2243

IS - 5

ER -