Bridging social media via distant supervision

Walid Magdy, Hassan Sajjad, Tarek El-Ganainy, Fabrizio Sebastiani

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Microblog classification has received considerable attention in recent years. Different classification tasks have been investigated, most of them focusing on classifying microblogs into a small number of classes (five or fewer) using a training set of manually annotated tweets. Unfortunately, labelling data is tedious and expensive, and finding tweets that cover all the classes of interest is not always straightforward, especially when some of the classes do not frequently arise in practice. In this paper, we study an approach to tweet classification based on distant supervision, whereby we automatically transfer labels from one social medium to another for a single-label multi-class classification task. In particular, we apply YouTube video classes to tweets linking to these videos. This provides, at no cost, a virtually unlimited number of labelled instances that can be used as training data. The classification experiments we have run show that training a tweet classifier on these automatically labelled data achieves substantially better performance than training the same classifier with a limited amount of manually labelled data; this is advantageous, given that the automatically labelled data come for free. Further investigation of our approach shows its robustness when applied with different numbers of classes and across different languages.
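The label-transfer step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that tweet links have already been expanded to full YouTube URLs, that a lookup table from video ID to YouTube category is available (the `video_category` dictionary, the category names, and the function names below are all hypothetical), and that URLs are stripped from the tweet text before training, which is a choice made here for illustration only.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Matches the 11-character video ID in two common YouTube URL shapes.
# Assumption: shortened links (e.g. t.co) were expanded beforehand.
YOUTUBE_ID = re.compile(
    r"(?:youtube\.com/watch\?(?:[^\s#]*&)?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)
URL = re.compile(r"https?://\S+")


def extract_video_id(tweet: str) -> Optional[str]:
    """Return the first YouTube video ID linked in the tweet, or None."""
    match = YOUTUBE_ID.search(tweet)
    return match.group(1) if match else None


def distant_label(
    tweets: Iterable[str], video_category: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Give each tweet the category of the video it links to.

    Tweets without a YouTube link, or linking to a video whose
    category is unknown, are skipped rather than guessed.
    """
    labelled = []
    for tweet in tweets:
        video_id = extract_video_id(tweet)
        category = video_category.get(video_id) if video_id else None
        if category is not None:
            # Illustrative choice: drop URLs so only the words remain.
            text = URL.sub("", tweet).strip()
            labelled.append((text, category))
    return labelled


# Hypothetical data: video IDs and categories are made up.
tweets = [
    "Great goal last night! https://www.youtube.com/watch?v=AAAAAAAAAAA",
    "New single out now https://youtu.be/BBBBBBBBBBB",
    "No link here",
    "Unknown video https://youtu.be/CCCCCCCCCCC",
]
video_category = {"AAAAAAAAAAA": "Sports", "BBBBBBBBBBB": "Music"}
training_pairs = distant_label(tweets, video_category)
```

The resulting `(text, category)` pairs could then be fed to any single-label multi-class classifier as training data, with no manual annotation involved.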

Original language: English
Article number: 35
Pages (from-to): 1-12
Number of pages: 12
Journal: Social Network Analysis and Mining
Volume: 5
Issue number: 1
DOI: 10.1007/s13278-015-0275-z
Publication status: Published - 1 Jan 2015

Fingerprint

social media
supervision
Labels
Classifiers
video
Labeling
experiment
costs
language
performance

Keywords

  • Distant supervision
  • Tweet classification
  • Twitter
  • YouTube

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Information Systems
  • Communication
  • Media Technology

Cite this

Bridging social media via distant supervision. / Magdy, Walid; Sajjad, Hassan; El-Ganainy, Tarek; Sebastiani, Fabrizio.

In: Social Network Analysis and Mining, Vol. 5, No. 1, 35, 01.01.2015, p. 1-12.

Magdy, Walid ; Sajjad, Hassan ; El-Ganainy, Tarek ; Sebastiani, Fabrizio. / Bridging social media via distant supervision. In: Social Network Analysis and Mining. 2015 ; Vol. 5, No. 1. pp. 1-12.
@article{924f2dd9d03146bf9898c298a75bd56f,
title = "Bridging social media via distant supervision",
keywords = "Distant supervision, Tweet classification, Twitter, YouTube",
author = "Walid Magdy and Hassan Sajjad and Tarek El-Ganainy and Fabrizio Sebastiani",
year = "2015",
month = "1",
day = "1",
doi = "10.1007/s13278-015-0275-z",
language = "English",
volume = "5",
pages = "1--12",
journal = "Social Network Analysis and Mining",
issn = "1869-5450",
publisher = "Springer Wien",
number = "1",
}

TY - JOUR

T1 - Bridging social media via distant supervision

AU - Magdy, Walid

AU - Sajjad, Hassan

AU - El-Ganainy, Tarek

AU - Sebastiani, Fabrizio

PY - 2015/1/1

Y1 - 2015/1/1

KW - Distant supervision

KW - Tweet classification

KW - Twitter

KW - YouTube

UR - http://www.scopus.com/inward/record.url?scp=84947278653&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947278653&partnerID=8YFLogxK

U2 - 10.1007/s13278-015-0275-z

DO - 10.1007/s13278-015-0275-z

M3 - Article

VL - 5

SP - 1

EP - 12

JO - Social Network Analysis and Mining

JF - Social Network Analysis and Mining

SN - 1869-5450

IS - 1

M1 - 35

ER -