Multi-lingual opinion mining on YouTube

Aliaksei Severyn, Alessandro Moschitti, Olga Uryupina, Barbara Plank, Katja Filippova

Research output: Contribution to journal (Article)

28 Citations (Scopus)

Abstract

To apply opinion mining (OM) successfully to the large amounts of user-generated content produced every day, we need robust models that handle noisy input well yet can easily be adapted to a new domain or language. Here we focus on opinion mining for YouTube by (i) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ii) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (iii) evaluating the effectiveness of the proposed structure on two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than the traditionally used bag-of-words models. Our extensive empirical evaluation shows that (i) STRUCT outperforms the bag-of-words model within the same domain (absolute improvements of up to 2.6% and 3% for Italian and English, respectively); (ii) it is particularly useful when tested across domains (up to more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (iii) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available.
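To make the contrast between tree kernels and bag-of-words features concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Collins-and-Duffy-style subset tree kernel and toy shallow parse trees written as nested tuples; the abstract does not say which tree kernel or tree encoding the paper uses, and all function names and example sentences here are hypothetical.

```python
from collections import Counter

# Assumed encoding: a tree is a nested tuple (label, child, ...);
# a leaf (a word) is a plain string.

def internal_nodes(tree):
    """Yield every non-leaf node of the tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def leaves(tree):
    """Yield the words of the tree from left to right."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree[1:]:
            yield from leaves(child)

def production(node):
    """Node label plus the labels (or words) of its direct children."""
    return (node[0],) + tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in node[1:]
    )

def common_fragments(n1, n2, decay=1.0):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # equal productions: c2 is internal too
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Subset tree kernel: sum shared fragments over all node pairs."""
    return sum(
        common_fragments(a, b, decay)
        for a in internal_nodes(t1)
        for b in internal_nodes(t2)
    )

def bow_kernel(t1, t2):
    """Bag-of-words baseline: dot product of word-count vectors."""
    c1, c2 = Counter(leaves(t1)), Counter(leaves(t2))
    return sum(c1[w] * c2[w] for w in c1)

# Hypothetical comments with the same shallow structure.
t1 = ("S", ("NP", ("DT", "the"), ("NN", "video")),
           ("VP", ("VB", "is"), ("JJ", "great")))
t2 = ("S", ("NP", ("DT", "the"), ("NN", "phone")),
           ("VP", ("VB", "is"), ("JJ", "awful")))
t3 = ("S", ("NP", ("DT", "a"), ("NN", "phone")),
           ("VP", ("VB", "was"), ("JJ", "awful")))

if __name__ == "__main__":
    print("t1 vs t2:", tree_kernel(t1, t2), bow_kernel(t1, t2))
    print("t1 vs t3:", tree_kernel(t1, t3), bow_kernel(t1, t3))
```

In this toy setting, `t1` and `t3` share no words, so the bag-of-words kernel sees nothing in common, while the tree kernel still credits their shared structural fragments. That is one intuition for why structural features may transfer better across domains than lexical ones.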

Original language: English
Journal: Information Processing and Management
DOI: 10.1016/j.ipm.2015.03.002
Publication status: Accepted/In press - 15 May 2014

Keywords

  • Natural Language Processing
  • Opinion mining
  • Social media

ASJC Scopus subject areas

  • Media Technology
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
  • Management Science and Operations Research

Cite this

Multi-lingual opinion mining on YouTube. / Severyn, Aliaksei; Moschitti, Alessandro; Uryupina, Olga; Plank, Barbara; Filippova, Katja.

In: Information Processing and Management, 15.05.2014.

@article{8696fc0dafd740a68de3a68c56fed6ee,
title = "Multi-lingual opinion mining on YouTube",
keywords = "Natural Language Processing, Opinion mining, Social media",
author = "Aliaksei Severyn and Alessandro Moschitti and Olga Uryupina and Barbara Plank and Katja Filippova",
year = "2014",
month = "5",
day = "15",
doi = "10.1016/j.ipm.2015.03.002",
language = "English",
journal = "Information Processing and Management",
issn = "0306-4573",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Multi-lingual opinion mining on YouTube

AU - Severyn, Aliaksei

AU - Moschitti, Alessandro

AU - Uryupina, Olga

AU - Plank, Barbara

AU - Filippova, Katja

PY - 2014/5/15

Y1 - 2014/5/15

KW - Natural Language Processing

KW - Opinion mining

KW - Social media

UR - http://www.scopus.com/inward/record.url?scp=84926435460&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926435460&partnerID=8YFLogxK

U2 - 10.1016/j.ipm.2015.03.002

DO - 10.1016/j.ipm.2015.03.002

M3 - Article

AN - SCOPUS:84926435460

JO - Information Processing and Management

JF - Information Processing and Management

SN - 0306-4573

ER -