On the impact of seed words on sentiment polarity lexicon induction

Dame Jovanoski, Veno Pachovski, Preslav Nakov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Sentiment polarity lexicons are key resources for sentiment analysis, and researchers have invested a lot of effort in their manual creation. However, there has been a recent shift towards automatically extracted lexicons, which are orders of magnitude larger and perform much better. These lexicons are typically mined using bootstrapping, starting from very few seed words whose polarity is given, e.g., 50-60 words, and sometimes even just 5-6. Here we demonstrate that much higher-quality lexicons can be built by starting with hundreds of words and phrases as seeds, especially when they are in-domain. Thus, we combine (i) mid-sized high-quality manually crafted lexicons as seeds and (ii) bootstrapping, in order to build large-scale lexicons.
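
The abstract does not say which bootstrapping procedure is used, so the following is only a minimal sketch of one common way to grow a polarity lexicon from seed words: score every non-seed word by how much more often it co-occurs with positive seeds than with negative ones (a PMI difference with add-one smoothing). The function name `induce_lexicon`, the `min_count` parameter, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def induce_lexicon(docs, pos_seeds, neg_seeds, min_count=2):
    """Hypothetical sketch: one seed-expansion step based on PMI differences.

    score(w) = PMI(w, positive seeds) - PMI(w, negative seeds)
             = log2( c_pos(w) * n_neg / (c_neg(w) * n_pos) ), with add-one smoothing,
    where counts are taken over documents (document-level co-occurrence).
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    word_all = Counter()  # documents containing the word
    word_pos = Counter()  # ... that also contain a positive seed
    word_neg = Counter()  # ... that also contain a negative seed
    n_pos = n_neg = 0     # documents containing any positive / negative seed

    for doc in docs:
        tokens = set(doc.lower().split())
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        n_pos += has_pos
        n_neg += has_neg
        for w in tokens:
            word_all[w] += 1
            if has_pos:
                word_pos[w] += 1
            if has_neg:
                word_neg[w] += 1

    lexicon = {}
    for w, count in word_all.items():
        # Skip rare words and the seeds themselves.
        if count < min_count or w in pos_seeds or w in neg_seeds:
            continue
        ratio = ((word_pos[w] + 1) * (n_neg + 1)) / ((word_neg[w] + 1) * (n_pos + 1))
        lexicon[w] = math.log2(ratio)
    return lexicon


# Toy corpus, invented purely for illustration.
docs = [
    "great tasty food",
    "great tasty service",
    "awful bland food",
    "awful bland service",
]
lexicon = induce_lexicon(docs, pos_seeds={"great"}, neg_seeds={"awful"})
for word, score in sorted(lexicon.items()):
    print(f"{word}: {score:+.2f}")
```

In this sketch, a larger seed set means more documents contribute co-occurrence counts, which is one intuition for why starting from hundreds of in-domain seeds could help; the paper itself should be consulted for the actual method and results.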

Original language: English
Title of host publication: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 1557-1567
Number of pages: 11
ISBN (Print): 9784879747020
Publication status: Published - 1 Jan 2016
Event: 26th International Conference on Computational Linguistics, COLING 2016 - Osaka, Japan
Duration: 11 Dec 2016 to 16 Dec 2016

Other

Other: 26th International Conference on Computational Linguistics, COLING 2016
Country: Japan
City: Osaka
Period: 11/12/16 to 16/12/16

Fingerprint

induction
Seed
resources
Induction
Sentiment
Lexicon
Polarity

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language

Cite this

Jovanoski, D., Pachovski, V., & Nakov, P. (2016). On the impact of seed words on sentiment polarity lexicon induction. In COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers (pp. 1557-1567). Association for Computational Linguistics, ACL Anthology.

On the impact of seed words on sentiment polarity lexicon induction. / Jovanoski, Dame; Pachovski, Veno; Nakov, Preslav.

COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers. Association for Computational Linguistics, ACL Anthology, 2016. p. 1557-1567.

Jovanoski, D, Pachovski, V & Nakov, P 2016, On the impact of seed words on sentiment polarity lexicon induction. in COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers. Association for Computational Linguistics, ACL Anthology, pp. 1557-1567, 26th International Conference on Computational Linguistics, COLING 2016, Osaka, Japan, 11/12/16.
Jovanoski D, Pachovski V, Nakov P. On the impact of seed words on sentiment polarity lexicon induction. In COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers. Association for Computational Linguistics, ACL Anthology. 2016. p. 1557-1567
Jovanoski, Dame ; Pachovski, Veno ; Nakov, Preslav. / On the impact of seed words on sentiment polarity lexicon induction. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers. Association for Computational Linguistics, ACL Anthology, 2016. pp. 1557-1567
@inproceedings{9fa5a9eb4091410787bf34e5b061d76b,
title = "On the impact of seed words on sentiment polarity lexicon induction",
abstract = "Sentiment polarity lexicons are key resources for sentiment analysis, and researchers have invested a lot of effort in their manual creation. However, there has been a recent shift towards automatically extracted lexicons, which are orders of magnitude larger and perform much better. These lexicons are typically mined using bootstrapping, starting from very few seed words whose polarity is given, e.g., 50-60 words, and sometimes even just 5-6. Here we demonstrate that much higher-quality lexicons can be built by starting with hundreds of words and phrases as seeds, especially when they are in-domain. Thus, we combine (i) mid-sized high-quality manually crafted lexicons as seeds and (ii) bootstrapping, in order to build large-scale lexicons.",
author = "Dame Jovanoski and Veno Pachovski and Preslav Nakov",
year = "2016",
month = "1",
day = "1",
language = "English",
isbn = "9784879747020",
pages = "1557--1567",
booktitle = "COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016",
publisher = "Association for Computational Linguistics, ACL Anthology",

}

TY - GEN

T1 - On the impact of seed words on sentiment polarity lexicon induction

AU - Jovanoski, Dame

AU - Pachovski, Veno

AU - Nakov, Preslav

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Sentiment polarity lexicons are key resources for sentiment analysis, and researchers have invested a lot of effort in their manual creation. However, there has been a recent shift towards automatically extracted lexicons, which are orders of magnitude larger and perform much better. These lexicons are typically mined using bootstrapping, starting from very few seed words whose polarity is given, e.g., 50-60 words, and sometimes even just 5-6. Here we demonstrate that much higher-quality lexicons can be built by starting with hundreds of words and phrases as seeds, especially when they are in-domain. Thus, we combine (i) mid-sized high-quality manually crafted lexicons as seeds and (ii) bootstrapping, in order to build large-scale lexicons.

AB - Sentiment polarity lexicons are key resources for sentiment analysis, and researchers have invested a lot of effort in their manual creation. However, there has been a recent shift towards automatically extracted lexicons, which are orders of magnitude larger and perform much better. These lexicons are typically mined using bootstrapping, starting from very few seed words whose polarity is given, e.g., 50-60 words, and sometimes even just 5-6. Here we demonstrate that much higher-quality lexicons can be built by starting with hundreds of words and phrases as seeds, especially when they are in-domain. Thus, we combine (i) mid-sized high-quality manually crafted lexicons as seeds and (ii) bootstrapping, in order to build large-scale lexicons.

UR - http://www.scopus.com/inward/record.url?scp=85029379977&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029379977&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9784879747020

SP - 1557

EP - 1567

BT - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016

PB - Association for Computational Linguistics, ACL Anthology

ER -