Robust language learning via efficient budgeted online algorithms

Simone Filice, Giuseppe Castellucci, Danilo Croce, Roberto Basili

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In many Natural Language Processing tasks, kernel learning makes it possible to define robust and effective systems. At the same time, Online Learning Algorithms are appealing for their incremental and continuous learning capability: they can track a target problem by constantly adapting to a dynamic environment. The drawback of using kernels in online settings is the continuous growth in complexity, in terms of time and memory usage, experienced in both the learning and classification phases. In this paper, we extend a state-of-the-art Budgeted Online Learning Algorithm that efficiently constrains the overall complexity. We introduce the principles of Fairness and Weight Adjustment: the former mitigates the effect of unbalanced datasets, while the latter improves the stability of the resulting models. The use of robust semantic kernel functions in Sentiment Analysis in Twitter improves the results with respect to the standard budgeted formulation. Performance is comparable with one of the most efficient Support Vector Machine implementations, while preserving all the advantages of online methods. Results are straightforward considering that the task has been tackled without manually coded resources (e.g. WordNet or a Polarity Lexicon), mainly by exploiting distributional analysis of unlabeled corpora.

Original language: English
Title of host publication: Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013
Publisher: IEEE Computer Society
Pages: 913-920
Number of pages: 8
DOIs: https://doi.org/10.1109/ICDMW.2013.87
Publication status: Published - 2013
Externally published: Yes
Event: 2013 13th IEEE International Conference on Data Mining Workshops, ICDMW 2013 - Dallas, TX
Duration: 7 Dec 2013 to 10 Dec 2013

Other

Other: 2013 13th IEEE International Conference on Data Mining Workshops, ICDMW 2013
City: Dallas, TX
Period: 7/12/13 to 10/12/13

Fingerprint

Learning algorithms
Support vector machines
Semantics
Data storage equipment
Processing

Keywords

  • Online learning
  • Sentiment analysis

ASJC Scopus subject areas

  • Software

Cite this

Filice, S., Castellucci, G., Croce, D., & Basili, R. (2013). Robust language learning via efficient budgeted online algorithms. In Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013 (pp. 913-920). [6754019] IEEE Computer Society. https://doi.org/10.1109/ICDMW.2013.87

Robust language learning via efficient budgeted online algorithms. / Filice, Simone; Castellucci, Giuseppe; Croce, Danilo; Basili, Roberto.

Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013. IEEE Computer Society, 2013. p. 913-920 6754019.

Filice, S, Castellucci, G, Croce, D & Basili, R 2013, Robust language learning via efficient budgeted online algorithms. in Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013., 6754019, IEEE Computer Society, pp. 913-920, 2013 13th IEEE International Conference on Data Mining Workshops, ICDMW 2013, Dallas, TX, 7/12/13. https://doi.org/10.1109/ICDMW.2013.87
Filice S, Castellucci G, Croce D, Basili R. Robust language learning via efficient budgeted online algorithms. In Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013. IEEE Computer Society. 2013. p. 913-920. 6754019 https://doi.org/10.1109/ICDMW.2013.87
Filice, Simone ; Castellucci, Giuseppe ; Croce, Danilo ; Basili, Roberto. / Robust language learning via efficient budgeted online algorithms. Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013. IEEE Computer Society, 2013. pp. 913-920
@inproceedings{ef1595def40842b091e1d96011706aca,
title = "Robust language learning via efficient budgeted online algorithms",
abstract = "In many Natural Language Processing tasks, kernel learning makes it possible to define robust and effective systems. At the same time, Online Learning Algorithms are appealing for their incremental and continuous learning capability: they can track a target problem by constantly adapting to a dynamic environment. The drawback of using kernels in online settings is the continuous growth in complexity, in terms of time and memory usage, experienced in both the learning and classification phases. In this paper, we extend a state-of-the-art Budgeted Online Learning Algorithm that efficiently constrains the overall complexity. We introduce the principles of Fairness and Weight Adjustment: the former mitigates the effect of unbalanced datasets, while the latter improves the stability of the resulting models. The use of robust semantic kernel functions in Sentiment Analysis in Twitter improves the results with respect to the standard budgeted formulation. Performance is comparable with one of the most efficient Support Vector Machine implementations, while preserving all the advantages of online methods. Results are straightforward considering that the task has been tackled without manually coded resources (e.g. WordNet or a Polarity Lexicon), mainly by exploiting distributional analysis of unlabeled corpora.",
keywords = "Online learning, Sentiment analysis",
author = "Simone Filice and Giuseppe Castellucci and Danilo Croce and Roberto Basili",
year = "2013",
doi = "10.1109/ICDMW.2013.87",
language = "English",
pages = "913--920",
booktitle = "Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Robust language learning via efficient budgeted online algorithms

AU - Filice, Simone

AU - Castellucci, Giuseppe

AU - Croce, Danilo

AU - Basili, Roberto

PY - 2013

Y1 - 2013

N2 - In many Natural Language Processing tasks, kernel learning makes it possible to define robust and effective systems. At the same time, Online Learning Algorithms are appealing for their incremental and continuous learning capability: they can track a target problem by constantly adapting to a dynamic environment. The drawback of using kernels in online settings is the continuous growth in complexity, in terms of time and memory usage, experienced in both the learning and classification phases. In this paper, we extend a state-of-the-art Budgeted Online Learning Algorithm that efficiently constrains the overall complexity. We introduce the principles of Fairness and Weight Adjustment: the former mitigates the effect of unbalanced datasets, while the latter improves the stability of the resulting models. The use of robust semantic kernel functions in Sentiment Analysis in Twitter improves the results with respect to the standard budgeted formulation. Performance is comparable with one of the most efficient Support Vector Machine implementations, while preserving all the advantages of online methods. Results are straightforward considering that the task has been tackled without manually coded resources (e.g. WordNet or a Polarity Lexicon), mainly by exploiting distributional analysis of unlabeled corpora.

AB - In many Natural Language Processing tasks, kernel learning makes it possible to define robust and effective systems. At the same time, Online Learning Algorithms are appealing for their incremental and continuous learning capability: they can track a target problem by constantly adapting to a dynamic environment. The drawback of using kernels in online settings is the continuous growth in complexity, in terms of time and memory usage, experienced in both the learning and classification phases. In this paper, we extend a state-of-the-art Budgeted Online Learning Algorithm that efficiently constrains the overall complexity. We introduce the principles of Fairness and Weight Adjustment: the former mitigates the effect of unbalanced datasets, while the latter improves the stability of the resulting models. The use of robust semantic kernel functions in Sentiment Analysis in Twitter improves the results with respect to the standard budgeted formulation. Performance is comparable with one of the most efficient Support Vector Machine implementations, while preserving all the advantages of online methods. Results are straightforward considering that the task has been tackled without manually coded resources (e.g. WordNet or a Polarity Lexicon), mainly by exploiting distributional analysis of unlabeled corpora.

KW - Online learning

KW - Sentiment analysis

UR - http://www.scopus.com/inward/record.url?scp=84898041738&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898041738&partnerID=8YFLogxK

U2 - 10.1109/ICDMW.2013.87

DO - 10.1109/ICDMW.2013.87

M3 - Conference contribution

AN - SCOPUS:84898041738

SP - 913

EP - 920

BT - Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013

PB - IEEE Computer Society

ER -