Online optimization methods for the quantification problem

Purushottam Kar, Shuai Li, Harikrishna Narasimhan, Sanjay Chawla, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

The estimation of class prevalence, i.e., of the fraction of a population that belongs to a certain class, is an important task in data analytics, and finds applications in many domains such as the social sciences, market research, epidemiology, and others. For example, in sentiment analysis the goal is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather to estimate the overall distribution of positive and negative sentiments, e.g., in a certain time frame. A popular way of performing the above task, often dubbed quantification, is to use supervised learning in order to train a prevalence estimator from labeled data. In the literature there are several performance metrics for measuring the success of such prevalence estimators. In this paper we propose the first online stochastic algorithms for directly optimizing these quantification-specific performance measures. We also provide algorithms that optimize hybrid performance measures that seek to balance quantification and classification performance. Our algorithms present a significant advancement in the theory of multivariate optimization; we show, via a rigorous theoretical analysis, that they exhibit optimal convergence. We also report extensive experiments on benchmark and real data sets which demonstrate that our methods significantly outperform existing optimization techniques used for these performance measures.
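To make the distinction between classification and quantification concrete, here is a minimal sketch in Python. It is not the algorithm proposed in the paper: it uses the simple "classify and count" baseline, and the scores, labels, threshold, and function names are all hypothetical values chosen for illustration.

```python
from typing import Sequence


def prevalence(labels: Sequence[int]) -> float:
    """Fraction of items labeled positive (1)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for y in labels if y == 1) / len(labels)


def classify_and_count(scores: Sequence[float], threshold: float = 0.0) -> float:
    """Estimate positive-class prevalence by thresholding classifier scores.

    Assumption: a score above `threshold` means the classifier predicts
    the positive class. This is a baseline, not the paper's method.
    """
    predictions = [1 if s > threshold else 0 for s in scores]
    return prevalence(predictions)


# Hypothetical toy data: true labels and scores from some trained classifier.
true_labels = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, -0.4, 0.2, 0.7, -1.1, -0.3, -0.8, -0.1]

estimated = classify_and_count(scores)
actual = prevalence(true_labels)
abs_error = abs(estimated - actual)

# Per-item accuracy, shown for contrast with the prevalence error.
predictions = [1 if s > 0.0 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(predictions, true_labels)) / len(true_labels)

print(f"estimated={estimated}, actual={actual}, "
      f"abs_error={abs_error}, accuracy={accuracy}")
```

In this toy data the classifier mislabels two items (one false positive and one false negative), yet the errors cancel and the estimated prevalence matches the true one. This is why prevalence estimators are evaluated with their own performance measures rather than with classification accuracy.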

Original language: English
Title of host publication: KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1625-1634
Number of pages: 10
Volume: 13-17-August-2016
ISBN (Electronic): 9781450342322
DOI: https://doi.org/10.1145/2939672.2939832
Publication status: Published - 13 Aug 2016
Event: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States
Duration: 13 Aug 2016 to 17 Aug 2016

Other

Other: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country: United States
City: San Francisco
Period: 13/8/16 to 17/8/16

Fingerprint

Epidemiology
Social sciences
Supervised learning
Experiments

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Kar, P., Li, S., Narasimhan, H., Chawla, S., & Sebastiani, F. (2016). Online optimization methods for the quantification problem. In KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 13-17-August-2016, pp. 1625-1634). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939832

@inproceedings{d826b6c05a2d4a49a99f16efb993682b,
title = "Online optimization methods for the quantification problem",
author = "Purushottam Kar and Shuai Li and Harikrishna Narasimhan and Sanjay Chawla and Fabrizio Sebastiani",
year = "2016",
month = "8",
day = "13",
doi = "10.1145/2939672.2939832",
language = "English",
volume = "13-17-August-2016",
pages = "1625--1634",
booktitle = "KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",

}

TY - GEN
T1 - Online optimization methods for the quantification problem
AU - Kar, Purushottam
AU - Li, Shuai
AU - Narasimhan, Harikrishna
AU - Chawla, Sanjay
AU - Sebastiani, Fabrizio
PY - 2016/8/13
Y1 - 2016/8/13
UR - http://www.scopus.com/inward/record.url?scp=84984992237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84984992237&partnerID=8YFLogxK
U2 - 10.1145/2939672.2939832
DO - 10.1145/2939672.2939832
M3 - Conference contribution
AN - SCOPUS:84984992237
VL - 13-17-August-2016
SP - 1625
EP - 1634
BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
ER -