A robust framework for classifying evolving document streams in an expert-machine-crowd setting

Muhammad Imran, Sanjay Chawla, Carlos Castillo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.
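The abstract does not state the COD-Means objective, so the following is only a minimal sketch of the two ingredients it names, assuming a generic formulation: a k-means variant that (a) keeps crowd-labeled items fixed to the cluster of their given category (a simple "seeded" constraint) and (b) sets aside the `n_outliers` items farthest from their nearest center at every iteration, similar in spirit to the k-means-- heuristic. It is not the authors' algorithm; the function names `kmeans_with_outliers` and `_assign`, the `seeds` dictionary, and all parameters are hypothetical.

```python
import numpy as np


def _assign(X, centers, seeds, n_outliers):
    """Assign points to their nearest center and flag the farthest ones as outliers."""
    # squared Euclidean distance from every point to every center, shape (n, k)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    # hypothetical constraint: crowd-labeled points stay in their category's cluster
    for c, idx in seeds.items():
        labels[idx] = c
    cost = d[np.arange(X.shape[0]), labels]
    # the n_outliers points with the largest cost are set aside as outliers
    outliers = np.sort(np.argsort(cost)[X.shape[0] - n_outliers:])
    return labels, outliers


def kmeans_with_outliers(X, k, n_outliers, seeds=None, n_iter=50, rng=None):
    """Hypothetical sketch, not COD-Means: seeded k-means that ignores outliers."""
    X = np.asarray(X, dtype=float)
    seeds = seeds or {}
    rng = np.random.default_rng(rng)
    # unseeded clusters start at random points; seeded ones at their labeled mean
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    for c, idx in seeds.items():
        centers[c] = X[idx].mean(axis=0)
    for _ in range(n_iter):
        labels, outliers = _assign(X, centers, seeds, n_outliers)
        inlier = np.ones(X.shape[0], dtype=bool)
        inlier[outliers] = False
        new_centers = centers.copy()
        for c in range(k):
            members = inlier & (labels == c)
            if members.any():  # empty clusters keep their previous center
                new_centers[c] = X[members].mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # final assignment, consistent with the returned centers
    labels, outliers = _assign(X, centers, seeds, n_outliers)
    return labels, centers, outliers
```

For example, `kmeans_with_outliers(X, k=3, n_outliers=5, seeds={0: [0, 7], 1: [3]})` would treat cluster 2 as unseeded. In this toy setting, an unseeded cluster that attracts many unlabeled items loosely stands in for a novel concept, and a seeded (crowd-labeled) item that still lands among the outliers is a natural candidate for an annotation error.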

Original language: English
Title of host publication: Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 961-966
Number of pages: 6
ISBN (Electronic): 9781509054725
DOI: https://doi.org/10.1109/ICDM.2016.143
Publication status: Published - 31 Jan 2017
Event: 16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain
Duration: 12 Dec 2016 to 15 Dec 2016

Other

Conference: 16th IEEE International Conference on Data Mining, ICDM 2016
Country: Spain
City: Barcelona, Catalonia
Period: 12/12/16 to 15/12/16

Keywords

  • Novel concept detection
  • Outlier detection
  • Social media
  • Stream classification
  • Text classification

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Imran, M., Chawla, S., & Castillo, C. (2017). A robust framework for classifying evolving document streams in an expert-machine-crowd setting. In Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016 (pp. 961-966). [7837933] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2016.143

A robust framework for classifying evolving document streams in an expert-machine-crowd setting. / Imran, Muhammad; Chawla, Sanjay; Castillo, Carlos.

Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016. Institute of Electrical and Electronics Engineers Inc., 2017. p. 961-966 7837933.

Imran, M, Chawla, S & Castillo, C 2017, A robust framework for classifying evolving document streams in an expert-machine-crowd setting. in Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016., 7837933, Institute of Electrical and Electronics Engineers Inc., pp. 961-966, 16th IEEE International Conference on Data Mining, ICDM 2016, Barcelona, Catalonia, Spain, 12/12/16. https://doi.org/10.1109/ICDM.2016.143

Imran M, Chawla S, Castillo C. A robust framework for classifying evolving document streams in an expert-machine-crowd setting. In Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016. Institute of Electrical and Electronics Engineers Inc. 2017. p. 961-966. 7837933 https://doi.org/10.1109/ICDM.2016.143

Imran, Muhammad ; Chawla, Sanjay ; Castillo, Carlos. / A robust framework for classifying evolving document streams in an expert-machine-crowd setting. Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 961-966

@inproceedings{be8fa76f1b574c73a344f763ccdad331,
title = "A robust framework for classifying evolving document streams in an expert-machine-crowd setting",
abstract = "An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.",
keywords = "Novel concept detection, Outlier detection, Social media, Stream classification, Text classification",
author = "Muhammad Imran and Sanjay Chawla and Carlos Castillo",
year = "2017",
month = "1",
day = "31",
doi = "10.1109/ICDM.2016.143",
language = "English",
pages = "961--966",
booktitle = "Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - A robust framework for classifying evolving document streams in an expert-machine-crowd setting

AU - Imran, Muhammad

AU - Chawla, Sanjay

AU - Castillo, Carlos

PY - 2017/1/31

Y1 - 2017/1/31

N2 - An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.

AB - An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.

KW - Novel concept detection

KW - Outlier detection

KW - Social media

KW - Stream classification

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=85014527726&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014527726&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2016.143

DO - 10.1109/ICDM.2016.143

M3 - Conference contribution

AN - SCOPUS:85014527726

SP - 961

EP - 966

BT - Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -