Contextual anomaly detection in text data

Amogh Mahapatra, Nisheeth Srivastava, Jaideep Srivastava

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

We propose using side information to inform anomaly detection algorithms of the semantic context of the text data they analyze, so that they consider both divergence from the statistical patterns seen in a particular dataset and divergence from more general semantic expectations. Computational experiments show that our algorithm performs as expected on data that reflect real-world events with contextual ambiguity, while replicating conventional clustering on data that are either too specialized or too generic for contextual information to be actionable. These results suggest that our algorithm could reduce false positive rates in existing anomaly detection systems.
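The two kinds of divergence named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a toy corpus of tokenized documents, a cosine-distance score against corpus-wide term counts as the "statistical" signal, and a hypothetical `relatedness` table (standing in for side information such as an external semantic resource) as the "semantic" signal. All function names, thresholds, and data below are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse term-count mappings."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def statistical_divergence(index, corpus):
    """Distance of one document from the term counts of the whole corpus."""
    centroid = Counter()
    for doc in corpus:
        centroid.update(doc)
    return cosine_distance(Counter(corpus[index]), centroid)


def semantic_divergence(index, corpus, relatedness):
    """1 - mean best relatedness between a document's terms and the other documents' terms.

    `relatedness` is hypothetical side information: a dict mapping
    frozenset({term_a, term_b}) to a score in [0, 1]. Missing pairs count as 0.
    """
    doc_terms = set(corpus[index])
    other_terms = {t for i, d in enumerate(corpus) if i != index for t in d}
    if not doc_terms or not other_terms:
        return 1.0

    def best_match(term):
        if term in other_terms:
            return 1.0  # the identical term appears elsewhere in the corpus
        return max(
            (relatedness.get(frozenset((term, other)), 0.0) for other in other_terms),
            default=0.0,
        )

    mean_match = sum(best_match(t) for t in doc_terms) / len(doc_terms)
    return 1.0 - mean_match


def contextual_anomalies(corpus, relatedness, stat_threshold=0.5, sem_threshold=0.5):
    """Flag documents that diverge both statistically and semantically."""
    return [
        i
        for i in range(len(corpus))
        if statistical_divergence(i, corpus) > stat_threshold
        and semantic_divergence(i, corpus, relatedness) > sem_threshold
    ]


# Toy data (assumed for illustration only).
corpus = [
    ["stock", "market", "rally"],
    ["market", "stock", "shares"],
    ["equity", "equities", "bourse"],  # rare wording, but on-topic
    ["volcano", "eruption", "ash"],    # rare wording and off-topic
]
relatedness = {
    frozenset(("equity", "stock")): 0.9,
    frozenset(("equities", "shares")): 0.9,
    frozenset(("bourse", "market")): 0.8,
}

statistical_only = [
    i for i in range(len(corpus)) if statistical_divergence(i, corpus) > 0.5
]
print(statistical_only)                           # [2, 3]
print(contextual_anomalies(corpus, relatedness))  # [3]
```

In this toy setting the purely statistical score flags both unusual documents, while the combined score keeps only the one that is also semantically unrelated to the rest of the corpus. That is the kind of false-positive reduction the abstract suggests, shown here under the stated assumptions rather than with the paper's method or data.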

Original language: English
Pages (from-to): 469-489
Number of pages: 21
Journal: Algorithms
Volume: 5
Issue number: 4
DOI: https://doi.org/10.3390/a5040469
Publication status: Published - 2012
Externally published: Yes

Keywords

  • Anomaly detection
  • Context detection
  • Topic modeling

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computational Mathematics
  • Numerical Analysis
  • Theoretical Computer Science

Cite this

Contextual anomaly detection in text data. / Mahapatra, Amogh; Srivastava, Nisheeth; Srivastava, Jaideep.

In: Algorithms, Vol. 5, No. 4, 2012, p. 469-489.

Mahapatra, A, Srivastava, N & Srivastava, J 2012, 'Contextual anomaly detection in text data', Algorithms, vol. 5, no. 4, pp. 469-489. https://doi.org/10.3390/a5040469
@article{da9c701f9eb6463399451e26820f5d5f,
title = "Contextual anomaly detection in text data",
keywords = "Anomaly detection, Context detection, Topic modeling",
author = "Amogh Mahapatra and Nisheeth Srivastava and Jaideep Srivastava",
year = "2012",
doi = "10.3390/a5040469",
language = "English",
volume = "5",
pages = "469--489",
journal = "Algorithms",
issn = "1999-4893",
publisher = "MDPI AG",
number = "4",

}

TY - JOUR

T1 - Contextual anomaly detection in text data

AU - Mahapatra, Amogh

AU - Srivastava, Nisheeth

AU - Srivastava, Jaideep

PY - 2012

Y1 - 2012

KW - Anomaly detection

KW - Context detection

KW - Topic modeling

UR - http://www.scopus.com/inward/record.url?scp=84872705646&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872705646&partnerID=8YFLogxK

U2 - 10.3390/a5040469

DO - 10.3390/a5040469

M3 - Article

VL - 5

SP - 469

EP - 489

JO - Algorithms

JF - Algorithms

SN - 1999-4893

IS - 4

ER -