ClassiMap: A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data

Sylvain Lespinats, Michael Aupetit, Anke Meyer-Baese

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques that use pairwise similarities between multidimensional data points to represent the data in a plane, enabling visual exploratory analysis. For labeled data, DR techniques face two objectives with potentially different priorities: one is to account for the similarities between data points, the other for the structure of the data classes. Unsupervised DR techniques attempt to preserve the original data similarities, but they do not consider class labels, so they can map originally separated classes as overlapping ones. Conversely, state-of-the-art supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework in which they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space; they can therefore map even originally overlapping classes as separated. We propose ClassiMap, a DR technique that optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general because of the reduction of the data dimension. ClassiMap aims primarily to preserve data similarities, but it tends to place unavoidable tears preferentially between data with different labels and unavoidable false neighbors preferentially among data with the same label. Standard quality measures for unsupervised mappings say nothing about the preservation of within-class or between-class structures, while classification accuracy, used to evaluate supervised mappings, is relevant only within the framework of predictive modeling. We propose two measures better suited to the evaluation of DR of labeled data in an EDA framework.
We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.
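
To make the two distortion types concrete, the following is a minimal illustrative sketch, not the ClassiMap objective and not the two label-aware indices proposed in the article. It assumes a simple k-nearest-neighbor reading of the terms: a tear is counted when a neighbor in the original space is no longer a neighbor in the map, and a false neighbor when a neighbor in the map was not one in the original space. The function names `knn_sets` and `distortions_by_label`, the parameter `k`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    order = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def distortions_by_label(X_high, Y_low, labels, k=5):
    """Count k-NN tears and false neighbors, split by same/different label.

    Assumption (not from the article): distortions are defined through
    k-nearest-neighbor sets rather than through continuous distances.
    """
    high = knn_sets(X_high, k)
    low = knn_sets(Y_low, k)
    counts = {"tears_same": 0, "tears_diff": 0,
              "false_same": 0, "false_diff": 0}
    for i in range(len(labels)):
        # Neighbors in the original space that were lost in the map: tears.
        for j in high[i] - low[i]:
            key = "tears_same" if labels[i] == labels[j] else "tears_diff"
            counts[key] += 1
        # Neighbors in the map that were not neighbors originally: false neighbors.
        for j in low[i] - high[i]:
            key = "false_same" if labels[i] == labels[j] else "false_diff"
            counts[key] += 1
    return counts


# Toy example: two well-separated classes in 2D, mapped to 1D in a way
# that interleaves them.
X_high = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
Y_low = np.array([[0.0], [10.0], [0.1], [10.1]])
labels = [0, 0, 1, 1]
print(distortions_by_label(X_high, Y_low, labels, k=1))
```

Under this reading, a mapping in the spirit described in the abstract would tend to show its unavoidable tears under `tears_diff` and its unavoidable false neighbors under `false_same`; the toy map above is deliberately built to do the opposite.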

Original language: English
Article number: 1551008
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 29
Issue number: 6
Publication status: Published - 14 Sep 2015


Keywords

  • dimensionality reduction
  • distance preservation
  • exploratory data analysis
  • labeled data
  • mapping evaluation
  • multidimensional scaling

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Computer Vision and Pattern Recognition
