ClassiMap

A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data

Sylvain Lespinats, Michael Aupetit, Anke Meyer-Baese

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques that use pairwise similarities of multidimensional data to represent the data in a plane, enabling visual exploratory analysis. For labeled data, DR techniques face two objectives with potentially different priorities: one is to account for the similarities between data points, the other for the structure of the data classes. Unsupervised DR techniques attempt to preserve the original data similarities, but they do not consider class labels; hence they can map originally separated classes as overlapping ones. Conversely, state-of-the-art supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework in which they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space; hence they can map even originally overlapping classes as separated. We propose ClassiMap, a DR technique that optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general because of the reduction of the data dimension. ClassiMap primarily aims to preserve data similarities, but it tends to distribute unavoidable tears preferentially among different-label data and unavoidable false neighbors among same-label data. Standard quality measures for unsupervised mappings say nothing about the preservation of within-class or between-class structures, while classification accuracy, used to evaluate supervised mappings, is relevant only within the framework of predictive modeling. We propose two measures better suited to evaluating DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.
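
To make the notions of tears and false neighbors concrete, here is a minimal Python sketch. It is not the ClassiMap objective function, nor either of the two indices proposed in the article. It assumes a k-nearest-neighbor reading of the distortions: a tear is counted when a neighbor in the original space is no longer a neighbor in the mapping, and a false neighbor when a neighbor in the mapping was not one in the original space. The function names `knn` and `distortion_counts`, the use of Euclidean distance, and the toy data are all illustrative assumptions.

```python
from math import dist


def knn(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Euclidean distance is an assumption of this sketch; ties are broken
    by index so the result is deterministic.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def distortion_counts(high, low, labels, k):
    """Count tears and false neighbors, split by same/different label.

    high: points in the original space; low: their mapped positions.
    """
    nn_high = knn(high, k)
    nn_low = knn(low, k)
    counts = {"tears_same": 0, "tears_diff": 0,
              "false_same": 0, "false_diff": 0}
    for i in range(len(labels)):
        # Neighbors in the original space that the mapping pulled apart.
        for j in nn_high[i] - nn_low[i]:
            kind = "same" if labels[i] == labels[j] else "diff"
            counts["tears_" + kind] += 1
        # Neighbors in the mapping that were not neighbors originally.
        for j in nn_low[i] - nn_high[i]:
            kind = "same" if labels[i] == labels[j] else "diff"
            counts["false_" + kind] += 1
    return counts


# Toy data (hypothetical): two well-separated classes in 2-D,
# mapped to 1-D by dropping the coordinate that separates them.
high = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
low = [(0.0,), (1.0,), (0.0,), (1.0,)]
labels = ["a", "a", "b", "b"]

counts = distortion_counts(high, low, labels, k=1)
print(counts)
# {'tears_same': 4, 'tears_diff': 0, 'false_same': 0, 'false_diff': 4}
```

In this toy mapping every tear falls within a class and every false neighbor joins points of different classes, which is the opposite of the distribution of distortions that ClassiMap is described as favoring.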

Original language: English
Article number: 1551008
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Volume: 29
Issue number: 6
DOI: 10.1142/S0218001415510088
Publication status: Published - 14 Sep 2015

Keywords

  • dimensionality reduction
  • distance preservation
  • exploratory data analysis
  • labeled data
  • mapping evaluation
  • Multidimensional scaling

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Computer Vision and Pattern Recognition

Cite this

ClassiMap: A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data. / Lespinats, Sylvain; Aupetit, Michael; Meyer-Baese, Anke.

In: International Journal of Pattern Recognition and Artificial Intelligence, Vol. 29, No. 6, 1551008, 14.09.2015.

@article{8a457ee3dfdc4db9979d22d0b334a88b,
title = "ClassiMap: A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data",
abstract = "Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques that use pairwise similarities of multidimensional data to represent the data in a plane, enabling visual exploratory analysis. For labeled data, DR techniques face two objectives with potentially different priorities: one is to account for the similarities between data points, the other for the structure of the data classes. Unsupervised DR techniques attempt to preserve the original data similarities, but they do not consider class labels; hence they can map originally separated classes as overlapping ones. Conversely, state-of-the-art supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework in which they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space; hence they can map even originally overlapping classes as separated. We propose ClassiMap, a DR technique that optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general because of the reduction of the data dimension. ClassiMap primarily aims to preserve data similarities, but it tends to distribute unavoidable tears preferentially among different-label data and unavoidable false neighbors among same-label data. Standard quality measures for unsupervised mappings say nothing about the preservation of within-class or between-class structures, while classification accuracy, used to evaluate supervised mappings, is relevant only within the framework of predictive modeling. We propose two measures better suited to evaluating DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.",
keywords = "dimensionality reduction, distance preservation, exploratory data analysis, labeled data, mapping evaluation, Multidimensional scaling",
author = "Sylvain Lespinats and Michael Aupetit and Anke Meyer-Baese",
year = "2015",
month = "9",
day = "14",
doi = "10.1142/S0218001415510088",
language = "English",
volume = "29",
journal = "International Journal of Pattern Recognition and Artificial Intelligence",
issn = "0218-0014",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "6",

}

TY - JOUR

T1 - ClassiMap

T2 - A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data

AU - Lespinats, Sylvain

AU - Aupetit, Michael

AU - Meyer-Baese, Anke

PY - 2015/9/14

Y1 - 2015/9/14

N2 - Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques that use pairwise similarities of multidimensional data to represent the data in a plane, enabling visual exploratory analysis. For labeled data, DR techniques face two objectives with potentially different priorities: one is to account for the similarities between data points, the other for the structure of the data classes. Unsupervised DR techniques attempt to preserve the original data similarities, but they do not consider class labels; hence they can map originally separated classes as overlapping ones. Conversely, state-of-the-art supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework in which they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space; hence they can map even originally overlapping classes as separated. We propose ClassiMap, a DR technique that optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general because of the reduction of the data dimension. ClassiMap primarily aims to preserve data similarities, but it tends to distribute unavoidable tears preferentially among different-label data and unavoidable false neighbors among same-label data. Standard quality measures for unsupervised mappings say nothing about the preservation of within-class or between-class structures, while classification accuracy, used to evaluate supervised mappings, is relevant only within the framework of predictive modeling. We propose two measures better suited to evaluating DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.

KW - dimensionality reduction

KW - distance preservation

KW - exploratory data analysis

KW - labeled data

KW - mapping evaluation

KW - Multidimensional scaling

UR - http://www.scopus.com/inward/record.url?scp=84939258526&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84939258526&partnerID=8YFLogxK

U2 - 10.1142/S0218001415510088

DO - 10.1142/S0218001415510088

M3 - Article

VL - 29

JO - International Journal of Pattern Recognition and Artificial Intelligence

JF - International Journal of Pattern Recognition and Artificial Intelligence

SN - 0218-0014

IS - 6

M1 - 1551008

ER -