High-dimensional labeled data analysis with topology representing graphs

Michael Aupetit, Thibaud Catz

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

We propose the use of topology representing graphs for the exploratory analysis of high-dimensional labeled data. The Delaunay graph contains all the topological information needed to analyze the topology of the classes (e.g. the number of separate clusters of a given class, the way these clusters are in contact with each other, or the shape of these clusters). The Delaunay graph also makes it possible to sample the decision boundary of the Nearest Neighbor rule, to define a topological criterion of non-linear separability of the classes, and to find data near the decision boundary whose labels must be considered carefully. The graph thus provides a way to analyze the complexity of a classification problem, as well as tools for decision support. When the Delaunay graph becomes intractable in very high-dimensional spaces, we propose using the Gabriel graph instead and discuss the limits of this approach. This analysis technique is complementary to projection techniques, as it handles the data as they are in the data space, avoiding projection distortions. We apply it to analyze the well-known Iris database and a seismic events database.
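The abstract names three uses of the graph: counting the separate clusters of each class, sampling the decision boundary of the Nearest Neighbor rule, and flagging data near that boundary. The following is a minimal sketch of those ideas, not the authors' code. It assumes Euclidean distance, builds the Gabriel graph (the fallback the abstract proposes) by naive O(n^3) brute force, and treats the endpoints of mixed-label edges as the "near boundary" data. The helper names `sq_dist`, `gabriel_graph`, `class_clusters` and `boundary_analysis` are hypothetical. Because the Gabriel graph is a subgraph of the Delaunay graph, this sketch can miss boundary facets and may split a class into more clusters than the Delaunay graph would show.

```python
from itertools import combinations

# Hypothetical helpers illustrating the abstract's ideas; assumes Euclidean data.


def sq_dist(p, q):
    """Squared Euclidean distance (no sqrt, so integer data stays exact)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gabriel_graph(points):
    """Naive O(n^3) Gabriel graph.

    (i, j) is an edge iff no other point lies in the closed ball whose
    diameter is the segment [points[i], points[j]], i.e. iff
    |p_i - p_k|^2 + |p_j - p_k|^2 > |p_i - p_j|^2 for every other k.
    """
    n = len(points)
    edges = []
    for i, j in combinations(range(n), 2):
        dij = sq_dist(points[i], points[j])
        if all(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) > dij
            for k in range(n)
            if k not in (i, j)
        ):
            edges.append((i, j))
    return edges


def class_clusters(labels, edges):
    """Number of connected components per class, using only edges whose
    endpoints share a label (union-find with path halving)."""
    parent = list(range(len(labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        if labels[i] == labels[j]:
            parent[find(i)] = find(j)

    roots = {}
    for i, lab in enumerate(labels):
        roots.setdefault(lab, set()).add(find(i))
    return {lab: len(r) for lab, r in roots.items()}


def boundary_analysis(points, labels, edges):
    """Return (midpoints of mixed-label edges, sorted indices of their endpoints).

    The midpoint of a Gabriel edge is equidistant from its two endpoints and
    strictly closer to them than to any other point, so for a mixed-label
    edge it lies on the 1-NN decision boundary.
    """
    mixed = [(i, j) for i, j in edges if labels[i] != labels[j]]
    midpoints = [
        tuple((a + b) / 2 for a, b in zip(points[i], points[j])) for i, j in mixed
    ]
    near = sorted({v for edge in mixed for v in edge})
    return midpoints, near


if __name__ == "__main__":
    # Toy 2-D data: class "a" has a main group plus one isolated point.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0), (4.0, 0.0), (5.0, 0.5), (8.0, 0.0)]
    labs = ["a", "a", "a", "b", "b", "a"]
    g = gabriel_graph(pts)
    print(class_clusters(labs, g))          # {'a': 2, 'b': 1}
    print(boundary_analysis(pts, labs, g))  # ([(2.5, 0.0), (6.5, 0.25)], [1, 3, 4, 5])
```

On the toy data, class "a" splits into two clusters because its isolated point touches only class "b" in the graph, and the two mixed-label edges each yield one sample of the Nearest Neighbor decision boundary. Swapping `gabriel_graph` for an exact Delaunay construction would keep the other two functions unchanged.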

Original language: English
Pages (from-to): 139-169
Number of pages: 31
Journal: Neurocomputing
Volume: 63
Issue number: SPEC. ISS.
DOI: 10.1016/j.neucom.2004.04.009
Publication status: Published - Jan 2005
Externally published: Yes

Keywords

  • Classification
  • Decision boundary
  • Delaunay graph
  • Exploratory data analysis
  • Gabriel graph
  • High-dimensional labeled data
  • Topology representing graph
  • Voronoï cell

ASJC Scopus subject areas

  • Artificial Intelligence
  • Cellular and Molecular Neuroscience

Cite this

High-dimensional labeled data analysis with topology representing graphs. / Aupetit, Michael; Catz, Thibaud.

In: Neurocomputing, Vol. 63, No. SPEC. ISS., 01.2005, p. 139-169.

Aupetit, Michael ; Catz, Thibaud. / High-dimensional labeled data analysis with topology representing graphs. In: Neurocomputing. 2005 ; Vol. 63, No. SPEC. ISS. pp. 139-169.
@article{6bed627d548b4f8eb3cf7d0c4654ae83,
title = "High-dimensional labeled data analysis with topology representing graphs",
abstract = "We propose the use of topology representing graphs for the exploratory analysis of high-dimensional labeled data. The Delaunay graph contains all the topological information needed to analyze the topology of the classes (e.g. the number of separate clusters of a given class, the way these clusters are in contact with each other, or the shape of these clusters). The Delaunay graph also makes it possible to sample the decision boundary of the Nearest Neighbor rule, to define a topological criterion of non-linear separability of the classes, and to find data near the decision boundary whose labels must be considered carefully. The graph thus provides a way to analyze the complexity of a classification problem, as well as tools for decision support. When the Delaunay graph becomes intractable in very high-dimensional spaces, we propose using the Gabriel graph instead and discuss the limits of this approach. This analysis technique is complementary to projection techniques, as it handles the data as they are in the data space, avoiding projection distortions. We apply it to analyze the well-known Iris database and a seismic events database.",
keywords = "Classification, Decision boundary, Delaunay graph, Exploratory data analysis, Gabriel graph, High-dimensional labeled data, Topology representing graph, Vorono{\"\i} cell",
author = "Michael Aupetit and Thibaud Catz",
year = "2005",
month = "1",
doi = "10.1016/j.neucom.2004.04.009",
language = "English",
volume = "63",
pages = "139--169",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",
number = "SPEC. ISS.",

}

TY - JOUR

T1 - High-dimensional labeled data analysis with topology representing graphs

AU - Aupetit, Michael

AU - Catz, Thibaud

PY - 2005/1

Y1 - 2005/1

AB - We propose the use of topology representing graphs for the exploratory analysis of high-dimensional labeled data. The Delaunay graph contains all the topological information needed to analyze the topology of the classes (e.g. the number of separate clusters of a given class, the way these clusters are in contact with each other, or the shape of these clusters). The Delaunay graph also makes it possible to sample the decision boundary of the Nearest Neighbor rule, to define a topological criterion of non-linear separability of the classes, and to find data near the decision boundary whose labels must be considered carefully. The graph thus provides a way to analyze the complexity of a classification problem, as well as tools for decision support. When the Delaunay graph becomes intractable in very high-dimensional spaces, we propose using the Gabriel graph instead and discuss the limits of this approach. This analysis technique is complementary to projection techniques, as it handles the data as they are in the data space, avoiding projection distortions. We apply it to analyze the well-known Iris database and a seismic events database.

KW - Classification

KW - Decision boundary

KW - Delaunay graph

KW - Exploratory data analysis

KW - Gabriel graph

KW - High-dimensional labeled data

KW - Topology representing graph

KW - Voronoï cell

UR - http://www.scopus.com/inward/record.url?scp=12144251726&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12144251726&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2004.04.009

DO - 10.1016/j.neucom.2004.04.009

M3 - Article

AN - SCOPUS:12144251726

VL - 63

SP - 139

EP - 169

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

IS - SPEC. ISS.

ER -