Exploring similarities across high-dimensional datasets

Karlton Sequeira, Mohammed Zaki

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

Very often, related data may be collected by a number of sources, which may be unable to share their entire datasets for reasons like confidentiality agreements, dataset size, and so forth. However, these sources may be willing to share a condensed model of their datasets. If some substructures of the condensed models of such datasets, from different sources, are found to be unusually similar, policies successfully applied to one may be successfully applied to the others. In this chapter, we propose a framework for constructing condensed models of datasets and algorithms to find similar substructure in pairs of such models. The algorithms are based on the tensor product. We test our framework on pairs of synthetic datasets and compare our algorithms with an existing one. Finally, we apply it to basketball player statistics for two National Basketball Association (NBA) seasons, and to breast cancer datasets. The results are statistically more interesting than results obtained from independent analysis of the datasets.
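The abstract says only that the matching algorithms are based on the tensor product; it gives no further detail. As a rough illustration of that general idea, and not the chapter's actual algorithm, the sketch below assumes each condensed model is a small non-negative weighted adjacency matrix and scores every pair of nodes across two models by power iteration on their Kronecker (tensor) product. The function name, the matrices, and the iteration count are all hypothetical.

```python
import numpy as np


def pairwise_node_similarity(model_a, model_b, iters=100):
    """Score node pairs across two graphs via their Kronecker product.

    Illustrative only: assumes both inputs are square, non-negative
    weighted adjacency matrices. Returns an (n_a, n_b) matrix whose
    entry (i, j) scores node i of model_a against node j of model_b.
    """
    a = np.asarray(model_a, dtype=float)
    b = np.asarray(model_b, dtype=float)
    product = np.kron(a, b)  # shape (n_a * n_b, n_a * n_b)

    # Power iteration from a uniform start; stops early if the vector vanishes.
    x = np.ones(product.shape[0]) / np.sqrt(product.shape[0])
    for _ in range(iters):
        y = product @ x
        norm = np.linalg.norm(y)
        if norm == 0.0:
            break
        x = y / norm

    # Row-major reshape maps flat index i * n_b + j back to the pair (i, j).
    return x.reshape(a.shape[0], b.shape[0])


# Hypothetical condensed models: a 3-node and a 2-node weighted graph.
model_a = np.array([[1.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [0.0, 1.0, 1.0]])
model_b = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
scores = pairwise_node_similarity(model_a, model_b)
print(scores)
```

Forming the full Kronecker product is only practical for small models, since its size grows with the product of the two node counts.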

Original language: English
Title of host publication: Research and Trends in Data Mining Technologies and Applications
Publisher: IGI Global
Pages: 53-84
Number of pages: 32
ISBN (Print): 9781599042718
DOI: https://doi.org/10.4018/978-1-59904-271-8.ch003
Publication status: Published - 1 Dec 2006
Externally published: Yes

Fingerprint

cancer
statistics

ASJC Scopus subject areas

  • Social Sciences(all)

Cite this

Sequeira, K., & Zaki, M. (2006). Exploring similarities across high-dimensional datasets. In Research and Trends in Data Mining Technologies and Applications (pp. 53-84). IGI Global. https://doi.org/10.4018/978-1-59904-271-8.ch003

@inbook{4fa71ea1bfb144b9bb49a2bb9b19b07a,
title = "Exploring similarities across high-dimensional datasets",
author = "Karlton Sequeira and Mohammed Zaki",
year = "2006",
month = "12",
day = "1",
doi = "10.4018/978-1-59904-271-8.ch003",
language = "English",
isbn = "9781599042718",
pages = "53--84",
booktitle = "Research and Trends in Data Mining Technologies and Applications",
publisher = "IGI Global",
}

TY - CHAP

T1 - Exploring similarities across high-dimensional datasets

AU - Sequeira, Karlton

AU - Zaki, Mohammed

PY - 2006/12/1

Y1 - 2006/12/1

UR - http://www.scopus.com/inward/record.url?scp=80052763650&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052763650&partnerID=8YFLogxK

U2 - 10.4018/978-1-59904-271-8.ch003

DO - 10.4018/978-1-59904-271-8.ch003

M3 - Chapter

SN - 9781599042718

SP - 53

EP - 84

BT - Research and Trends in Data Mining Technologies and Applications

PB - IGI Global

ER -