Representative subsets for big data learning using k-NN graphs

Raghvendra Mall, Vilen Jumutc, Rocco Langone, Johan A.K. Suykens

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

In this paper we propose a deterministic method to obtain subsets from big data that are good representatives of the inherent structure in the data. We first convert the large-scale dataset into a sparse undirected k-NN graph using a distributed network generation framework that we propose in this paper. After obtaining the k-NN graph, we exploit the fast and unique representative subset (FURS) selection method [1], [2] to deterministically obtain a subset for this big data network. The FURS selection technique selects nodes from different dense regions of the graph while retaining the natural community structure. We then locate the points in the original big data corresponding to the selected nodes and compare the obtained subset with subsets acquired from state-of-the-art subset selection techniques. We evaluate the quality of the selected subset on several synthetic and real-life datasets for different learning tasks, including big data classification and big data clustering.
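The pipeline summarised in the abstract has three steps: build a k-NN graph, select a subset of nodes with FURS, and map the selected nodes back to data points. It can be illustrated on toy data with the Python sketch below. This is a single-machine stand-in, not the authors' code. The brute-force `knn_graph` replaces the distributed network generation framework. `furs_like_subset` is a simplified, assumed reading of FURS: pick the highest-degree active node, deactivate its neighbours, and reactivate all unselected nodes once no active node remains. Both function names are hypothetical, and details such as tie-breaking and degree handling in [1], [2] may differ.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical helper names; a simplified sketch, not the paper's implementation.


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Dict[int, Set[int]]:
    """Build a sparse undirected k-NN graph by brute force (O(n^2 log n)).

    Each point is linked to its k nearest neighbours (Euclidean distance).
    An edge is kept if either endpoint lists the other, which makes the
    graph undirected. Distance ties are broken by the lower index.
    """
    n = len(points)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        # Sort the other points by (distance, index) so the result is deterministic.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise: an edge i -> j also adds j -> i
    return adj


def furs_like_subset(adj: Dict[int, Set[int]], s: int) -> List[int]:
    """Greedy, deterministic FURS-style selection of s nodes (assumed variant).

    In each pass, repeatedly pick the highest-degree node that is neither
    selected nor deactivated, add it to the subset, and deactivate its
    neighbours so the next pick tends to come from another dense region.
    When a pass runs out of candidates, all unselected nodes are reactivated.
    """
    if not 0 <= s <= len(adj):
        raise ValueError("subset size must be between 0 and the number of nodes")
    # Highest degree first; the node index breaks ties deterministically.
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    selected: List[int] = []
    chosen: Set[int] = set()
    while len(selected) < s:
        inactive: Set[int] = set()  # reset at the start of every pass
        for v in order:
            if len(selected) == s:
                break
            if v in chosen or v in inactive:
                continue
            selected.append(v)
            chosen.add(v)
            inactive.update(adj[v])  # neighbours sit out the rest of this pass
    return selected


if __name__ == "__main__":
    # Two well-separated toy clusters of three points each.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    graph = knn_graph(points, k=2)
    subset = furs_like_subset(graph, s=2)
    print(subset)                       # → [0, 3]: one node per cluster
    print([points[i] for i in subset])  # → [(0, 0), (10, 10)]
```

Because ties are broken by node index, repeated runs return the same subset, which mirrors the deterministic behaviour the abstract emphasises. The last line shows the step of locating the original points that correspond to the selected nodes. The brute-force distance loop is only suitable for small inputs; it is where a distributed k-NN construction would take over at scale.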

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9781479956654
DOIs: https://doi.org/10.1109/BigData.2014.7004210
Publication status: Published - 7 Jan 2015
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington
Duration: 27 Oct 2014 to 30 Oct 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
City: Washington
Period: 27/10/14 to 30/10/14

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

Mall, R., Jumutc, V., Langone, R., & Suykens, J. A. K. (2015). Representative subsets for big data learning using k-NN graphs. In Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014 (pp. 37-42). [7004210] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2014.7004210

Representative subsets for big data learning using k-NN graphs. / Mall, Raghvendra; Jumutc, Vilen; Langone, Rocco; Suykens, Johan A.K.

Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014. Institute of Electrical and Electronics Engineers Inc., 2015. p. 37-42 7004210.

Mall, R, Jumutc, V, Langone, R & Suykens, JAK 2015, Representative subsets for big data learning using k-NN graphs. in Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014, 7004210, Institute of Electrical and Electronics Engineers Inc., pp. 37-42, 2nd IEEE International Conference on Big Data, IEEE Big Data 2014, Washington, 27/10/14. https://doi.org/10.1109/BigData.2014.7004210
Mall R, Jumutc V, Langone R, Suykens JAK. Representative subsets for big data learning using k-NN graphs. In Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014. Institute of Electrical and Electronics Engineers Inc. 2015. p. 37-42. 7004210 https://doi.org/10.1109/BigData.2014.7004210
Mall, Raghvendra ; Jumutc, Vilen ; Langone, Rocco ; Suykens, Johan A.K. / Representative subsets for big data learning using k-NN graphs. Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 37-42
@inproceedings{629f076b093a4d6b9b52f12ebff7600e,
title = "Representative subsets for big data learning using k-NN graphs",
abstract = "In this paper we propose a deterministic method to obtain subsets from big data which are a good representative of the inherent structure in the data. We first convert the large scale dataset into a sparse undirected k-NN graph using a distributed network generation framework that we propose in this paper. After obtaining the k-NN graph we exploit the fast and unique representative subset (FURS) selection method [1], [2] to deterministically obtain a subset for this big data network. The FURS selection technique selects nodes from different dense regions in the graph retaining the natural community structure. We then locate the points in the original big data corresponding to the selected nodes and compare the obtained subset with subsets acquired from state-of-the-art subset selection techniques. We evaluate the quality of the selected subset on several synthetic and real-life datasets for different learning tasks including big data classification and big data clustering.",
author = "Raghvendra Mall and Vilen Jumutc and Rocco Langone and Suykens, {Johan A.K.}",
year = "2015",
month = "1",
day = "7",
doi = "10.1109/BigData.2014.7004210",
language = "English",
pages = "37--42",
booktitle = "Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Representative subsets for big data learning using k-NN graphs

AU - Mall, Raghvendra

AU - Jumutc, Vilen

AU - Langone, Rocco

AU - Suykens, Johan A.K.

PY - 2015/1/7

Y1 - 2015/1/7

N2 - In this paper we propose a deterministic method to obtain subsets from big data which are a good representative of the inherent structure in the data. We first convert the large scale dataset into a sparse undirected k-NN graph using a distributed network generation framework that we propose in this paper. After obtaining the k-NN graph we exploit the fast and unique representative subset (FURS) selection method [1], [2] to deterministically obtain a subset for this big data network. The FURS selection technique selects nodes from different dense regions in the graph retaining the natural community structure. We then locate the points in the original big data corresponding to the selected nodes and compare the obtained subset with subsets acquired from state-of-the-art subset selection techniques. We evaluate the quality of the selected subset on several synthetic and real-life datasets for different learning tasks including big data classification and big data clustering.

AB - In this paper we propose a deterministic method to obtain subsets from big data which are a good representative of the inherent structure in the data. We first convert the large scale dataset into a sparse undirected k-NN graph using a distributed network generation framework that we propose in this paper. After obtaining the k-NN graph we exploit the fast and unique representative subset (FURS) selection method [1], [2] to deterministically obtain a subset for this big data network. The FURS selection technique selects nodes from different dense regions in the graph retaining the natural community structure. We then locate the points in the original big data corresponding to the selected nodes and compare the obtained subset with subsets acquired from state-of-the-art subset selection techniques. We evaluate the quality of the selected subset on several synthetic and real-life datasets for different learning tasks including big data classification and big data clustering.

UR - http://www.scopus.com/inward/record.url?scp=84921727996&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84921727996&partnerID=8YFLogxK

U2 - 10.1109/BigData.2014.7004210

DO - 10.1109/BigData.2014.7004210

M3 - Conference contribution

SP - 37

EP - 42

BT - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -