Representative subsets for big data learning using k-NN graphs

Raghvendra Mall, Vilen Jumutc, Rocco Langone, Johan A.K. Suykens

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

15 Citations (Scopus)

Abstract

In this paper, we propose a deterministic method to obtain subsets of big data that are good representatives of the inherent structure of the data. We first convert the large-scale dataset into a sparse undirected k-NN graph using a distributed network generation framework that we propose in this paper. After obtaining the k-NN graph, we exploit the fast and unique representative subset (FURS) selection method [1], [2] to deterministically obtain a subset of this big data network. The FURS selection technique selects nodes from different dense regions of the graph, retaining the natural community structure. We then locate the points in the original big data that correspond to the selected nodes and compare the obtained subset with subsets acquired by state-of-the-art subset selection techniques. We evaluate the quality of the selected subset on several synthetic and real-life datasets for different learning tasks, including big data classification and big data clustering.
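
The pipeline described in the abstract (build an undirected k-NN graph, then deterministically pick nodes from different dense regions) can be illustrated with a small single-machine sketch. This is not the authors' implementation: the distributed graph-generation framework is not reproduced, and the selection step is a simplified stand-in (greedy hub selection with neighbour deactivation), since the abstract does not give the details of FURS. The function names `knn_graph` and `select_subset`, and the parameters `k` and `m`, are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Build an undirected k-NN graph as an adjacency dict {node: set(neighbours)}.

    Brute-force distances; suitable only for small demos, not for big data.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in nbrs[i]:
            j = int(j)
            adj[i].add(j)
            adj[j].add(i)  # symmetrise so the graph is undirected
    return adj


def select_subset(adj, m):
    """Deterministically select up to m nodes (assumed stand-in, not FURS itself).

    Visit nodes in order of decreasing degree (ties broken by index). Selecting a
    node deactivates its neighbours for the current pass, which pushes later picks
    toward other regions of the graph. When a pass ends, all unselected nodes are
    reactivated and the process repeats until m nodes are chosen.
    """
    degree = {i: len(nb) for i, nb in adj.items()}
    order = sorted(degree, key=lambda i: (-degree[i], i))
    selected = []
    while len(selected) < m:
        active = set(adj) - set(selected)
        progressed = False
        for i in order:
            if len(selected) == m:
                break
            if i in active:
                selected.append(i)
                active.discard(i)
                active -= adj[i]  # deactivate neighbours of the chosen node
                progressed = True
        if not progressed:  # fewer than m nodes exist in the graph
            break
    return selected


if __name__ == "__main__":
    # Two small, well-separated groups of 1-D points.
    X = np.array([[0.0], [1.0], [2.5], [10.0], [11.0], [12.5]])
    graph = knn_graph(X, k=1)
    idx = select_subset(graph, m=2)
    print(idx, X[idx].ravel())  # indices of selected nodes and the original points
```

The selected indices map straight back to rows of the original data matrix, which mirrors the step of locating the points that correspond to the selected nodes. For real workloads, an approximate or distributed neighbour search would replace the brute-force distance matrix.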

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9781479956654
DOIs: https://doi.org/10.1109/BigData.2014.7004210
Publication status: Published - 7 Jan 2015
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington
Duration: 27 Oct 2014 - 30 Oct 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
City: Washington
Period: 27/10/14 - 30/10/14

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

Mall, R., Jumutc, V., Langone, R., & Suykens, J. A. K. (2015). Representative subsets for big data learning using k-NN graphs. In Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014 (pp. 37-42). [7004210] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2014.7004210