Distributed kernel matrix approximation and implementation using message passing interface

Taher A. Dameh, Wael Abd-Almageed, Mohamed Hefeeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a distributed method to compute the similarity (also known as kernel or Gram) matrices used in various kernel-based machine learning algorithms. Current methods for computing similarity matrices have quadratic time and space complexities, which make them impractical for large-scale data sets. To reduce these quadratic complexities, the proposed method first partitions the data into smaller subsets using various families of locality-sensitive hashing, including random projection and spectral hashing. It then computes similarity values only among points within the same subset, yielding an approximated similarity matrix. We show analytically that the time and space complexities of the proposed method are sub-quadratic. We implemented the proposed method using the Message Passing Interface (MPI) framework and ran it on a cluster. Our results with real large-scale data sets show that the proposed method does not significantly affect the accuracy of the computed similarity matrices, while achieving substantial savings in running time and memory requirements.
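The core idea in the abstract — hash the points into buckets, then compute kernel values only within each bucket — can be sketched as follows. This is a minimal serial illustration using random-projection hashing and an RBF kernel; the function names (`lsh_buckets`, `approx_kernel`), the kernel choice, and all parameters are illustrative assumptions, and the paper's MPI distribution of buckets across cluster nodes is omitted here.

```python
import numpy as np

def rbf(u, v, gamma=0.5):
    # Gaussian (RBF) kernel value between two points.
    d = u - v
    return np.exp(-gamma * np.dot(d, d))

def lsh_buckets(X, n_bits=4, seed=0):
    # Random-projection LSH: each point gets an n_bits sign signature;
    # points sharing a signature fall into the same bucket.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, X.shape[1]))
    bits = (X @ planes.T) >= 0          # (n, n_bits) boolean signatures
    buckets = {}
    for i, key in enumerate(map(tuple, bits)):
        buckets.setdefault(key, []).append(i)
    return buckets

def approx_kernel(X, n_bits=4, gamma=0.5, seed=0):
    # Approximated kernel matrix: exact values within each bucket,
    # zero across buckets (block-diagonal after permuting by bucket).
    n = X.shape[0]
    K = np.zeros((n, n))
    for idx in lsh_buckets(X, n_bits, seed).values():
        for i in idx:
            for j in idx:
                K[i, j] = rbf(X[i], X[j], gamma)
    return K
```

Because each bucket holds far fewer than `n` points, the per-bucket quadratic work sums to sub-quadratic total cost, which is also what makes the buckets natural units to farm out to MPI ranks.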

Original language: English
Title of host publication: Proceedings - 2013 12th International Conference on Machine Learning and Applications, ICMLA 2013
Publisher: IEEE Computer Society
Pages: 52-57
Number of pages: 6
Volume: 1
DOI: 10.1109/ICMLA.2013.17
Publication status: Published - 1 Jan 2013
Externally published: Yes
Event: 2013 12th International Conference on Machine Learning and Applications, ICMLA 2013 - Miami, FL, United States
Duration: 4 Dec 2013 - 7 Dec 2013



Keywords

  • big data
  • distributed clustering
  • kernel matrix approximation
  • kernel-based algorithms
  • large-scale data processing

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction

Cite this

Dameh, T. A., Abd-Almageed, W., & Hefeeda, M. (2013). Distributed kernel matrix approximation and implementation using message passing interface. In Proceedings - 2013 12th International Conference on Machine Learning and Applications, ICMLA 2013 (Vol. 1, pp. 52-57). [6784587] IEEE Computer Society. https://doi.org/10.1109/ICMLA.2013.17
