DIMO: Distributed index for matching multimedia objects using MapReduce

Ahmed Abdelsadek, Mohamed Hefeeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

This paper presents the design and evaluation of DIMO, a distributed system for matching high-dimensional multimedia objects. DIMO provides multimedia applications with the basic function of computing the K nearest neighbors on large-scale datasets. It also allows multimedia applications to define application-specific functions to further process the computed nearest neighbors. DIMO presents a novel method for partitioning, searching, and storing high-dimensional datasets on distributed infrastructures that support the MapReduce programming model. We have implemented DIMO and extensively evaluated it on Amazon clusters with the number of machines ranging from 8 to 128. We have experimented with large datasets of up to 160 million data points extracted from images, where each point has 128 dimensions. Our experimental results show that DIMO: (i) achieves high precision when compared against the ground-truth nearest neighbors, (ii) can elastically utilize varying amounts of computing resources, (iii) does not impose high network overheads, (iv) does not require large main memory even for processing large datasets, and (v) balances the load across the computing machines used. In addition, DIMO outperforms the closest system in the literature by a large margin (up to 20%) in terms of the achieved average precision of the computed nearest neighbors. Furthermore, DIMO requires at least three orders of magnitude less storage than the other system, and it is more computationally efficient.
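To make the basic function named in the abstract concrete, the following is a minimal sketch of K-nearest-neighbor search expressed as a map step and a reduce step. It is not DIMO's method: the abstract does not describe the partitioning or index structure, so this sketch assumes a plain exact brute-force scan over each partition, runs in a single process, and uses hypothetical names (`map_phase`, `reduce_phase`, `knn`) chosen only for illustration.

```python
import heapq
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_phase(partition, queries, k):
    """Map step (assumed): scan one partition and emit, for every query,
    its k closest local points as (query_id, (distance, point_id)) pairs."""
    for qid, q in queries.items():
        local = heapq.nsmallest(k, ((distance(q, p), pid) for pid, p in partition))
        for candidate in local:
            yield qid, candidate


def reduce_phase(pairs, k):
    """Reduce step (assumed): group candidates by query and keep the
    k globally closest point ids, nearest first."""
    grouped = defaultdict(list)
    for qid, candidate in pairs:
        grouped[qid].append(candidate)
    return {
        qid: [pid for _, pid in heapq.nsmallest(k, candidates)]
        for qid, candidates in grouped.items()
    }


def knn(partitions, queries, k):
    """Run the map step over every partition, then merge in the reduce step."""
    pairs = (pair for part in partitions for pair in map_phase(part, queries, k))
    return reduce_phase(pairs, k)


if __name__ == "__main__":
    # Toy 2-dimensional data split into two partitions (illustrative only).
    partitions = [
        [("a", (0.0, 0.0)), ("b", (1.0, 0.0))],
        [("c", (0.0, 2.0)), ("d", (5.0, 5.0))],
    ]
    print(knn(partitions, {"q": (0.1, 0.0)}, k=2))
```

Because each partition forwards only its local top k candidates, the merge step handles at most k entries per partition per query. A brute-force scan like this returns exact neighbors; a system evaluated by precision against ground truth, as the abstract describes, would typically trade some exactness for speed.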

Original language: English
Title of host publication: Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014
Publisher: Association for Computing Machinery
Pages: 115-126
Number of pages: 12
DOI: https://doi.org/10.1145/2557642.2557650
Publication status: Published - 1 Jan 2014
Event: 5th ACM Multimedia Systems Conference, MMSys 2014 - Singapore, Singapore
Duration: 19 Mar 2014 to 21 Mar 2014

Other

Other: 5th ACM Multimedia Systems Conference, MMSys 2014
Country: Singapore
City: Singapore
Period: 19/3/14 to 21/3/14

Fingerprint

Data storage equipment
Processing

Keywords

  • High dimensional data
  • Large-scale data
  • Multimedia search
  • Nearest neighbors
  • Object matching

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction

Cite this

Abdelsadek, A., & Hefeeda, M. (2014). DIMO: Distributed index for matching multimedia objects using MapReduce. In Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014 (pp. 115-126). Association for Computing Machinery. https://doi.org/10.1145/2557642.2557650

DIMO: Distributed index for matching multimedia objects using MapReduce. / Abdelsadek, Ahmed; Hefeeda, Mohamed.

Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014. Association for Computing Machinery, 2014. p. 115-126.

Abdelsadek, A & Hefeeda, M 2014, DIMO: Distributed index for matching multimedia objects using MapReduce. in Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014. Association for Computing Machinery, pp. 115-126, 5th ACM Multimedia Systems Conference, MMSys 2014, Singapore, Singapore, 19/3/14. https://doi.org/10.1145/2557642.2557650
Abdelsadek A, Hefeeda M. DIMO: Distributed index for matching multimedia objects using MapReduce. In Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014. Association for Computing Machinery. 2014. p. 115-126. https://doi.org/10.1145/2557642.2557650
Abdelsadek, Ahmed; Hefeeda, Mohamed. / DIMO: Distributed index for matching multimedia objects using MapReduce. Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014. Association for Computing Machinery, 2014. pp. 115-126.
@inproceedings{599c7f444c214c3da73bf765d7add5cf,
title = "DIMO: Distributed index for matching multimedia objects using MapReduce",
abstract = "This paper presents the design and evaluation of DIMO, a distributed system for matching high-dimensional multimedia objects. DIMO provides multimedia applications with the basic function of computing the K nearest neighbors on large-scale datasets. It also allows multimedia applications to define application-specific functions to further process the computed nearest neighbors. DIMO presents a novel method for partitioning, searching, and storing high-dimensional datasets on distributed infrastructures that support the MapReduce programming model. We have implemented DIMO and extensively evaluated it on Amazon clusters with the number of machines ranging from 8 to 128. We have experimented with large datasets of up to 160 million data points extracted from images, where each point has 128 dimensions. Our experimental results show that DIMO: (i) achieves high precision when compared against the ground-truth nearest neighbors, (ii) can elastically utilize varying amounts of computing resources, (iii) does not impose high network overheads, (iv) does not require large main memory even for processing large datasets, and (v) balances the load across the computing machines used. In addition, DIMO outperforms the closest system in the literature by a large margin (up to 20{\%}) in terms of the achieved average precision of the computed nearest neighbors. Furthermore, DIMO requires at least three orders of magnitude less storage than the other system, and it is more computationally efficient.",
keywords = "High dimensional data, Large-scale data, Multimedia search, Nearest neighbors, Object matching",
author = "Ahmed Abdelsadek and Mohamed Hefeeda",
year = "2014",
month = "1",
day = "1",
doi = "10.1145/2557642.2557650",
language = "English",
pages = "115--126",
booktitle = "Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - DIMO

T2 - Distributed index for matching multimedia objects using MapReduce

AU - Abdelsadek, Ahmed

AU - Hefeeda, Mohamed

PY - 2014/1/1

Y1 - 2014/1/1

AB - This paper presents the design and evaluation of DIMO, a distributed system for matching high-dimensional multimedia objects. DIMO provides multimedia applications with the basic function of computing the K nearest neighbors on large-scale datasets. It also allows multimedia applications to define application-specific functions to further process the computed nearest neighbors. DIMO presents a novel method for partitioning, searching, and storing high-dimensional datasets on distributed infrastructures that support the MapReduce programming model. We have implemented DIMO and extensively evaluated it on Amazon clusters with the number of machines ranging from 8 to 128. We have experimented with large datasets of up to 160 million data points extracted from images, where each point has 128 dimensions. Our experimental results show that DIMO: (i) achieves high precision when compared against the ground-truth nearest neighbors, (ii) can elastically utilize varying amounts of computing resources, (iii) does not impose high network overheads, (iv) does not require large main memory even for processing large datasets, and (v) balances the load across the computing machines used. In addition, DIMO outperforms the closest system in the literature by a large margin (up to 20%) in terms of the achieved average precision of the computed nearest neighbors. Furthermore, DIMO requires at least three orders of magnitude less storage than the other system, and it is more computationally efficient.

KW - High dimensional data

KW - Large-scale data

KW - Multimedia search

KW - Nearest neighbors

KW - Object matching

UR - http://www.scopus.com/inward/record.url?scp=84898970550&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898970550&partnerID=8YFLogxK

U2 - 10.1145/2557642.2557650

DO - 10.1145/2557642.2557650

M3 - Conference contribution

AN - SCOPUS:84898970550

SP - 115

EP - 126

BT - Proceedings of the 5th ACM Multimedia Systems Conference, MMSys 2014

PB - Association for Computing Machinery

ER -