Social content matching in MapReduce

Gianmarco Morales, Aristides Gionis, Mauro Sozio

Research output: Contribution to journal (article)

30 Citations (Scopus)

Abstract

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.
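
As an illustration of the problem the abstract describes (maximize total relevance while capping how much each consumer receives and how often each item is delivered), the following is a minimal sequential sketch of a greedy capacity-constrained matching. It is not the paper's GreedyMR or StackMR, and it does not use MapReduce; the function name, the edge tuples, and the single `capacity` dictionary covering both items and consumers are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# Hypothetical edge format: (content item, consumer, relevance score).
Edge = Tuple[str, str, float]


def greedy_b_matching(edges: List[Edge], capacity: Dict[str, int]) -> List[Edge]:
    """Pick edges in decreasing relevance while both endpoints have capacity left.

    `capacity` maps every node (item or consumer) to the maximum number of
    matched edges it may take part in. Nodes missing from it get capacity 0.
    """
    remaining = dict(capacity)  # copy so the caller's dictionary is not modified
    matched: List[Edge] = []
    for item, consumer, weight in sorted(edges, key=lambda e: e[2], reverse=True):
        if remaining.get(item, 0) > 0 and remaining.get(consumer, 0) > 0:
            matched.append((item, consumer, weight))
            remaining[item] -= 1
            remaining[consumer] -= 1
    return matched


# Toy instance: two items, two consumers, every node limited to one edge.
edges = [
    ("photo1", "alice", 0.9),
    ("photo1", "bob", 0.8),
    ("photo2", "alice", 0.7),
    ("photo2", "bob", 0.4),
]
capacity = {"photo1": 1, "photo2": 1, "alice": 1, "bob": 1}

result = greedy_b_matching(edges, capacity)
for item, consumer, weight in result:
    print(item, "->", consumer, weight)
```

On this toy instance the greedy choice takes photo1 for alice first, which leaves only the weakest edge (photo2 for bob) available. Matching photo1 to bob and photo2 to alice would score higher, so the sketch also shows why greedy matching yields an approximation rather than an optimum.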

Original language: English
Pages (from-to): 460-469
Number of pages: 10
Journal: Proceedings of the VLDB Endowment
Volume: 4
Issue number: 7
Publication status: Published - Apr 2011
Externally published: Yes

Fingerprint

Websites
Marketing
Internet
Personnel
Economics

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Social content matching in MapReduce. / Morales, Gianmarco; Gionis, Aristides; Sozio, Mauro.

In: Proceedings of the VLDB Endowment, Vol. 4, No. 7, 04.2011, p. 460-469.

Morales, G, Gionis, A & Sozio, M 2011, 'Social content matching in MapReduce', Proceedings of the VLDB Endowment, vol. 4, no. 7, pp. 460-469.
Morales, Gianmarco ; Gionis, Aristides ; Sozio, Mauro. / Social content matching in MapReduce. In: Proceedings of the VLDB Endowment. 2011 ; Vol. 4, No. 7. pp. 460-469.
@article{9c44dfd49a6a4f1399df0d9f12e9baf7,
title = "Social content matching in MapReduce",
abstract = "Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.",
author = "Gianmarco Morales and Aristides Gionis and Mauro Sozio",
year = "2011",
month = "4",
language = "English",
volume = "4",
pages = "460--469",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "7",
}

TY - JOUR

T1 - Social content matching in MapReduce

AU - Morales, Gianmarco

AU - Gionis, Aristides

AU - Sozio, Mauro

PY - 2011/4

Y1 - 2011/4

N2 - Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.

UR - http://www.scopus.com/inward/record.url?scp=84863551232&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863551232&partnerID=8YFLogxK

M3 - Article

VL - 4

SP - 460

EP - 469

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 7

ER -