The little engine(s) that could

Scaling online social networks

Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang, Nikos Laoutaris, Parminder Chhabra, Pablo Rodriguez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Citations (Scopus)

Abstract

The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that have often caused costly re-architecting for services like Twitter and Facebook. The complexity of interconnection of users in social networks has introduced new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiple servers (e.g., using DHTs) can lead to costly inter-server communication. We design, implement, and evaluate SPAR, a social partitioning and replication middleware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for all users in an OSN, their direct neighbors' data is co-located on the same server. The gains from this approach are multi-fold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost. We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, with a working implementation. We show that SPAR incurs minimal overhead, can help a well-known open-source Twitter clone reach Twitter's scale without changing a line of its application logic, and achieves higher throughput than Cassandra, Facebook's DHT-based key-value store.

Original language: English
Title of host publication: Computer Communication Review
Pages: 375-386
Number of pages: 12
Volume: 40
Edition: 4
DOIs: https://doi.org/10.1145/1851275.1851227
Publication status: Published - 2010
Externally published: Yes

Fingerprint

Servers
Engines
Scalability
Systems analysis
Redundancy
Semantics
Throughput
Data storage equipment
Communication
Costs

Keywords

  • Partition
  • Replication
  • Scalability
  • Social networks

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Pujol, J. M., Erramilli, V., Siganos, G., Yang, X., Laoutaris, N., Chhabra, P., & Rodriguez, P. (2010). The little engine(s) that could: Scaling online social networks. In Computer Communication Review (4 ed., Vol. 40, pp. 375-386). https://doi.org/10.1145/1851275.1851227

@inproceedings{37aed26ba0a94d7598fa47cca46b0631,
title = "The little engine(s) that could: Scaling online social networks",
abstract = "The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that have often caused costly re-architecting for services like Twitter and Facebook. The complexity of interconnection of users in social networks has introduced new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiple servers (e.g., using DHTs) can lead to costly inter-server communication. We design, implement, and evaluate SPAR, a social partitioning and replication middleware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for all users in an OSN, their direct neighbors' data is co-located on the same server. The gains from this approach are multi-fold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost. We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, with a working implementation. We show that SPAR incurs minimal overhead, can help a well-known open-source Twitter clone reach Twitter's scale without changing a line of its application logic, and achieves higher throughput than Cassandra, Facebook's DHT-based key-value store.",
keywords = "Partition, Replication, Scalability, Social networks",
author = "Pujol, {Josep M.} and Vijay Erramilli and Georgos Siganos and Xiaoyuan Yang and Nikos Laoutaris and Parminder Chhabra and Pablo Rodriguez",
year = "2010",
doi = "10.1145/1851275.1851227",
language = "English",
volume = "40",
pages = "375--386",
booktitle = "Computer Communication Review",
edition = "4",

}

TY  - GEN
T1  - The little engine(s) that could
T2  - Scaling online social networks
AU  - Pujol, Josep M.
AU  - Erramilli, Vijay
AU  - Siganos, Georgos
AU  - Yang, Xiaoyuan
AU  - Laoutaris, Nikos
AU  - Chhabra, Parminder
AU  - Rodriguez, Pablo
PY  - 2010
Y1  - 2010
N2  - The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that have often caused costly re-architecting for services like Twitter and Facebook. The complexity of interconnection of users in social networks has introduced new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiple servers (e.g., using DHTs) can lead to costly inter-server communication. We design, implement, and evaluate SPAR, a social partitioning and replication middleware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for all users in an OSN, their direct neighbors' data is co-located on the same server. The gains from this approach are multi-fold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost. We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, with a working implementation. We show that SPAR incurs minimal overhead, can help a well-known open-source Twitter clone reach Twitter's scale without changing a line of its application logic, and achieves higher throughput than Cassandra, Facebook's DHT-based key-value store.
KW  - Partition
KW  - Replication
KW  - Scalability
KW  - Social networks
UR  - http://www.scopus.com/inward/record.url?scp=84860738074&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84860738074&partnerID=8YFLogxK
U2  - 10.1145/1851275.1851227
DO  - 10.1145/1851275.1851227
M3  - Conference contribution
VL  - 40
SP  - 375
EP  - 386
BT  - Computer Communication Review
ER  -