On main-memory flushing in microblogs data management systems

Amr Magdy, Rami Alghamdi, Mohamed Mokbel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. Given the continuous arrival and sheer volume of microblogs, it is infeasible to keep data in main memory for long periods. Thus, once the allocated memory budget is filled, a portion of the data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques suffer from either a low memory hit ratio, due to flushing items regardless of their relevance to incoming queries, or significant overhead from tracking individual data items; in either case, the scalability of microblogs systems is limited. In this paper, we propose kFlushing, a policy that exploits the popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase the memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is used to accumulate more useful data, so that more queries can be answered from memory contents. When all memory holds useful data, kFlushing flushes data that is less likely to degrade the memory hit ratio. In addition, kFlushing incurs little overhead, which preserves high system scalability in terms of high digestion rates of fast incoming data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing, which improves the main-memory hit ratio by 26-330% while coping with fast microblog streams of up to 100K microblogs/second.
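The abstract describes the flushing policy only at a high level. The Python sketch below illustrates its core idea under simplifying assumptions that are not taken from the paper: microblogs are ranked by timestamp, each top-k query asks for the k most recent microblogs containing one keyword, memory is counted in microblogs, and a query is a memory hit when at least k matching microblogs are in memory. The names `Microblog`, `MemoryIndex`, `useful_ids`, `flush`, `hit_ratio`, and `batch` are hypothetical, and the second phase (evicting the oldest microblogs once all memory holds useful data) is a stand-in rule, not the paper's actual selection criterion.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Microblog:
    mid: int
    timestamp: int            # ranking score: newer is better (assumption)
    keywords: frozenset[str]  # a microblog appears in one list per keyword


@dataclass
class MemoryIndex:
    k: int          # answer size of every top-k query (assumed fixed)
    capacity: int   # memory budget, counted in microblogs (assumption)
    batch: int = 1  # hypothetical phase-2 flush size; must be >= 1
    blogs: dict[int, Microblog] = field(default_factory=dict)
    flushed: list[int] = field(default_factory=list)  # ids sent to "disk"

    def postings(self) -> dict[str, list[Microblog]]:
        """Keyword -> in-memory microblogs, newest first (rebuilt per call)."""
        index: dict[str, list[Microblog]] = defaultdict(list)
        for blog in self.blogs.values():
            for kw in blog.keywords:
                index[kw].append(blog)
        for lst in index.values():
            lst.sort(key=lambda b: b.timestamp, reverse=True)
        return index

    def useful_ids(self) -> set[int]:
        """Ids that rank in the top-k of at least one of their keywords."""
        useful: set[int] = set()
        for lst in self.postings().values():
            useful.update(b.mid for b in lst[: self.k])
        return useful

    def flush(self) -> list[int]:
        """Evict microblogs from memory and return their ids."""
        useful = self.useful_ids()
        # Phase 1: a microblog outside the top-k of every keyword it has
        # cannot appear in any top-k answer, because newer arrivals only
        # push it further down under recency ranking.
        victims = [mid for mid in self.blogs if mid not in useful]
        if not victims:
            # Phase 2 (stand-in rule): memory holds only useful data, so
            # evict the `batch` oldest microblogs.
            oldest = sorted(self.blogs.values(), key=lambda b: b.timestamp)
            victims = [b.mid for b in oldest[: self.batch]]
        for mid in victims:
            del self.blogs[mid]
            self.flushed.append(mid)
        return victims

    def insert(self, blog: Microblog) -> None:
        """Digest one incoming microblog, flushing first if memory is full."""
        if len(self.blogs) >= self.capacity:
            self.flush()
        self.blogs[blog.mid] = blog

    def is_memory_hit(self, keyword: str) -> bool:
        """A query hits memory if k matching microblogs are in memory."""
        return len(self.postings().get(keyword, [])) >= self.k

    def hit_ratio(self, queries: list[str]) -> float:
        """Fraction of keyword queries answered entirely from memory."""
        if not queries:
            return 0.0
        return sum(self.is_memory_hit(q) for q in queries) / len(queries)


# Hypothetical stream: three posts on "icde", then posts on "db" and "icde".
index = MemoryIndex(k=2, capacity=4)
for mid, ts, kw in [(1, 1, "icde"), (2, 2, "icde"), (3, 3, "icde"),
                    (4, 4, "db"), (5, 5, "db"), (6, 6, "icde")]:
    index.insert(Microblog(mid, ts, frozenset({kw})))
```

In this sketch, the first flush removes only microblog 1, which is outside the top-2 of "icde"; the second flush finds no useless data and falls back to evicting the oldest microblog. Because phase 2 always evicts the globally oldest items, whenever a keyword has k microblogs in memory they are also its k newest overall, which is what makes the hit test meaningful. Rebuilding the postings on every call keeps the code short but costs O(n log n) per flush and per query; a system digesting streams at the rates reported above would likely need incrementally maintained per-keyword lists instead.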

Original language: English
Title of host publication: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 445-456
Number of pages: 12
ISBN (Electronic): 9781509020195
DOI: 10.1109/ICDE.2016.7498261
Publication status: Published - 22 Jun 2016
Externally published: Yes
Event: 32nd IEEE International Conference on Data Engineering, ICDE 2016 - Helsinki, Finland
Duration: 16 May 2016 - 20 May 2016

Other

Other: 32nd IEEE International Conference on Data Engineering, ICDE 2016
Country: Finland
City: Helsinki
Period: 16/5/16 - 20/5/16

Fingerprint

Information management
Data storage equipment
Scalability
Data management system
Query

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Graphics and Computer-Aided Design
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Cite this

Magdy, A., Alghamdi, R., & Mokbel, M. (2016). On main-memory flushing in microblogs data management systems. In 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016 (pp. 445-456). [7498261] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDE.2016.7498261

On main-memory flushing in microblogs data management systems. / Magdy, Amr; Alghamdi, Rami; Mokbel, Mohamed.

2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc., 2016. p. 445-456 7498261.


Magdy, A, Alghamdi, R & Mokbel, M 2016, On main-memory flushing in microblogs data management systems. in 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, 7498261, Institute of Electrical and Electronics Engineers Inc., pp. 445-456, 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, 16/5/16. https://doi.org/10.1109/ICDE.2016.7498261

Magdy A, Alghamdi R, Mokbel M. On main-memory flushing in microblogs data management systems. In 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc. 2016. p. 445-456. 7498261 https://doi.org/10.1109/ICDE.2016.7498261

Magdy, Amr ; Alghamdi, Rami ; Mokbel, Mohamed. / On main-memory flushing in microblogs data management systems. 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 445-456

@inproceedings{a3f0e231fba04314b190d5deab00a7db,
title = "On main-memory flushing in microblogs data management systems",
abstract = "Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. With continuity and excessive numbers of microblogs, it is infeasible to keep data in main-memory for long periods. Thus, once allocated memory budget is filled, a portion of data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques come with either low memory hit ratio due to flushing items regardless of their relevance to incoming queries or significant overhead of tracking individual data items, which limit scalability of microblogs systems in either cases. In this paper, we propose kFlushing policy that exploits popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is utilized to accumulate more useful data that is used to answer more queries from memory contents. When all memory is utilized for useful data, kFlushing flushes data that is less likely to degrade memory hit ratio. In addition, kFlushing comes with a little overhead that keeps high system scalability in terms of high digestion rates of incoming fast data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing to improve main-memory hit by 26-330{\%} while coping up with fast microblog streams of up to 100K microblog/second.",
author = "Amr Magdy and Rami Alghamdi and Mohamed Mokbel",
year = "2016",
month = "6",
day = "22",
doi = "10.1109/ICDE.2016.7498261",
language = "English",
pages = "445--456",
booktitle = "2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - On main-memory flushing in microblogs data management systems

AU - Magdy, Amr

AU - Alghamdi, Rami

AU - Mokbel, Mohamed

PY - 2016/6/22

Y1 - 2016/6/22

N2 - Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. With continuity and excessive numbers of microblogs, it is infeasible to keep data in main-memory for long periods. Thus, once allocated memory budget is filled, a portion of data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques come with either low memory hit ratio due to flushing items regardless of their relevance to incoming queries or significant overhead of tracking individual data items, which limit scalability of microblogs systems in either cases. In this paper, we propose kFlushing policy that exploits popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is utilized to accumulate more useful data that is used to answer more queries from memory contents. When all memory is utilized for useful data, kFlushing flushes data that is less likely to degrade memory hit ratio. In addition, kFlushing comes with a little overhead that keeps high system scalability in terms of high digestion rates of incoming fast data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing to improve main-memory hit by 26-330% while coping up with fast microblog streams of up to 100K microblog/second.

AB - Searching microblogs, e.g., tweets and comments, is practically supported through main-memory indexing for scalable data digestion and efficient query evaluation. With continuity and excessive numbers of microblogs, it is infeasible to keep data in main-memory for long periods. Thus, once allocated memory budget is filled, a portion of data is flushed from memory to disk to continuously accommodate newly incoming data. Existing techniques come with either low memory hit ratio due to flushing items regardless of their relevance to incoming queries or significant overhead of tracking individual data items, which limit scalability of microblogs systems in either cases. In this paper, we propose kFlushing policy that exploits popularity of top-k queries in microblogs to smartly select a subset of microblogs to flush. kFlushing is mainly designed to increase memory hit ratio. To this end, it identifies and flushes in-memory data that does not contribute to incoming queries. The freed memory space is utilized to accumulate more useful data that is used to answer more queries from memory contents. When all memory is utilized for useful data, kFlushing flushes data that is less likely to degrade memory hit ratio. In addition, kFlushing comes with a little overhead that keeps high system scalability in terms of high digestion rates of incoming fast data. Extensive experimental evaluation shows the effectiveness and scalability of kFlushing to improve main-memory hit by 26-330% while coping up with fast microblog streams of up to 100K microblog/second.

UR - http://www.scopus.com/inward/record.url?scp=84980320657&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84980320657&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2016.7498261

DO - 10.1109/ICDE.2016.7498261

M3 - Conference contribution

AN - SCOPUS:84980320657

SP - 445

EP - 456

BT - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -