Local trend discovery on real-time microblogs with uncertain locations in tight memory environments

Abdulaziz Almaslukh, Amr Magdy, Ahmed M. Aly, Mohamed F. Mokbel, Sameh Elnikety, Yuxiong He, Suman Nath, Walid G. Aref

Research output: Contribution to journal › Article

Abstract

This paper presents GeoTrend+, a system approach to support scalable local trend discovery on recent microblogs, e.g., tweets, comments, online reviews, and check-ins, that arrive in real time. GeoTrend+ discovers the top-k trending keywords in arbitrary spatial regions from recent microblogs that continuously arrive at high rates, a significant portion of which have uncertain geolocations. GeoTrend+ distinguishes itself from existing techniques in five aspects: (1) Discovering trends in arbitrary spatial regions, e.g., city blocks. (2) Considering both exact geolocations, e.g., accurate latitude/longitude coordinates, and uncertain geolocations, e.g., district-level or city-level locations, which represent a significant portion of microblogs from past years. (3) Promoting recent microblogs as first-class citizens and optimizing different components to digest a continuous flow of fast data in main memory while efficiently removing old data. (4) Providing various main-memory optimization techniques that distinguish useful from useless data, so that tight memory resources are used effectively while query results over relatively large amounts of data remain accurate. (5) Supporting various trending measures that capture trending items under a variety of definitions suited to different applications. GeoTrend+ limits its scope to real-time data posted during the last T time units. To support its queries efficiently, GeoTrend+ employs an in-memory spatial index that efficiently digests incoming data and expires data older than the last T time units. The index also materializes the top-k keywords in different spatial regions so that incoming queries can be processed with low latency. At peak times, the main-memory optimization techniques shed less important data to sustain high query accuracy with limited memory resources.
Experimental results based on real data and queries show that GeoTrend+ scales to high arrival rates with low query response time and achieves at least 90% query accuracy even under limited memory resources.
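The abstract describes an in-memory spatial index that digests arriving microblogs, expires anything older than the last T time units, and answers top-k keyword queries over arbitrary regions, including microblogs whose location is only known at district or city level. The sketch below illustrates those ideas under stated assumptions; it is not GeoTrend+'s actual index. The names `GridTrendIndex` and `Rect` are hypothetical. It assumes a uniform grid, models an uncertain location as a rectangle whose count is split across cells in proportion to overlap area, uses plain frequency as the trending measure, expects in-order timestamps, and computes top-k at query time rather than materializing it per region. It also omits the memory-shedding optimizations.

```python
import heapq
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; an exact geolocation is a zero-area Rect (a point)."""
    x1: float
    y1: float
    x2: float
    y2: float


class GridTrendIndex:
    """Hypothetical sliding-window grid of keyword counts; not GeoTrend+'s actual index."""

    def __init__(self, extent: Rect, cells_per_side: int, window: float):
        self.extent = extent
        self.n = cells_per_side
        self.window = window  # T: only microblogs from the last T time units count
        self.cw = (extent.x2 - extent.x1) / cells_per_side  # cell width
        self.ch = (extent.y2 - extent.y1) / cells_per_side  # cell height
        self.counts = [[Counter() for _ in range(self.n)] for _ in range(self.n)]
        # Arrival log of (timestamp, [(i, j, keyword, weight), ...]), oldest first.
        self.log = deque()

    def _axis_range(self, lo, hi, origin, size):
        # Inclusive cell-index range covering [lo, hi] on one axis, clamped to the grid.
        a = int((lo - origin) // size)
        b = int((hi - origin) // size)
        return min(max(a, 0), self.n - 1), min(max(b, 0), self.n - 1)

    def _weights(self, r: Rect):
        # (i, j, weight) triples; weight = fraction of r's area that falls in cell (i, j).
        i0, i1 = self._axis_range(r.x1, r.x2, self.extent.x1, self.cw)
        j0, j1 = self._axis_range(r.y1, r.y2, self.extent.y1, self.ch)
        area = (r.x2 - r.x1) * (r.y2 - r.y1)
        if area == 0:
            # Exact location: the whole microblog counts in the single containing cell.
            return [(i0, j0, 1.0)]
        out = []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                cx = self.extent.x1 + i * self.cw
                cy = self.extent.y1 + j * self.ch
                ox = max(0.0, min(r.x2, cx + self.cw) - max(r.x1, cx))
                oy = max(0.0, min(r.y2, cy + self.ch) - max(r.y1, cy))
                if ox * oy > 0:
                    out.append((i, j, ox * oy / area))
        return out

    def insert(self, ts, keywords, loc: Rect):
        # Assumes microblogs arrive in non-decreasing timestamp order.
        self.expire(ts)
        entries = []
        for i, j, w in self._weights(loc):
            for kw in keywords:
                self.counts[i][j][kw] += w
                entries.append((i, j, kw, w))
        self.log.append((ts, entries))

    def expire(self, now):
        # Evict microblogs posted at or before now - T and subtract their weights.
        while self.log and self.log[0][0] <= now - self.window:
            _, entries = self.log.popleft()
            for i, j, kw, w in entries:
                cell = self.counts[i][j]
                cell[kw] -= w
                if cell[kw] <= 1e-9:  # drop float residue so expired keywords disappear
                    del cell[kw]

    def top_k(self, q: Rect, k, now):
        # Sums the counts of every cell overlapping q (boundary cells are counted whole).
        self.expire(now)
        i0, i1 = self._axis_range(q.x1, q.x2, self.extent.x1, self.cw)
        j0, j1 = self._axis_range(q.y1, q.y2, self.extent.y1, self.ch)
        total = Counter()
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                total.update(self.counts[i][j])
        # Highest count first; ties broken alphabetically.
        return heapq.nsmallest(k, total.items(), key=lambda kv: (-kv[1], kv[0]))


idx = GridTrendIndex(Rect(0, 0, 100, 100), cells_per_side=10, window=60)
idx.insert(0, ["concert"], Rect(5, 5, 5, 5))                # exact point in cell (0, 0)
idx.insert(10, ["concert", "traffic"], Rect(0, 0, 20, 10))  # uncertain: half in (0, 0), half in (1, 0)
idx.insert(30, ["traffic"], Rect(15, 5, 15, 5))             # exact point in cell (1, 0)
print(idx.top_k(Rect(0, 0, 9, 9), k=2, now=30))   # [('concert', 1.5), ('traffic', 0.5)]
print(idx.top_k(Rect(0, 0, 19, 9), k=2, now=75))  # [('traffic', 1.0)]: ts 0 and 10 expired
```

In this sketch, splitting an uncertain microblog fractionally means a city-level post adds only a small amount to any single city block while still counting fully at city scale. This is one plausible way to combine exact and uncertain geolocations in the same counts. The paper's own treatment of uncertain locations and its trending measures may differ.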

Original language: English
Journal: GeoInformatica
DOI: 10.1007/s10707-019-00380-z
Publication status: Accepted/In press - 1 Jan 2019

Keywords

  • Adaptive memory optimization
  • Indexing
  • Microblogs
  • Query processing
  • Real-time
  • Spatial
  • Trend
  • Uncertain location
  • Uncertainty

ASJC Scopus subject areas

  • Information Systems
  • Geography, Planning and Development

Cite this

Local trend discovery on real-time microblogs with uncertain locations in tight memory environments. / Almaslukh, Abdulaziz; Magdy, Amr; Aly, Ahmed M.; Mokbel, Mohamed F.; Elnikety, Sameh; He, Yuxiong; Nath, Suman; Aref, Walid G.

In: GeoInformatica, 01.01.2019.

Almaslukh, Abdulaziz ; Magdy, Amr ; Aly, Ahmed M. ; Mokbel, Mohamed F. ; Elnikety, Sameh ; He, Yuxiong ; Nath, Suman ; Aref, Walid G. / Local trend discovery on real-time microblogs with uncertain locations in tight memory environments. In: GeoInformatica. 2019.
@article{e425ff8ae3214c98ba8774b507646dca,
title = "Local trend discovery on real-time microblogs with uncertain locations in tight memory environments",
keywords = "Adaptive memory optimization, Indexing, Microblogs, Query processing, Real-time, Spatial, Trend, Uncertain location, Uncertainty",
author = "Abdulaziz Almaslukh and Amr Magdy and Aly, {Ahmed M.} and Mokbel, {Mohamed F.} and Sameh Elnikety and Yuxiong He and Suman Nath and Aref, {Walid G.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s10707-019-00380-z",
language = "English",
journal = "GeoInformatica",
issn = "1384-6175",
publisher = "Kluwer Academic Publishers",

}

TY - JOUR

T1 - Local trend discovery on real-time microblogs with uncertain locations in tight memory environments

AU - Almaslukh, Abdulaziz

AU - Magdy, Amr

AU - Aly, Ahmed M.

AU - Mokbel, Mohamed F.

AU - Elnikety, Sameh

AU - He, Yuxiong

AU - Nath, Suman

AU - Aref, Walid G.

PY - 2019/1/1

Y1 - 2019/1/1

KW - Adaptive memory optimization

KW - Indexing

KW - Microblogs

KW - Query processing

KW - Real-time

KW - Spatial

KW - Trend

KW - Uncertain location

KW - Uncertainty

UR - http://www.scopus.com/inward/record.url?scp=85073958811&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85073958811&partnerID=8YFLogxK

U2 - 10.1007/s10707-019-00380-z

DO - 10.1007/s10707-019-00380-z

M3 - Article

AN - SCOPUS:85073958811

JO - GeoInformatica

JF - GeoInformatica

SN - 1384-6175

ER -