Exploiting geo-tagged tweets to understand localized language diversity

Amr Magdy, Thanaa M. Ghanem, Mashaal Musleh, Mohamed Mokbel

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

4 Citations (Scopus)

Abstract

Social media services have been the fastest-growing online communities in recent years. Among them, Twitter has become the de facto standard for microblogging, with millions of tweets posted every day. In this paper, we present an analytical study of localized language usage and diversity in Twitter data using half a billion geotagged tweets. We first identify local Twitter communities at the country level. For the identified communities, we examine (1) the language diversity, (2) the language dominance within the community and how it differs between local and global views, (3) how well the demographics of tweets represent real population demographics, and (4) the spatial distribution of different cultural groups within each country. To this end, we group the tweets on two levels. First, we group tweets per country to identify the local communities. Second, we group tweets within each local community by tweet language. Our study yields useful insights into language usage on Twitter, which provide important information for language-based applications built on Twitter data, e.g., linguistic analysis and disaster management. In addition, we present an interactive exploration tool for the spatial distribution of cultural groups, which provides low-effort, high-precision localization of different cultural groups inside a given country.
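The two-level grouping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record field names `country` and `lang`, the helper function names, and the toy data are all hypothetical, and Shannon entropy is used here only as one plausible diversity measure, since the abstract does not say which measure the paper uses.

```python
from collections import Counter, defaultdict
from math import log2


def group_by_country_and_language(tweets):
    """Two-level grouping: country -> language -> number of tweets.

    Each tweet is assumed (hypothetically) to be a dict with
    'country' and 'lang' keys.
    """
    groups = defaultdict(Counter)
    for tweet in tweets:
        groups[tweet["country"]][tweet["lang"]] += 1
    return groups


def dominant_language(lang_counts):
    """Return the most frequent language and its share of the community's tweets."""
    lang, count = lang_counts.most_common(1)[0]
    return lang, count / sum(lang_counts.values())


def shannon_diversity(lang_counts):
    """Shannon entropy (in bits) of the language distribution.

    This is an assumed diversity measure, not one taken from the paper.
    """
    total = sum(lang_counts.values())
    shares = [count / total for count in lang_counts.values()]
    return sum(p * log2(1 / p) for p in shares)


# Toy data for illustration only; not drawn from the study.
tweets = [
    {"country": "CH", "lang": "de"},
    {"country": "CH", "lang": "de"},
    {"country": "CH", "lang": "fr"},
    {"country": "CH", "lang": "it"},
    {"country": "JP", "lang": "ja"},
    {"country": "JP", "lang": "ja"},
]

groups = group_by_country_and_language(tweets)
for country, counts in sorted(groups.items()):
    lang, share = dominant_language(counts)
    print(country, lang, share, shannon_diversity(counts))
```

Under these assumptions, a community whose tweets are all in one language has a diversity of zero, and the value grows as tweets spread more evenly across more languages. Comparing the per-country dominant language with the dominant language over all tweets would give a rough version of the local-versus-global view mentioned in the abstract.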

Original language: English
Title of host publication: 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014
Publisher: Association for Computing Machinery
Pages: 7-12
Number of pages: 6
ISBN (Print): 9781450329781
DOIs: https://doi.org/10.1145/2619112.2619114
Publication status: Published - 1 Jan 2014
Externally published: Yes
Event: 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014 - Snowbird, UT, United States
Duration: 27 Jun 2014 to 27 Jun 2014

Other

Other: 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014
Country: United States
City: Snowbird, UT
Period: 27/6/14 to 27/6/14

Fingerprint

Spatial distribution
Disasters

ASJC Scopus subject areas

  • Software

Cite this

Magdy, A., Ghanem, T. M., Musleh, M., & Mokbel, M. (2014). Exploiting geo-tagged tweets to understand localized language diversity. In 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014 (pp. 7-12). Association for Computing Machinery. https://doi.org/10.1145/2619112.2619114

@inproceedings{91de1a2a3bf644df870d1b0403d70407,
title = "Exploiting geo-tagged tweets to understand localized language diversity",
author = "Amr Magdy and Ghanem, {Thanaa M.} and Mashaal Musleh and Mohamed Mokbel",
year = "2014",
month = "1",
day = "1",
doi = "10.1145/2619112.2619114",
language = "English",
isbn = "9781450329781",
pages = "7--12",
booktitle = "1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Exploiting geo-tagged tweets to understand localized language diversity

AU - Magdy, Amr

AU - Ghanem, Thanaa M.

AU - Musleh, Mashaal

AU - Mokbel, Mohamed

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84907016474&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907016474&partnerID=8YFLogxK

U2 - 10.1145/2619112.2619114

DO - 10.1145/2619112.2619114

M3 - Conference contribution

AN - SCOPUS:84907016474

SN - 9781450329781

SP - 7

EP - 12

BT - 1st International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich 2014 - In Conjunction with SIGMOD 2014

PB - Association for Computing Machinery

ER -