Exploring tweets normalization and query time sensitivity for twitter search

Zhongyu Wei, Wei Gao, Lanjun Zhou, Binyang Li, Kam Fai Wong

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

This chapter presents our work for the Realtime Adhoc Task of the TREC 2011 Microblog Track. Microblog texts such as tweets typically contain a large proportion of irregular expressions, such as ill-formed words, which can cause significant mismatch between query terms and tweets. In addition, Twitter queries differ from Web queries in many respects, one of which is the distinct temporal character of Twitter search behavior. In this study, we address the first problem by normalizing tweet texts and the second by capturing the temporal characteristics of a topic. We divide topics into two categories: time-sensitive and time-insensitive. For time-sensitive topics, we introduce a decay factor that adjusts the relevance score of each result according to the expected date of the topical event, and then re-rank the search results. Experiments show that our methods perform significantly better than the baseline and outperform the median of all runs.
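
The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the normalization resources or the form of the decay factor, so everything specific here is an assumption made for illustration: the lookup table `NORMALIZATION_TABLE`, the exponential form of the decay, the `rate` parameter, and the function names `normalize`, `decayed_score`, and `rerank` are all hypothetical and are not taken from the chapter.

```python
import math
import re
from datetime import date

# Hypothetical lookup table of ill-formed words (assumption; the chapter's
# actual normalization resources are not described in the abstract).
NORMALIZATION_TABLE = {"u": "you", "2moro": "tomorrow", "gr8": "great"}


def normalize(text):
    """Lowercase, tokenize, collapse elongated letters, map known variants."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    normalized = []
    for tok in tokens:
        # Collapse runs of three or more identical characters to two,
        # e.g. "sooooo" -> "soo".
        tok = re.sub(r"(.)\1{2,}", r"\1\1", tok)
        normalized.append(NORMALIZATION_TABLE.get(tok, tok))
    return normalized


def decayed_score(score, tweet_date, event_date, rate=0.1):
    """Scale a relevance score by an exponential decay in the number of days
    between the tweet and the expected event date (assumed form)."""
    days = abs((tweet_date - event_date).days)
    return score * math.exp(-rate * days)


def rerank(results, event_date, time_sensitive, rate=0.1):
    """Re-rank (tweet_id, score, tweet_date) tuples, highest score first.

    Time-insensitive topics keep their original scores; time-sensitive
    topics have the decay factor applied before sorting.
    """
    if time_sensitive:
        scored = [(tid, decayed_score(s, d, event_date, rate))
                  for tid, s, d in results]
    else:
        scored = [(tid, s) for tid, s, _ in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    results = [("a", 1.0, date(2011, 1, 20)), ("b", 0.8, date(2011, 2, 1))]
    print(normalize("C u 2moro, sooooo gr8"))
    print(rerank(results, date(2011, 2, 1), time_sensitive=True))
```

In this sketch a tweet posted on the expected event date keeps its full score, while a tweet posted twelve days away from it is heavily discounted, so a slightly less relevant but better-timed tweet can move ahead in the ranking. For a time-insensitive topic the original ordering is left unchanged.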

Original language: English
Title of host publication: Social Media Content Analysis
Subtitle of host publication: Natural Language Processing and Beyond
Publisher: World Scientific Publishing Co. Pte Ltd
Pages: 31-44
Number of pages: 14
ISBN (Electronic): 9789813223615
ISBN (Print): 9789813223608
DOI: https://doi.org/10.1142/9789813223615_0003
Publication status: Published - 1 Jan 2017

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Wei, Z., Gao, W., Zhou, L., Li, B., & Wong, K. F. (2017). Exploring tweets normalization and query time sensitivity for twitter search. In Social Media Content Analysis: Natural Language Processing and Beyond (pp. 31-44). World Scientific Publishing Co. Pte Ltd. https://doi.org/10.1142/9789813223615_0003

@inbook{cc621e13d63749c6bb0f8cb00d4f2ca1,
title = "Exploring tweets normalization and query time sensitivity for twitter search",
abstract = "This chapter presents our work for the Realtime Adhoc Task of the TREC 2011 Microblog Track. Microblog texts such as tweets typically contain a large proportion of irregular expressions, such as ill-formed words, which can cause significant mismatch between query terms and tweets. In addition, Twitter queries differ from Web queries in many respects, one of which is the distinct temporal character of Twitter search behavior. In this study, we address the first problem by normalizing tweet texts and the second by capturing the temporal characteristics of a topic. We divide topics into two categories: time-sensitive and time-insensitive. For time-sensitive topics, we introduce a decay factor that adjusts the relevance score of each result according to the expected date of the topical event, and then re-rank the search results. Experiments show that our methods perform significantly better than the baseline and outperform the median of all runs.",
author = "Zhongyu Wei and Wei Gao and Lanjun Zhou and Binyang Li and Wong, {Kam Fai}",
year = "2017",
month = "1",
day = "1",
doi = "10.1142/9789813223615_0003",
language = "English",
isbn = "9789813223608",
pages = "31--44",
booktitle = "Social Media Content Analysis",
publisher = "World Scientific Publishing Co. Pte Ltd",
address = "Singapore",

}

TY - CHAP
T1 - Exploring tweets normalization and query time sensitivity for twitter search
AU - Wei, Zhongyu
AU - Gao, Wei
AU - Zhou, Lanjun
AU - Li, Binyang
AU - Wong, Kam Fai
PY - 2017/1/1
Y1 - 2017/1/1
AB - This chapter presents our work for the Realtime Adhoc Task of the TREC 2011 Microblog Track. Microblog texts such as tweets typically contain a large proportion of irregular expressions, such as ill-formed words, which can cause significant mismatch between query terms and tweets. In addition, Twitter queries differ from Web queries in many respects, one of which is the distinct temporal character of Twitter search behavior. In this study, we address the first problem by normalizing tweet texts and the second by capturing the temporal characteristics of a topic. We divide topics into two categories: time-sensitive and time-insensitive. For time-sensitive topics, we introduce a decay factor that adjusts the relevance score of each result according to the expected date of the topical event, and then re-rank the search results. Experiments show that our methods perform significantly better than the baseline and outperform the median of all runs.
UR - http://www.scopus.com/inward/record.url?scp=85041600176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041600176&partnerID=8YFLogxK
U2 - 10.1142/9789813223615_0003
DO - 10.1142/9789813223615_0003
M3 - Chapter
SN - 9789813223608
SP - 31
EP - 44
BT - Social Media Content Analysis
PB - World Scientific Publishing Co. Pte Ltd
ER -