ISTI@TREC microblog track 2011

Exploring the use of hashtag segmentation and text quality ranking

Giacomo Berardi, Andrea Esuli, Diego Marcheggiani, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the first year of the TREC Microblog track, our participation focused on building from scratch an IR system based on the Whoosh IR library. Though the design of our system (CipCipPy) is fairly standard, it includes three ad hoc solutions for the track: (i) a dedicated indexing function for hashtags that automatically recognizes the distinct words composing a hashtag, (ii) expansion of tweets based on the title of any Web page they refer to, and (iii) a tweet ranking function that ranks tweets in results by their content quality, which is measured against a reference corpus of Reuters news. In this preliminary paper we describe all the components of our system and the effectiveness achieved by our runs. The CipCipPy system is available under a GPL license.
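To make solutions (i) and (iii) concrete, the following is a minimal illustrative sketch in Python. It is not the CipCipPy code, and this record does not describe the authors' actual algorithms. The dictionary-based segmentation, the add-one-smoothed unigram quality score, and the names `segment_hashtag`, `quality_score`, `vocabulary`, and `reference_counts` are all assumptions made for illustration.

```python
import math
from collections import Counter


def segment_hashtag(tag, vocabulary):
    """Split a hashtag into known words (hypothetical helper).

    Uses dynamic programming to find the segmentation with the fewest
    words, where every word must appear in `vocabulary`. If no full
    segmentation exists, the hashtag text is returned unsegmented.
    """
    text = tag.lstrip("#").lower()
    n = len(text)
    # best[i] is the shortest word list covering text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [text]


def quality_score(tokens, reference_counts, vocab_size):
    """Average log-probability of `tokens` under a reference unigram model.

    Assumption: "content quality" is approximated by how likely the
    tokens are under word counts taken from a reference news corpus,
    with add-one smoothing for unseen words. Higher means closer to
    the reference corpus.
    """
    if not tokens:
        return float("-inf")
    reference_total = sum(reference_counts.values())
    total = 0.0
    for tok in tokens:
        prob = (reference_counts.get(tok, 0) + 1) / (reference_total + vocab_size)
        total += math.log(prob)
    return total / len(tokens)


# Toy data standing in for a real dictionary and a real news corpus.
vocabulary = {"text", "retrieval", "micro", "blog"}
reference_counts = Counter("the markets rose as the bank cut rates".split())
vocab_size = len(reference_counts) + 1  # one extra slot for unseen words

words = segment_hashtag("#TextRetrieval", vocabulary)
news_like = quality_score(["the", "bank", "cut", "rates"], reference_counts, vocab_size)
chat_like = quality_score(["lol", "omg", "the", "xx"], reference_counts, vocab_size)
```

In this sketch the segmented words could be indexed alongside the original hashtag, and the quality score could be used to re-rank retrieved tweets. How the actual system combines these signals is not stated in this record.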

Original language: English
Title of host publication: NIST Special Publication
Publication status: Published - 2011
Externally published: Yes
Event: 20th Text REtrieval Conference, TREC 2011 - Gaithersburg, MD, United States
Duration: 15 Nov 2011 to 18 Nov 2011


Fingerprint

Blogs
Websites

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Berardi, G., Esuli, A., Marcheggiani, D., & Sebastiani, F. (2011). ISTI@TREC microblog track 2011: Exploring the use of hashtag segmentation and text quality ranking. In NIST Special Publication
