IntoNews

Online news retrieval using closed captions

Roi Blanco, Gianmarco Morales, Fabrizio Silvestri

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

We present IntoNews, a system that matches online news articles with spoken news from television newscasts, represented by their closed captions. We formalize the news matching problem as two independent tasks: closed-caption segmentation and news retrieval. The system segments closed captions using a windowing scheme, either a sliding or a tumbling window. Next, it uses each segment to build a query by extracting representative terms. The query is used to retrieve previously indexed news articles from a search engine. To detect when a new article should be surfaced, the system compares the current set of retrieved articles with the previously retrieved set. The intuition is that if the difference between these sets is large enough, the topic of the newscast currently on air has likely changed and a new article should be displayed to the user. To evaluate IntoNews, we build a test collection using data from a second-screen application and a major online news aggregator. The dataset is manually segmented and annotated by expert assessors and used as our ground truth. It is freely available for download through the Webscope program. Our evaluation is based on a set of novel time-relevance metrics that take into account three aspects of the problem: precision, timeliness and coverage. We compare our algorithms against the best method previously proposed in the literature for this problem. Experiments show the trade-offs among precision, timeliness and coverage of the airing news. Our best method is four times more accurate than the baseline.
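
The abstract gives no implementation details, so the following is only a minimal Python sketch of the two ideas it names: window-based segmentation of caption lines, and surfacing a new article when consecutive result sets differ enough. The function names, the choice of Jaccard distance as the set-difference measure, and the 0.5 threshold are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Iterator, List, Sequence, Set


def tumbling_windows(lines: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive, non-overlapping segments of up to `size` caption lines."""
    for start in range(0, len(lines), size):
        yield list(lines[start:start + size])


def sliding_windows(
    lines: Sequence[str], size: int, step: int = 1
) -> Iterator[List[str]]:
    """Yield overlapping segments of `size` lines, advancing `step` lines each time."""
    last_start = max(len(lines) - size, 0)
    for start in range(0, last_start + 1, step):
        yield list(lines[start:start + size])


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def should_surface(
    previous: Set[str], current: Set[str], threshold: float = 0.5
) -> bool:
    """Assumed trigger: surface a new article when the result sets differ enough."""
    return jaccard_distance(previous, current) > threshold
```

In this sketch, each window would be turned into a query, the identifiers of the retrieved articles collected into a set, and `should_surface` called with the previous and current sets. A tumbling window touches each caption line exactly once, while a sliding window re-queries overlapping text and can therefore react to a topic change sooner at the cost of more queries.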

Original language: English
Pages (from-to): 148-152
Number of pages: 5
Journal: Information Processing and Management
Volume: 51
Issue number: 1
DOI: 10.1016/j.ipm.2014.07.010
Publication status: Published - 2015
Externally published: Yes

Keywords

  • Continuous retrieval
  • IntoNews
  • IntoNow
  • News retrieval
  • Second screen

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences

Cite this

Blanco, Roi; Morales, Gianmarco; Silvestri, Fabrizio. IntoNews: Online news retrieval using closed captions. In: Information Processing and Management, Vol. 51, No. 1, 2015, p. 148-152.

@article{ae25ca78d2274177a6008f3a89ec0ada,
title = "IntoNews: Online news retrieval using closed captions",
keywords = "Continuous retrieval, IntoNews, IntoNow, News retrieval, Second screen",
author = "Roi Blanco and Gianmarco Morales and Fabrizio Silvestri",
year = "2015",
doi = "10.1016/j.ipm.2014.07.010",
language = "English",
volume = "51",
pages = "148--152",
journal = "Information Processing and Management",
issn = "0306-4573",
publisher = "Elsevier Limited",
number = "1",
}

TY - JOUR

T1 - IntoNews

T2 - Online news retrieval using closed captions

AU - Blanco, Roi

AU - Morales, Gianmarco

AU - Silvestri, Fabrizio

PY - 2015

Y1 - 2015

KW - Continuous retrieval

KW - IntoNews

KW - IntoNow

KW - News retrieval

KW - Second screen

UR - http://www.scopus.com/inward/record.url?scp=84908499679&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908499679&partnerID=8YFLogxK

U2 - 10.1016/j.ipm.2014.07.010

DO - 10.1016/j.ipm.2014.07.010

M3 - Article

VL - 51

SP - 148

EP - 152

JO - Information Processing and Management

JF - Information Processing and Management

SN - 0306-4573

IS - 1

ER -