Dense vs. Sparse representations for news stream clustering

Research output: Contribution to journal / Conference article

Abstract

The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream, and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art both for the standard F1 as well as for a BCubed version thereof, which we argue is more suitable for the task.
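The evaluation above reports a BCubed version of F1 alongside the standard F1. The following is a minimal Python sketch of item-level BCubed precision, recall and F1. It is not the authors' evaluation code and rests on these assumptions:

- each document receives exactly one predicted cluster and one gold label;
- the labels arrive as two aligned sequences;
- F1 is the harmonic mean of BCubed precision and recall.

The function name `bcubed` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def bcubed(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return item-level BCubed (precision, recall, F1).

    Hypothetical helper: predicted[i] and gold[i] are the system cluster
    and the gold-standard label of document i.
    """
    if len(predicted) != len(gold) or not predicted:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(predicted)
    pred_sizes = Counter(predicted)          # documents per predicted cluster
    gold_sizes = Counter(gold)               # documents per gold label
    joint = Counter(zip(predicted, gold))    # documents per (cluster, label) pair

    precision = recall = 0.0
    for p, g in zip(predicted, gold):
        # Documents sharing both this document's cluster and its gold label,
        # counting the document itself, so the overlap is always >= 1.
        overlap = joint[(p, g)]
        precision += overlap / pred_sizes[p]
        recall += overlap / gold_sizes[g]
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: two predicted clusters vs. two gold stories.
    p, r, f = bcubed([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.733 0.733 0.733
```

BCubed averages a per-document score, so every document contributes equally, whatever the size of the cluster it ends up in.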

Original language: English
Pages (from-to): 47-52
Number of pages: 6
Journal: CEUR Workshop Proceedings
Volume: 2342
Publication status: Published - 1 Jan 2019
Event: 2nd International Workshop on Narrative Extraction From Texts, Text2Story 2019 - Cologne, Germany
Duration: 14 Apr 2019 → …

Keywords

  • Dense representations
  • Sparse representations
  • Stream clustering

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Dense vs. Sparse representations for news stream clustering. / Staykovski, Todor; Barron, Alberto; Martino, Giovanni; Nakov, Preslav.

In: CEUR Workshop Proceedings, Vol. 2342, 01.01.2019, p. 47-52.

@article{a11323f4a021417d9bc228c0ecb48389,
title = "Dense vs. Sparse representations for news stream clustering",
abstract = "The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream, and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art both for the standard F1 as well as for a BCubed version thereof, which we argue is more suitable for the task.",
keywords = "Dense representations, Sparse representations, Stream clustering",
author = "Todor Staykovski and Alberto Barron and Giovanni Martino and Preslav Nakov",
year = "2019",
month = "1",
day = "1",
language = "English",
volume = "2342",
pages = "47--52",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",

}

TY  - JOUR

T1  - Dense vs. Sparse representations for news stream clustering

AU  - Staykovski, Todor

AU  - Barron, Alberto

AU  - Martino, Giovanni

AU  - Nakov, Preslav

PY  - 2019/1/1

Y1  - 2019/1/1

N2  - The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream, and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art both for the standard F1 as well as for a BCubed version thereof, which we argue is more suitable for the task.

AB  - The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream, and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art both for the standard F1 as well as for a BCubed version thereof, which we argue is more suitable for the task.

KW  - Dense representations

KW  - Sparse representations

KW  - Stream clustering

UR  - http://www.scopus.com/inward/record.url?scp=85066462684&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85066462684&partnerID=8YFLogxK

M3  - Conference article

VL  - 2342

SP  - 47

EP  - 52

JO  - CEUR Workshop Proceedings

JF  - CEUR Workshop Proceedings

SN  - 1613-0073

ER  - 