Dense vs. Sparse representations for news stream clustering

Research output: Contribution to journal › Conference article

Abstract

The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. We compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art both for the standard F1 and for a BCubed version thereof, which we argue is more suitable for the task.
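The abstract does not say which BCubed variant is used, so the following is only a minimal sketch of the commonly used per-item definition: each item's precision is the share of its predicted cluster that has the same gold label, each item's recall is the share of its gold class that landed in the same cluster, and F1 is taken here as the harmonic mean of the two averages. The function name `bcubed` and the parallel label lists are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def bcubed(pred, gold):
    """Return (precision, recall, f1) for two parallel lists of labels.

    pred[i] is the predicted cluster id of item i and gold[i] is its
    gold-standard class. Assumes the standard per-item BCubed definition;
    the paper's exact variant may differ.
    """
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    if not pred:
        raise ValueError("need at least one item")

    cluster_sizes = Counter(pred)      # items per predicted cluster
    class_sizes = Counter(gold)        # items per gold class
    overlap = Counter(zip(pred, gold))  # items sharing both cluster and class

    n = len(pred)
    # Average the per-item precision and recall over all items.
    precision = sum(overlap[(p, g)] / cluster_sizes[p]
                    for p, g in zip(pred, gold)) / n
    recall = sum(overlap[(p, g)] / class_sizes[g]
                 for p, g in zip(pred, gold)) / n
    # Every item overlaps with itself, so precision + recall is never zero.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `bcubed([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])` yields precision and recall of 11/15 each (about 0.73), whereas putting all items in a single cluster gives perfect recall but lower precision. Because the scores are averaged per item rather than per cluster pairing, large and small clusters contribute in proportion to their size.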

Original language: English
Pages (from-to): 47-52
Number of pages: 6
Journal: CEUR Workshop Proceedings
Volume: 2342
Publication status: Published - 1 Jan 2019
Event: 2nd International Workshop on Narrative Extraction From Texts, Text2Story 2019 - Cologne, Germany
Duration: 14 Apr 2019 → …

Keywords

  • Dense representations
  • Sparse representations
  • Stream clustering

ASJC Scopus subject areas

  • Computer Science (all)
