Proppy: Organizing the news based on their propagandistic content

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Propaganda is a mechanism to influence public opinion, which is inherently present in extremely biased and fake news. Here, we propose a model to automatically assess the level of propagandistic content in an article based on different representations, from writing style and readability level to the presence of certain keywords. We experiment thoroughly with different variations of such a model on a new publicly available corpus, and we show that character n-grams and other style features outperform existing alternatives to identify propaganda based on word n-grams. Unlike previous work, we make sure that the test data comes from news sources that were unseen during training, thus penalizing learning algorithms that model the news sources used at training time as opposed to solving the actual task. We integrate our supervised model into a public website, which organizes recent articles covering the same event on the basis of their propagandistic content. This allows users to quickly explore different perspectives of the same story, and it also enables investigative journalists to dig further into how different media use stories and propaganda to pursue their agenda.
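
The two ideas the abstract stresses, character n-gram features and a test set drawn from news sources unseen during training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `char_ngrams` and `split_by_source`, the `(source, text, label)` tuple layout, and the toy articles are all assumptions made for illustration only.

```python
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def split_by_source(articles, test_fraction=0.25, seed=0):
    """Split articles so that no news source appears in both sets.

    `articles` is a list of (source, text, label) tuples; this layout is
    an assumption for illustration, not the corpus format of the paper.
    """
    sources = sorted({source for source, _, _ in articles})
    rng = random.Random(seed)
    rng.shuffle(sources)
    # Hold out whole sources, never individual articles.
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])
    train = [a for a in articles if a[0] not in test_sources]
    test = [a for a in articles if a[0] in test_sources]
    return train, test


# Toy, made-up data: (source, text, label), label 1 = propagandistic.
articles = [
    ("site-a", "Shocking truth they hide from you!", 1),
    ("site-a", "Wake up before it is too late!", 1),
    ("site-b", "The council approved the budget on Monday.", 0),
    ("site-c", "Officials reported steady growth this quarter.", 0),
    ("site-d", "They are lying to you, again!", 1),
]

train, test = split_by_source(articles)
train_sources = {a[0] for a in train}
test_sources = {a[0] for a in test}
```

Holding out entire sources means a classifier cannot score well simply by recognising the outlet an article came from, which is the failure mode the abstract says the evaluation is designed to penalize.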

Original language: English
Pages (from-to): 1849-1864
Number of pages: 16
Journal: Information Processing and Management
Volume: 56
Issue number: 5
Publication status: Published - Sep 2019

Keywords

  • Investigative journalism
  • News bias
  • Propaganda detection

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
