Utilizing microblogs for automatic news highlights extraction

Zhongyu Wei, Wei Gao

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

Story highlights form a succinct single-document summary consisting of 3-4 highlight sentences that reflect the gist of a news article. Automatically producing news highlights is very challenging. We propose a novel method to improve news highlights extraction by using microblogs. The hypothesis is that microblog posts, although noisy, are not only indicative of important pieces of information in the news story, but also inherently “short and sweet” as a result of the artificial compression imposed by the length limit. Given a news article, we formulate the problem as two rank-then-extract tasks: (1) we find a set of indicative tweets and use them to assist the ranking of news sentences for extraction; (2) we extract top-ranked tweets as a substitute for sentence extraction. Results based on our news-tweets pairing corpus indicate that the method significantly outperforms some strong baselines for single-document summarization.
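The two rank-then-extract tasks can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not state the ranking features, so the scoring used here (mean cosine similarity over raw term counts) is an assumption chosen only for illustration, and the names `tokenize`, `cosine`, and `rank_then_extract` are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[term] for term, count in ca.items())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def rank_then_extract(candidates, evidence, k=3):
    """Rank candidates by mean similarity to the evidence texts, return the top k.

    Assumed scoring, not the chapter's: a candidate scores high when it
    shares many terms with the evidence texts.
    """
    ev_tokens = [tokenize(e) for e in evidence]
    if not ev_tokens:
        # No evidence to rank against: fall back to the original order.
        return candidates[:k]

    def score(text):
        toks = tokenize(text)
        return sum(cosine(toks, e) for e in ev_tokens) / len(ev_tokens)

    # sorted() is stable, so tied candidates keep their original order.
    return sorted(candidates, key=score, reverse=True)[:k]
```

Under these assumptions, task (1) corresponds to `rank_then_extract(news_sentences, tweets)`, where tweets serve as evidence for ranking sentences, and task (2) to `rank_then_extract(tweets, news_sentences)`, where the tweets themselves are the extraction candidates.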

Original language: English
Title of host publication: Social Media Content Analysis
Subtitle of host publication: Natural Language Processing and Beyond
Publisher: World Scientific Publishing Co. Pte Ltd
Pages: 277-296
Number of pages: 20
ISBN (Electronic): 9789813223615
ISBN (Print): 9789813223608
DOIs: https://doi.org/10.1142/9789813223615_0019
Publication status: Published - 1 Jan 2017

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Wei, Z., & Gao, W. (2017). Utilizing microblogs for automatic news highlights extraction. In Social Media Content Analysis: Natural Language Processing and Beyond (pp. 277-296). World Scientific Publishing Co. Pte Ltd. https://doi.org/10.1142/9789813223615_0019

@inbook{db082d4eaa554a9a811ec86084cc64e2,
title = "Utilizing microblogs for automatic news highlights extraction",
abstract = "Story highlights form a succinct single-document summary consisting of 3-4 highlight sentences that reflect the gist of a news article. Automatically producing news highlights is very challenging. We propose a novel method to improve news highlights extraction by using microblogs. The hypothesis is that microblog posts, although noisy, are not only indicative of important pieces of information in the news story, but also inherently “short and sweet” as a result of the artificial compression imposed by the length limit. Given a news article, we formulate the problem as two rank-then-extract tasks: (1) we find a set of indicative tweets and use them to assist the ranking of news sentences for extraction; (2) we extract top-ranked tweets as a substitute for sentence extraction. Results based on our news-tweets pairing corpus indicate that the method significantly outperforms some strong baselines for single-document summarization.",
author = "Zhongyu Wei and Wei Gao",
year = "2017",
month = "1",
day = "1",
doi = "10.1142/9789813223615_0019",
language = "English",
isbn = "9789813223608",
pages = "277--296",
booktitle = "Social Media Content Analysis",
publisher = "World Scientific Publishing Co. Pte Ltd",
address = "Singapore",
}

TY - CHAP
T1 - Utilizing microblogs for automatic news highlights extraction
AU - Wei, Zhongyu
AU - Gao, Wei
PY - 2017/1/1
Y1 - 2017/1/1
AB - Story highlights form a succinct single-document summary consisting of 3-4 highlight sentences that reflect the gist of a news article. Automatically producing news highlights is very challenging. We propose a novel method to improve news highlights extraction by using microblogs. The hypothesis is that microblog posts, although noisy, are not only indicative of important pieces of information in the news story, but also inherently “short and sweet” as a result of the artificial compression imposed by the length limit. Given a news article, we formulate the problem as two rank-then-extract tasks: (1) we find a set of indicative tweets and use them to assist the ranking of news sentences for extraction; (2) we extract top-ranked tweets as a substitute for sentence extraction. Results based on our news-tweets pairing corpus indicate that the method significantly outperforms some strong baselines for single-document summarization.
UR - http://www.scopus.com/inward/record.url?scp=85041582690&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041582690&partnerID=8YFLogxK
U2 - 10.1142/9789813223615_0019
DO - 10.1142/9789813223615_0019
M3 - Chapter
SN - 9789813223608
SP - 277
EP - 296
BT - Social Media Content Analysis
PB - World Scientific Publishing Co. Pte Ltd
ER -