Multi-document abstractive summarization using ILP based multi-sentence compression

Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

40 Citations (Scopus)

Abstract

Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.
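The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' model: the abstract only says that paths are binary variables and that the objective combines path length, information score, and linguistic quality score. The particular weight (information times quality divided by length), the one-path-per-cluster constraint, the word budget, and the names `CANDIDATES`, `path_weight`, and `select_paths` are all assumptions made for illustration, and an exhaustive search over 0/1 assignments stands in for a real ILP solver.

```python
from itertools import product

# Hypothetical candidate paths, one tuple per path:
# (cluster id, path text, information score, linguistic-quality score).
# All values are made up for illustration.
CANDIDATES = [
    (0, "the storm hit the coast on monday", 0.9, 0.8),
    (0, "the storm hit the coast", 0.7, 0.9),
    (1, "thousands of residents were evacuated", 0.8, 0.7),
    (1, "residents were evacuated", 0.5, 0.9),
]


def path_weight(candidate):
    """Assumed objective coefficient: information * quality / length in words."""
    _, text, info, quality = candidate
    return info * quality / len(text.split())


def select_paths(candidates, max_words):
    """Exhaustively search all 0/1 assignments (a stand-in for an ILP solver).

    Assumed constraints: at most one path per cluster, and a total word budget.
    """
    best_score, best_texts = -1.0, []
    for assignment in product((0, 1), repeat=len(candidates)):
        chosen = [c for c, keep in zip(candidates, assignment) if keep]
        clusters = [c[0] for c in chosen]
        if len(clusters) != len(set(clusters)):
            continue  # violates "at most one path per cluster"
        if sum(len(c[1].split()) for c in chosen) > max_words:
            continue  # violates the word budget
        score = sum(path_weight(c) for c in chosen)
        if score > best_score:
            best_score, best_texts = score, [c[1] for c in chosen]
    return best_texts


summary = select_paths(CANDIDATES, max_words=10)
print(summary)
```

Dividing by length makes the sketch favor shorter paths, so it picks the compressed variant from each cluster when both fit the budget. Exhaustive search grows as 2^n in the number of candidate paths, which is why a practical system would hand the same variables and constraints to an ILP solver instead.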

Original language: English
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 1208-1214
Number of pages: 7
Volume: 2015-January
ISBN (Print): 9781577357384
Publication status: Published - 2015
Event: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina
Duration: 25 Jul 2015 to 31 Jul 2015

Other

Other: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015
Country: Argentina
City: Buenos Aires
Period: 25/7/15 to 31/7/15

Fingerprint

Linear programming
Linguistics

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Banerjee, S., Mitra, P., & Sugiyama, K. (2015). Multi-document abstractive summarization using ILP based multi-sentence compression. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2015-January, pp. 1208-1214). International Joint Conferences on Artificial Intelligence.

@inproceedings{5fa54dc36ea14d3c9a9bfcb77e1605f9,
title = "Multi-document abstractive summarization using ILP based multi-sentence compression",
abstract = "Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.",
author = "Siddhartha Banerjee and Prasenjit Mitra and Kazunari Sugiyama",
year = "2015",
language = "English",
isbn = "9781577357384",
volume = "2015-January",
pages = "1208--1214",
booktitle = "IJCAI International Joint Conference on Artificial Intelligence",
publisher = "International Joint Conferences on Artificial Intelligence"
}

TY  - GEN
T1  - Multi-document abstractive summarization using ILP based multi-sentence compression
AU  - Banerjee, Siddhartha
AU  - Mitra, Prasenjit
AU  - Sugiyama, Kazunari
PY  - 2015
Y1  - 2015
AB  - Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.
UR  - http://www.scopus.com/inward/record.url?scp=84949778956&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84949778956&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 9781577357384
VL  - 2015-January
SP  - 1208
EP  - 1214
BT  - IJCAI International Joint Conference on Artificial Intelligence
PB  - International Joint Conferences on Artificial Intelligence
ER  -