CODRA: A novel discriminative framework for rhetorical analysis

Shafiq Rayhan Joty, Giuseppe Carenini, Raymond T. Ng

Research output: Contribution to journal (Article)

32 Citations (Scopus)

Abstract

Clauses and sentences rarely stand on their own in an actual discourse; rather, the relationship between them carries important information that allows the discourse to express a meaning as a whole beyond the sum of its individual parts. Rhetorical analysis seeks to uncover this coherence structure. In this article, we present CODRA, a COmplete probabilistic Discriminative framework for performing Rhetorical Analysis in accordance with Rhetorical Structure Theory, which posits a tree representation of a discourse. CODRA comprises a discourse segmenter and a discourse parser. First, the discourse segmenter, which is based on a binary classifier, identifies the elementary discourse units in a given text. Then the discourse parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combining these two stages of parsing effectively. By conducting a series of empirical evaluations over two different data sets, we demonstrate that CODRA significantly outperforms the state of the art, often by a wide margin. We also show that a reranking of the k-best parse hypotheses generated by CODRA can potentially improve the accuracy even further.
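As a rough illustration of what "applying an optimal parsing algorithm to probabilities" can look like, the sketch below finds the most probable binary tree over a sequence of elementary discourse units by dynamic programming over spans. It is not CODRA's implementation: the function `best_tree`, the callback `span_prob`, and the toy probability table are all hypothetical, relation labels and nuclearity are left out, and the probabilities that the article infers from Conditional Random Fields are replaced here by hand-picked numbers.

```python
from functools import lru_cache
from math import log


def best_tree(n, span_prob):
    """Return (log_prob, tree) for the most probable binary tree over units 0..n-1.

    span_prob(i, k, j) is a hypothetical stand-in for a model's probability
    (assumed > 0) that units i..k and units k+1..j are joined under one node.
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0.0, i  # a single unit is a leaf with log-probability 0
        best = None
        for k in range(i, j):  # try every split point inside the span
            left_lp, left = solve(i, k)
            right_lp, right = solve(k + 1, j)
            lp = left_lp + right_lp + log(span_prob(i, k, j))
            if best is None or lp > best[0]:
                best = (lp, (left, right))
        return best

    return solve(0, n - 1)


# Toy probabilities for three units (assumed values, not taken from the article).
toy = {(0, 0, 1): 0.9, (1, 1, 2): 0.2, (0, 0, 2): 0.3, (0, 1, 2): 0.8}
log_prob, tree = best_tree(3, lambda i, k, j: toy[(i, k, j)])
print(tree)  # ((0, 1), 2)
```

With these toy numbers, grouping units 0 and 1 first scores 0.9 × 0.8 = 0.72, against 0.2 × 0.3 = 0.06 for grouping units 1 and 2 first, so the search returns the left-branching tree.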

Original language: English
Pages (from-to): 385-435
Number of pages: 51
Journal: Computational Linguistics
Volume: 41
Issue number: 3
DOI: 10.1162/COLI_a_00226
Publication status: Published - 10 Sep 2015

Fingerprint

Trees (mathematics)
Classifiers
discourse
Rhetorical Analysis
Discourse
Parsing

ASJC Scopus subject areas

  • Computer Science Applications
  • Artificial Intelligence
  • Linguistics and Language
  • Language and Linguistics

Cite this

CODRA: A novel discriminative framework for rhetorical analysis. / Rayhan Joty, Shafiq; Carenini, Giuseppe; Ng, Raymond T.

In: Computational Linguistics, Vol. 41, No. 3, 10.09.2015, p. 385-435.

Rayhan Joty, Shafiq; Carenini, Giuseppe; Ng, Raymond T. / CODRA: A novel discriminative framework for rhetorical analysis. In: Computational Linguistics. 2015; Vol. 41, No. 3. pp. 385-435.
@article{e9a2cf6734544f5cbff50006c90a9d0c,
title = "CODRA: A novel discriminative framework for rhetorical analysis",
abstract = "Clauses and sentences rarely stand on their own in an actual discourse; rather, the relationship between them carries important information that allows the discourse to express a meaning as a whole beyond the sum of its individual parts. Rhetorical analysis seeks to uncover this coherence structure. In this article, we present CODRA, a COmplete probabilistic Discriminative framework for performing Rhetorical Analysis in accordance with Rhetorical Structure Theory, which posits a tree representation of a discourse. CODRA comprises a discourse segmenter and a discourse parser. First, the discourse segmenter, which is based on a binary classifier, identifies the elementary discourse units in a given text. Then the discourse parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combining these two stages of parsing effectively. By conducting a series of empirical evaluations over two different data sets, we demonstrate that CODRA significantly outperforms the state of the art, often by a wide margin. We also show that a reranking of the k-best parse hypotheses generated by CODRA can potentially improve the accuracy even further.",
author = "{Rayhan Joty}, Shafiq and Giuseppe Carenini and Ng, {Raymond T.}",
year = "2015",
month = "9",
day = "10",
doi = "10.1162/COLI_a_00226",
language = "English",
volume = "41",
pages = "385--435",
journal = "Computational Linguistics",
issn = "0891-2017",
publisher = "MIT Press Journals",
number = "3",

}

TY - JOUR

T1 - CODRA

T2 - A novel discriminative framework for rhetorical analysis

AU - Rayhan Joty, Shafiq

AU - Carenini, Giuseppe

AU - Ng, Raymond T.

PY - 2015/9/10

Y1 - 2015/9/10

N2 - Clauses and sentences rarely stand on their own in an actual discourse; rather, the relationship between them carries important information that allows the discourse to express a meaning as a whole beyond the sum of its individual parts. Rhetorical analysis seeks to uncover this coherence structure. In this article, we present CODRA, a COmplete probabilistic Discriminative framework for performing Rhetorical Analysis in accordance with Rhetorical Structure Theory, which posits a tree representation of a discourse. CODRA comprises a discourse segmenter and a discourse parser. First, the discourse segmenter, which is based on a binary classifier, identifies the elementary discourse units in a given text. Then the discourse parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combining these two stages of parsing effectively. By conducting a series of empirical evaluations over two different data sets, we demonstrate that CODRA significantly outperforms the state of the art, often by a wide margin. We also show that a reranking of the k-best parse hypotheses generated by CODRA can potentially improve the accuracy even further.

UR - http://www.scopus.com/inward/record.url?scp=84941276880&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941276880&partnerID=8YFLogxK

U2 - 10.1162/COLI_a_00226

DO - 10.1162/COLI_a_00226

M3 - Article

VL - 41

SP - 385

EP - 435

JO - Computational Linguistics

JF - Computational Linguistics

SN - 0891-2017

IS - 3

ER -