Learning to recognize ancillary information for automatic paraphrase identification

Simone Filice, Alessandro Moschitti

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Previous work on Automatic Paraphrase Identification (PI) is mainly based on modeling text similarity between two sentences. In contrast, we study methods for automatically detecting whether a text fragment only appearing in a sentence of the evaluated sentence pair is important or ancillary information with respect to the paraphrase identification task. Engineering features for this new task is rather difficult; thus, we approach the problem by representing text with syntactic structures and applying tree kernels on them. The results show that the accuracy of our automatic Ancillary Text Classifier (ATC) is promising, i.e., 68.6%, and its output can be used to improve the state of the art in PI.

Original language: English
Title of host publication: 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1109-1114
Number of pages: 6
ISBN (Electronic): 9781941643914
Publication status: Published - 2016
Event: 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - San Diego, United States
Duration: 12 Jun 2016 to 17 Jun 2016

Other

Other: 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016
Country: United States
City: San Diego
Period: 12/6/16 to 17/6/16

Fingerprint

Syntactics
Classifiers
learning
engineering
Paraphrase

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Cite this

Filice, S., & Moschitti, A. (2016). Learning to recognize ancillary information for automatic paraphrase identification. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 1109-1114). Association for Computational Linguistics (ACL).

Learning to recognize ancillary information for automatic paraphrase identification. / Filice, Simone; Moschitti, Alessandro.

2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference. Association for Computational Linguistics (ACL), 2016. p. 1109-1114.

Filice, S & Moschitti, A 2016, Learning to recognize ancillary information for automatic paraphrase identification. in 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference. Association for Computational Linguistics (ACL), pp. 1109-1114, 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016, San Diego, United States, 12/6/16.
Filice S, Moschitti A. Learning to recognize ancillary information for automatic paraphrase identification. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference. Association for Computational Linguistics (ACL). 2016. p. 1109-1114
Filice, Simone ; Moschitti, Alessandro. / Learning to recognize ancillary information for automatic paraphrase identification. 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference. Association for Computational Linguistics (ACL), 2016. pp. 1109-1114
@inproceedings{987ea52b32e64dfe83c065ed195aa2fc,
title = "Learning to recognize ancillary information for automatic paraphrase identification",
abstract = "Previous work on Automatic Paraphrase Identification (PI) is mainly based on modeling text similarity between two sentences. In contrast, we study methods for automatically detecting whether a text fragment only appearing in a sentence of the evaluated sentence pair is important or ancillary information with respect to the paraphrase identification task. Engineering features for this new task is rather difficult, thus, we approach the problem by representing text with syntactic structures and applying tree kernels on them. The results show that the accuracy of our automatic Ancillary Text Classifier (ATC) is promising, i.e., 68.6{\%}, and its output can be used to improve the state of the art in PI.",
author = "Simone Filice and Alessandro Moschitti",
year = "2016",
language = "English",
pages = "1109--1114",
booktitle = "2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Learning to recognize ancillary information for automatic paraphrase identification

AU - Filice, Simone

AU - Moschitti, Alessandro

PY - 2016

Y1 - 2016

N2 - Previous work on Automatic Paraphrase Identification (PI) is mainly based on modeling text similarity between two sentences. In contrast, we study methods for automatically detecting whether a text fragment only appearing in a sentence of the evaluated sentence pair is important or ancillary information with respect to the paraphrase identification task. Engineering features for this new task is rather difficult, thus, we approach the problem by representing text with syntactic structures and applying tree kernels on them. The results show that the accuracy of our automatic Ancillary Text Classifier (ATC) is promising, i.e., 68.6%, and its output can be used to improve the state of the art in PI.

UR - http://www.scopus.com/inward/record.url?scp=84994172640&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994172640&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84994172640

SP - 1109

EP - 1114

BT - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference

PB - Association for Computational Linguistics (ACL)

ER -