Extensive evaluation of efficient NLP-driven text classification

Roberto Basili, Alessandro Moschitti, Maria Teresa Pazienza

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Extensive experimental evidence is required to study the impact of text categorization approaches on real data and to assess their performance within operational scenarios. This paper discusses a wide set of profile-based classification models (a class of very efficient classifiers) that are sensitive to the syntactic information extracted from source texts. Several classifiers are tested, ranging from traditional approaches (e.g., variants of the vector space model, like SMART, or linear regression models) to original methods. The experiments aim to evaluate some newly introduced feature weighting and inference models and to characterize the role of different kinds of linguistic information. The final purpose is to give insight into the effective and efficient use of linguistic information for text categorization. The results suggest that an optimal exploitation of linguistic features can be obtained by a suitable selection among methods of feature weighting and inference. The empirical evidence collected in this paper over a wide range of corpora and languages is regarded as a useful basis for the systematic design of operational statistical NLP-driven text classifiers.
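
As a rough illustration of what a profile-based classifier in the vector space tradition looks like, the sketch below builds one weighted term profile per category and assigns a new text to the category whose profile is most similar. It is not the authors' method: the whitespace tokenizer, the smoothed TF-IDF weighting, the cosine-similarity inference step, the toy training data, and all function names are assumptions chosen only to show the general shape of the approach. The paper's classifiers also use syntactic information, which this sketch omits.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_profiles(labeled_docs):
    """Build one TF-IDF weighted profile (summed term vector) per category."""
    # Document frequency of each term over the whole training set.
    doc_freq = Counter()
    for text, _ in labeled_docs:
        doc_freq.update(set(tokenize(text)))
    n_docs = len(labeled_docs)
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    # A category profile is the sum of the weighted vectors of its documents.
    profiles = defaultdict(Counter)
    for text, label in labeled_docs:
        for term, tf in Counter(tokenize(text)).items():
            profiles[label][term] += tf * idf[term]
    return dict(profiles), idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, profiles, idf):
    """Inference step: pick the category whose profile is closest to the text."""
    vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(text)).items()}
    return max(profiles, key=lambda c: cosine(vec, profiles[c]))


# Toy training data, invented for this example only.
TRAIN = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the football match", "sport"),
    ("the striker scored a late goal", "sport"),
]
PROFILES, IDF = train_profiles(TRAIN)
```

Calling `classify("interest rates and markets", PROFILES, IDF)` compares the text against each stored profile. Classification costs one similarity computation per category, which is why this family of classifiers is considered efficient.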

Original language: English
Pages (from-to): 457-491
Number of pages: 35
Journal: Applied Artificial Intelligence
Volume: 20
Issue number: 6
DOI: 10.1080/08839510600753725
Publication status: Published - 1 Aug 2006
Externally published: Yes

Fingerprint

Linguistics
Classifiers
Syntactics
Vector spaces
Linear regression
Experiments

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Extensive evaluation of efficient NLP-driven text classification. / Basili, Roberto; Moschitti, Alessandro; Pazienza, Maria Teresa.

In: Applied Artificial Intelligence, Vol. 20, No. 6, 01.08.2006, p. 457-491.

@article{33aee7b14b0c48999f29fd7547a44e99,
title = "Extensive evaluation of efficient NLP-driven text classification",
author = "Basili, Roberto and Moschitti, Alessandro and Pazienza, {Maria Teresa}",
year = "2006",
month = "8",
day = "1",
doi = "10.1080/08839510600753725",
language = "English",
volume = "20",
pages = "457--491",
journal = "Applied Artificial Intelligence",
issn = "0883-9514",
publisher = "Taylor and Francis Ltd.",
number = "6",

}

TY - JOUR

T1 - Extensive evaluation of efficient NLP-driven text classification

AU - Basili, Roberto

AU - Moschitti, Alessandro

AU - Pazienza, Maria Teresa

PY - 2006/8/1

Y1 - 2006/8/1

UR - http://www.scopus.com/inward/record.url?scp=33745615646&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745615646&partnerID=8YFLogxK

U2 - 10.1080/08839510600753725

DO - 10.1080/08839510600753725

M3 - Article

AN - SCOPUS:33745615646

VL - 20

SP - 457

EP - 491

JO - Applied Artificial Intelligence

JF - Applied Artificial Intelligence

SN - 0883-9514

IS - 6

ER -