Extensive evaluation of efficient NLP-driven text classification

Roberto Basili, Alessandro Moschitti, Maria Teresa Pazienza

Research output: Contribution to journal (Article)

2 Citations (Scopus)


Extensive experimental evidence is required to study the impact of text categorization approaches on real data and to assess their performance within operational scenarios. This paper discusses a wide set of profile-based classification models (a class of very efficient classifiers) that are sensitive to the syntactic information extracted from source texts. Several classifiers are tested, ranging from traditional approaches (e.g., variants of the vector space model, such as SMART, or linear regression models) to original methods. All the experiments aim to evaluate some newly introduced feature weighting and inference models, as well as to characterize the role of different kinds of linguistic information. The final purpose is thus to give insight into the effective and efficient use of linguistic information for text categorization. The results suggest that optimal exploitation of linguistic features can be obtained by a suitable selection among methods of feature weighting and inference. The empirical evidence collected in this paper over a wide range of corpora and languages is regarded as a useful basis for the systematic design of operational statistical NLP-driven text classifiers.

Original language: English
Pages (from-to): 457-491
Number of pages: 35
Journal: Applied Artificial Intelligence
Issue number: 6
Publication status: Published - 1 Aug 2006
Externally published: Yes


ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
