NLP-driven IR: Evaluating performances over a text classification task

Roberto Basili, Alessandro Moschitti, Maria Teresa Pazienza

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

17 Citations (Scopus)

Abstract

Although several attempts have been made to introduce Natural Language Processing (NLP) techniques into Information Retrieval (IR), most have failed to prove effective at improving performance. In this paper, Text Classification (TC) is taken as the IR task, and the effect of the underlying system's linguistic capabilities is studied. A novel TC model, which extends a well-known statistical model (i.e. Rocchio's formula [Ittner et al., 1995]) and is applied to linguistic features, has been defined and experimentally evaluated. The proposed model also constitutes an effective feature selection methodology. All the experiments show a significant improvement over other purely statistical methods (e.g. [Yang, 1999]), underlining the relevance of the available linguistic information. Moreover, the derived classifier reaches the performance (about 85%) of the best-known models (i.e. Support Vector Machines (SVM) and K-Nearest Neighbour (KNN)), which have a higher computational complexity for training and processing.
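The abstract builds on Rocchio's formula for TC. In the form commonly used for text classification, the weight of feature f in the profile of category C_i is Ω_f^i = max{0, (β/|R_i|)·Σ_{d∈R_i} ω_f^d - (γ/|R̄_i|)·Σ_{d∈R̄_i} ω_f^d}, where R_i are the training documents of C_i, R̄_i the remaining ones, and ω_f^d the weight of f in document d. The sketch below is a minimal Python illustration of this textbook formula only, not the authors' extended model, whose details this record does not give. The sparse dict representation, the "lemma/POS" feature strings, the β and γ values, and the helper names rocchio_profile and score are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical sparse representation: feature string -> weight.
Vector = Dict[str, float]


def rocchio_profile(pos: List[Vector], neg: List[Vector],
                    beta: float = 1.0, gamma: float = 1.0) -> Vector:
    """Textbook Rocchio profile for one category, negative weights clipped to 0."""
    acc: Dict[str, float] = defaultdict(float)
    # Add the beta-weighted centroid of the positive (in-category) documents.
    for doc in pos:
        for feat, w in doc.items():
            acc[feat] += beta * w / len(pos)
    # Subtract the gamma-weighted centroid of the negative documents.
    for doc in neg:
        for feat, w in doc.items():
            acc[feat] -= gamma * w / len(neg)
    # max(0, .): features whose weight is not positive are dropped from the profile.
    return {feat: w for feat, w in acc.items() if w > 0}


def score(profile: Vector, doc: Vector) -> float:
    """Dot product between a category profile and a document vector."""
    return sum(w * doc.get(feat, 0.0) for feat, w in profile.items())


# Toy training data (hypothetical); "lemma/POS" strings stand in for linguistic features.
POS = [{"stock/N": 0.8, "market/N": 0.6}, {"stock/N": 0.5, "share/N": 0.5}]
NEG = [{"market/N": 0.9, "sport/N": 0.4}]

if __name__ == "__main__":
    # Compare which features survive the clipping for two values of gamma.
    for g in (1.0, 0.25):
        prof = rocchio_profile(POS, NEG, beta=1.0, gamma=g)
        print(g, sorted(prof), score(prof, {"stock/N": 1.0, "market/N": 1.0}))
```

With γ = 1.0 the feature "market/N", which also occurs in the negative example, is clipped away; with γ = 0.25 it survives. Because γ controls how many features remain in a profile, a Rocchio-style classifier can double as a feature selector. This is consistent with, but not confirmed here as, the mechanism the abstract refers to.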

Original language: English
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Pages: 1286-1291
Number of pages: 6
Publication status: Published, 2001
Externally published: Yes
Event: 17th International Joint Conference on Artificial Intelligence, IJCAI 2001, Seattle, WA, United States
Duration: 4 Aug 2001 to 10 Aug 2001

Fingerprint

Linguistics
Processing
Information retrieval
Support vector machines
Feature extraction
Computational complexity
Statistical methods
Classifiers
Experiments
Statistical Models

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Basili, R., Moschitti, A., & Pazienza, M. T. (2001). NLP-driven IR: Evaluating performances over a text classification task. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1286-1291)

@inproceedings{e4afe1d46666421cada8d4be116664d7,
title = "NLP-driven IR: Evaluating performances over a text classification task",
abstract = "Although several attempts have been made to introduce Natural Language Processing (NLP) techniques into Information Retrieval (IR), most have failed to prove effective at improving performance. In this paper, Text Classification (TC) is taken as the IR task, and the effect of the underlying system's linguistic capabilities is studied. A novel TC model, which extends a well-known statistical model (i.e. Rocchio's formula [Ittner et al., 1995]) and is applied to linguistic features, has been defined and experimentally evaluated. The proposed model also constitutes an effective feature selection methodology. All the experiments show a significant improvement over other purely statistical methods (e.g. [Yang, 1999]), underlining the relevance of the available linguistic information. Moreover, the derived classifier reaches the performance (about 85{\%}) of the best-known models (i.e. Support Vector Machines (SVM) and K-Nearest Neighbour (KNN)), which have a higher computational complexity for training and processing.",
author = "Roberto Basili and Alessandro Moschitti and Pazienza, {Maria Teresa}",
year = "2001",
language = "English",
pages = "1286--1291",
booktitle = "IJCAI International Joint Conference on Artificial Intelligence",

}

TY - GEN

T1 - NLP-driven IR: Evaluating performances over a text classification task

AU - Basili, Roberto

AU - Moschitti, Alessandro

AU - Pazienza, Maria Teresa

PY - 2001

Y1 - 2001

AB - Although several attempts have been made to introduce Natural Language Processing (NLP) techniques into Information Retrieval (IR), most have failed to prove effective at improving performance. In this paper, Text Classification (TC) is taken as the IR task, and the effect of the underlying system's linguistic capabilities is studied. A novel TC model, which extends a well-known statistical model (i.e. Rocchio's formula [Ittner et al., 1995]) and is applied to linguistic features, has been defined and experimentally evaluated. The proposed model also constitutes an effective feature selection methodology. All the experiments show a significant improvement over other purely statistical methods (e.g. [Yang, 1999]), underlining the relevance of the available linguistic information. Moreover, the derived classifier reaches the performance (about 85%) of the best-known models (i.e. Support Vector Machines (SVM) and K-Nearest Neighbour (KNN)), which have a higher computational complexity for training and processing.

UR - http://www.scopus.com/inward/record.url?scp=84880908496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880908496&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84880908496

SP - 1286

EP - 1291

BT - IJCAI International Joint Conference on Artificial Intelligence

ER -