Multiclass text categorization for automated survey coding

Daniela Giorgetti, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

10 Citations (Scopus)

Abstract

Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.
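The formulation in the abstract (learn a model from pre-coded answers, then apply it to new answers) can be illustrated with a minimal sketch of multinomial naive Bayes in plain Python. This is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the class name `NaiveBayesCoder`, and the toy answers and codes (`PAY`, `MGMT`, `LOCATION`) are all assumptions made for illustration. The multiclass support vector machine variant mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one smoothing (hypothetical helper class).

    Multiclass in the single-label sense: each answer receives exactly one code.
    """

    def fit(self, answers, codes):
        self.code_counts = Counter(codes)        # number of training answers per code
        self.word_counts = defaultdict(Counter)  # token frequencies per code
        self.vocab = set()
        for answer, code in zip(answers, codes):
            tokens = tokenize(answer)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        self.total = len(codes)
        return self

    def predict(self, answer):
        # Tokens never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(answer) if t in self.vocab]
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            # log P(code) + sum over tokens of log P(token | code)
            score = math.log(n / self.total)
            denom = sum(self.word_counts[code].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[code][t] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical pre-coded answers to an open-ended question (not from the paper's corpus).
train_answers = [
    "the pay is too low",
    "low salary and no raise",
    "my manager never listens",
    "bad manager and poor communication",
    "long commute to the office",
    "the office is too far from home",
]
train_codes = ["PAY", "PAY", "MGMT", "MGMT", "LOCATION", "LOCATION"]

coder = NaiveBayesCoder().fit(train_answers, train_codes)
print(coder.predict("salary is low"))
print(coder.predict("the manager is poor at communication"))
```

On this toy data the two new answers are coded `PAY` and `MGMT` respectively. A realistic setup would need a far larger pre-coded training set and a held-out test set to measure coding accuracy.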

Original language: English
Title of host publication: Proceedings of the ACM Symposium on Applied Computing
Editors: G. Lamont
Pages: 798-802
Number of pages: 5
Publication status: Published - 2003
Externally published: Yes
Event: Proceedings of the 2003 ACM Symposium on Applied Computing - Melbourne, FL
Duration: 9 Mar 2003 to 12 Mar 2003

Fingerprint

Support vector machines
Learning systems
Experiments

Keywords

  • Multiclass text categorization
  • Open-ended survey coding

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Giorgetti, D., & Sebastiani, F. (2003). Multiclass text categorization for automated survey coding. In G. Lamont (Ed.), Proceedings of the ACM Symposium on Applied Computing (pp. 798-802)

@inproceedings{39ef5ae7520844b9ab2bb720957e1f5c,
title = "Multiclass text categorization for automated survey coding",
abstract = "Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on na{\"i}ve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.",
keywords = "Multiclass text categorization, Open-ended survey coding",
author = "Daniela Giorgetti and Fabrizio Sebastiani",
year = "2003",
language = "English",
pages = "798--802",
editor = "G. Lamont",
booktitle = "Proceedings of the ACM Symposium on Applied Computing",

}

TY - GEN

T1 - Multiclass text categorization for automated survey coding

AU - Giorgetti, Daniela

AU - Sebastiani, Fabrizio

PY - 2003

Y1 - 2003

N2 - Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). We formulate the problem of automated survey coding as a text categorization problem, i.e. as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of pre-coded answers, and applying the resulting model to the classification of new answers. In this paper we experiment with two different learning techniques, one based on naïve Bayesian classification and the other one based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform the results achieved by previous automated survey coding approaches.

KW - Multiclass text categorization

KW - Open-ended survey coding

UR - http://www.scopus.com/inward/record.url?scp=0037661005&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037661005&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0037661005

SP - 798

EP - 802

BT - Proceedings of the ACM Symposium on Applied Computing

A2 - Lamont, G.

ER -