Blocking reduction strategies in hierarchical text classification

Aixin Sun, Ee Peng Lim, Wee Keong Ng, Jaideep Srivastava

Research output: Contribution to journal › Article

38 Citations (Scopus)

Abstract

One common approach to hierarchical text classification associates classifiers with nodes in the category tree and classifies text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category tree. However, all these methods suffer from blocking: documents wrongly rejected by the classifiers at higher levels cannot be passed to the classifiers at lower levels. In this paper, we propose a classifier-centric performance measure, known as blocking factor, to determine the extent of blocking. Three methods are proposed to address the blocking problem: Threshold Reduction, Restricted Voting, and Extended Multiplicative. Our experiments using Support Vector Machine (SVM) classifiers on the Reuters collection show that all three methods reduce blocking and improve classification accuracy, and that the Restricted Voting method delivers the best performance.
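To make the blocking problem concrete, here is a minimal Python sketch of top-down classification over a toy category tree. Everything in it is an assumption for illustration: the tree, the keyword-based scorers standing in for trained SVM classifiers, the threshold values, and the simple "rejected at an ancestor" count, which is not the paper's blocking factor definition. Lowering the parent threshold loosely mirrors the idea behind the Threshold Reduction method named in the abstract, not its exact formulation.

```python
from typing import Callable, Dict, List, Set

Scorer = Callable[[str], float]

# Hypothetical toy category tree: root -> sports -> tennis.
CHILDREN: Dict[str, List[str]] = {"root": ["sports"], "sports": ["tennis"], "tennis": []}


def make_scorer(keywords: Set[str]) -> Scorer:
    """Stand-in for a trained classifier: fraction of tokens that are keywords."""
    def score(doc: str) -> float:
        tokens = doc.lower().split()
        return sum(tok in keywords for tok in tokens) / max(len(tokens), 1)
    return score


# One hypothetical scorer per non-root node.
SCORERS: Dict[str, Scorer] = {
    "sports": make_scorer({"match", "player", "tennis", "game"}),
    "tennis": make_scorer({"tennis", "serve", "racket"}),
}


def classify_top_down(doc: str, thresholds: Dict[str, float]) -> List[str]:
    """Return accepted categories; children are visited only if the parent accepts."""
    accepted: List[str] = []
    frontier = list(CHILDREN["root"])
    while frontier:
        node = frontier.pop(0)
        if SCORERS[node](doc) >= thresholds[node]:
            accepted.append(node)
            frontier.extend(CHILDREN[node])
    return accepted


# Toy documents that all truly belong to the "tennis" category.
TENNIS_DOCS = [
    "tennis serve racket",
    "the player won the tennis match",
    "a strong serve and a new racket",
]


def count_blocked(docs: List[str], sports_threshold: float) -> int:
    """Count tennis documents rejected by the parent 'sports' classifier."""
    thresholds = {"sports": sports_threshold, "tennis": 0.15}
    return sum("sports" not in classify_top_down(d, thresholds) for d in docs)


if __name__ == "__main__":
    print(count_blocked(TENNIS_DOCS, 0.4))  # stricter parent threshold
    print(count_blocked(TENNIS_DOCS, 0.3))  # relaxed parent threshold
```

In this toy setup a tennis document that the parent "sports" scorer rejects never reaches the "tennis" scorer, however well it would score there. Relaxing the parent threshold lets more such documents through, at the likely cost of more false positives at the parent.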

Original language: English
Pages (from-to): 1305-1308
Number of pages: 4
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 16
Issue number: 10
DOI: 10.1109/TKDE.2004.50
Publication status: Published - Oct 2004
Externally published: Yes

Fingerprint

Classifiers
Support vector machines
Experiments

Keywords

  • Classification
  • Data mining
  • Text mining

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Blocking reduction strategies in hierarchical text classification. / Sun, Aixin; Lim, Ee Peng; Ng, Wee Keong; Srivastava, Jaideep.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, Oct. 2004, p. 1305-1308.

@article{38b2c1299d1d4ab297d4153eb19c66af,
title = "Blocking reduction strategies in hierarchical text classification",
keywords = "Classification, Data mining, Text mining",
author = "Aixin Sun and Lim, {Ee Peng} and Ng, {Wee Keong} and Jaideep Srivastava",
year = "2004",
month = "10",
doi = "10.1109/TKDE.2004.50",
language = "English",
volume = "16",
pages = "1305--1308",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "10",

}

TY - JOUR

T1 - Blocking reduction strategies in hierarchical text classification

AU - Sun, Aixin

AU - Lim, Ee Peng

AU - Ng, Wee Keong

AU - Srivastava, Jaideep

PY - 2004/10

Y1 - 2004/10

KW - Classification

KW - Data mining

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=13844255022&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=13844255022&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2004.50

DO - 10.1109/TKDE.2004.50

M3 - Article

AN - SCOPUS:13844255022

VL - 16

SP - 1305

EP - 1308

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 10

ER -