XRules

An effective algorithm for structural classification of XML data

Mohammed J. Zaki, Charu C. Aggarwal

Research output: Contribution to journal › Article

47 Citations (Scopus)

Abstract

XML documents have recently become ubiquitous because they are used in a wide range of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data by using frequent discriminatory substructures within XML documents. Such a technique is more capable of finding the classification characteristics of documents. In addition, the technique can also be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
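
The idea of classifying by structure rather than by words can be illustrated with a small sketch. This is not the XRules algorithm itself: as a simplifying assumption it uses parent/child tag pairs as the "substructures", and the function names, thresholds, and sample documents are all hypothetical. A rule "substructure => class" is kept when it is frequent within the class and rarely seen outside it.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def structural_features(xml_text):
    """Return the set of (parent tag, child tag) pairs found in one document."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}

def mine_rules(docs, labels, min_support=0.5, min_confidence=0.7):
    """Toy rule induction: keep 'substructure => class' rules that are
    frequent within the class (support) and discriminative (confidence)."""
    class_sizes = Counter(labels)
    counts = defaultdict(Counter)  # feature -> class -> number of documents
    for doc, label in zip(docs, labels):
        for feat in structural_features(doc):
            counts[feat][label] += 1
    rules = []
    for feat, per_class in counts.items():
        total = sum(per_class.values())
        for label, n in per_class.items():
            support = n / class_sizes[label]  # share of the class containing feat
            confidence = n / total            # share of feat's documents in the class
            if support >= min_support and confidence >= min_confidence:
                rules.append((feat, label, confidence))
    return sorted(rules, key=lambda r: -r[2])

def classify(doc, rules, default):
    """Sum the confidence of every matching rule and pick the best class."""
    feats = structural_features(doc)
    scores = defaultdict(float)
    for feat, label, confidence in rules:
        if feat in feats:
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else default

# Hypothetical training documents: tags only, no text content at all.
train = [
    ("<order><item><price/></item></order>", "order"),
    ("<order><item><qty/></item><ship/></order>", "order"),
    ("<article><title/><body><para/></body></article>", "article"),
    ("<article><body><para/><para/></body></article>", "article"),
]
docs, labels = zip(*train)
rules = mine_rules(docs, labels)
print(classify("<order><item><sku/></item></order>", rules, default="unknown"))
```

The sample documents contain no words, so a bag-of-words classifier would have nothing to work with, while the structural rules still separate the two classes. A cost-sensitive variant could, for example, weight each rule's score by a per-class misclassification cost; that extension is not shown here.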

Original language: English
Pages (from-to): 137-170
Number of pages: 34
Journal: Machine Learning
Volume: 62
Issue number: 1-2 SPEC. ISS.
DOIs: https://doi.org/10.1007/s10994-006-5832-2
Publication status: Published - 1 Feb 2006
Externally published: Yes

Fingerprint

XML
Data mining
Classifiers
Costs

Keywords

  • Classification
  • Rule induction
  • Tree mining
  • XML/Semi-structured data

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence

Cite this

XRules: An effective algorithm for structural classification of XML data. / Zaki, Mohammed J.; Aggarwal, Charu C.

In: Machine Learning, Vol. 62, No. 1-2 SPEC. ISS., 01.02.2006, p. 137-170.

Zaki, Mohammed J.; Aggarwal, Charu C. / XRules: An effective algorithm for structural classification of XML data. In: Machine Learning. 2006; Vol. 62, No. 1-2 SPEC. ISS. pp. 137-170.
@article{d86a5958fbe74000897ffca4331ac979,
title = "XRules: An effective algorithm for structural classification of XML data",
abstract = "XML documents have recently become ubiquitous because of their varied applicability in a number of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule based classification of XML data by using frequent discriminatory substructures within XML documents. Such a technique is more capable of finding the classification characteristics of documents. In addition, the technique can also be extended to cost sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.",
keywords = "Classification, Rule induction, Tree mining, XML/Semi-structured data",
author = "Zaki, {Mohammed J.} and Aggarwal, {Charu C.}",
year = "2006",
month = "2",
day = "1",
doi = "10.1007/s10994-006-5832-2",
language = "English",
volume = "62",
pages = "137--170",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1-2 SPEC. ISS.",
}

TY - JOUR
T1 - XRules
T2 - An effective algorithm for structural classification of XML data
AU - Zaki, Mohammed J.
AU - Aggarwal, Charu C.
PY - 2006/2/1
Y1 - 2006/2/1
AB - XML documents have recently become ubiquitous because of their varied applicability in a number of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule based classification of XML data by using frequent discriminatory substructures within XML documents. Such a technique is more capable of finding the classification characteristics of documents. In addition, the technique can also be extended to cost sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
KW - Classification
KW - Rule induction
KW - Tree mining
KW - XML/Semi-structured data
UR - http://www.scopus.com/inward/record.url?scp=32044441059&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=32044441059&partnerID=8YFLogxK
U2 - 10.1007/s10994-006-5832-2
DO - 10.1007/s10994-006-5832-2
M3 - Article
VL - 62
SP - 137
EP - 170
JO - Machine Learning
JF - Machine Learning
SN - 0885-6125
IS - 1-2 SPEC. ISS.
ER -