XRules

An effective structural classifier for XML data

Mohammed J. Zaki, Charu C. Aggarwal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

140 Citations (Scopus)

Abstract

XML documents have recently become ubiquitous because they are useful in a wide range of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data by using frequent discriminatory substructures within XML documents. Such a technique is more capable of finding the classification characteristics of documents. In addition, the technique can also be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
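The contrast the abstract draws between bag-of-words and structure-based classification can be illustrated with a small sketch. This is not the XRules algorithm from the paper: as a simplifying assumption it uses root-to-node tag paths in place of frequent embedded subtrees, and the function names (`tag_paths`, `mine_rules`, `classify`), thresholds, and toy documents are all hypothetical. Both toy classes contain the same set of tags, so only structure separates them.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def mine_rules(docs, labels, min_support=0.5, min_confidence=0.6):
    """Build rules (path, class, confidence) from labeled documents.

    support    = fraction of the class's documents containing the path
    confidence = fraction of documents containing the path that are in the class
    """
    class_sizes = Counter(labels)
    counts = defaultdict(Counter)  # path -> class -> number of documents
    for text, label in zip(docs, labels):
        for path in tag_paths(text):
            counts[path][label] += 1
    rules = []
    for path, per_class in counts.items():
        total = sum(per_class.values())
        for label, n in per_class.items():
            support = n / class_sizes[label]
            confidence = n / total
            if support >= min_support and confidence >= min_confidence:
                rules.append((path, label, confidence))
    return rules


def classify(xml_text, rules, default):
    """Sum the confidence of all matching rules per class; pick the largest."""
    paths = tag_paths(xml_text)
    scores = Counter()
    for path, label, confidence in rules:
        if path in paths:
            scores[label] += confidence
    return scores.most_common(1)[0][0] if scores else default


# Toy data: classes "A" and "B" use the same tags (r, x, y, z) but nest them differently.
train_docs = [
    "<r><x><y/></x></r>",
    "<r><x><y/></x><z/></r>",
    "<r><y><x/></y></r>",
    "<r><y><x/></y><z/></r>",
]
train_labels = ["A", "A", "B", "B"]

rules = mine_rules(train_docs, train_labels)
print(classify("<r><z/><x><y/></x></r>", rules, default="A"))
```

A bag-of-tags representation would see identical features for both classes here, whereas the path-based rules keep `r/x/y` and `r/y/x` distinct. Summing rule confidences is only one possible way to combine matching rules; it is chosen here for brevity.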

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 316-325
Number of pages: 10
DOIs: https://doi.org/10.1145/956750.956787
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: 24 Aug 2003 to 27 Aug 2003

Other

Conference: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 24/8/03 to 27/8/03

Fingerprint

XML
Classifiers
Data mining
Costs

Keywords

  • Classification
  • Tree mining
  • XML/Semi-structured data

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zaki, M. J., & Aggarwal, C. C. (2003). XRules: An effective structural classifier for XML data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 316-325) https://doi.org/10.1145/956750.956787

@inproceedings{14bfdb4cfc3f4e9696787b26439ac7f8,
title = "XRules: An effective structural classifier for XML data",
keywords = "Classification, Tree mining, XML/Semi-structured data",
author = "Zaki, {Mohammed J.} and Aggarwal, {Charu C.}",
year = "2003",
month = "12",
day = "1",
doi = "10.1145/956750.956787",
language = "English",
pages = "316--325",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}
