XRules: An effective structural classifier for XML data

Mohammed J. Zaki, Charu C. Aggarwal

Research output: Contribution to conference (Paper)

141 Citations (Scopus)

Abstract

XML documents have recently become ubiquitous because of their use in a wide range of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data using frequent discriminatory substructures within XML documents. Such a technique is better able to find the classification characteristics of documents. In addition, the technique can be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.

Original language: English
Pages: 316-325
Number of pages: 10
DOIs: https://doi.org/10.1145/956750.956787
Publication status: Published - 1 Dec 2003
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: 24 Aug 2003 - 27 Aug 2003

Other

Other: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 24/8/03 - 27/8/03

Keywords

  • Classification
  • Tree mining
  • XML/Semi-structured data

ASJC Scopus subject areas

  • Software
  • Information Systems

  • Cite this

    Zaki, M. J., & Aggarwal, C. C. (2003). XRules: An effective structural classifier for XML data. 316-325. Paper presented at 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, Washington, DC, United States. https://doi.org/10.1145/956750.956787