Xproj: A framework for projected structural clustering of XML documents

Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua Feng, Mohammed Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

84 Citations (Scopus)

Abstract

XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encode structural information about data records. However, this structural characteristic of data sets also makes it a challenging problem for a variety of data mining problems. One such problem is that of clustering, in which the structural aspects of the data result in a high implicit dimensionality of the data representation. As a result, it becomes more difficult to cluster the data in a meaningful way. In this paper, we propose an effective clustering algorithm for XML data which uses substructures of the documents in order to gain insights about the important underlying structures. We propose new ways of using multiple sub-structural information in XML documents to evaluate the quality of intermediate cluster solutions, and guide the algorithms to a final solution which reflects the true structural behavior in individual partitions. We test the algorithm on a variety of real and synthetic data sets.

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 46-55
Number of pages: 10
DOIs: https://doi.org/10.1145/1281192.1281201
Publication status: Published - 14 Dec 2007
Externally published: Yes
Event: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - San Jose, CA, United States
Duration: 12 Aug 2007 to 15 Aug 2007

Other

Other: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: San Jose, CA
Period: 12/8/07 to 15/8/07

Keywords

  • Clustering
  • XML

ASJC Scopus subject areas

  • Information Systems

Cite this

Aggarwal, C. C., Ta, N., Wang, J., Feng, J., & Zaki, M. (2007). Xproj: A framework for projected structural clustering of XML documents. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 46-55). https://doi.org/10.1145/1281192.1281201