Efficiently mining frequent trees in a forest

Mohammed J. Zaki

Research output: Contribution to conference (Paper)

370 Citations (Scopus)

Abstract

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.
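The abstract names the scope-list but does not define it. As a rough illustration only, the sketch below assumes that a node's scope is the interval from its own preorder position to the preorder position of its last descendant, so that ancestor tests reduce to interval containment. The tree encoding (`(label, children)` tuples) and the helper names `compute_scopes`, `is_ancestor`, and `support` are hypothetical and are not taken from the paper; the example counts support only for a two-node embedded pattern, not for general subtrees.

```python
# Minimal sketch under stated assumptions; not the TreeMiner algorithm itself.
# A tree is a (label, [children]) tuple; children are ordered left to right.

def compute_scopes(tree):
    """Return [(label, (lo, hi)), ...] in preorder.

    lo is the node's preorder position; hi is the preorder position of
    its last descendant (hi == lo for a leaf). This interval encoding is
    an assumption made for illustration.
    """
    out = []

    def visit(node):
        label, children = node
        lo = len(out)
        out.append(None)          # reserve this node's preorder slot
        for child in children:
            visit(child)
        out[lo] = (label, (lo, len(out) - 1))

    visit(tree)
    return out


def is_ancestor(a, b):
    """True if the node with scope a is a proper ancestor of the node with scope b."""
    return a[0] < b[0] and b[1] <= a[1]


def support(forest, anc_label, desc_label):
    """Count trees containing some anc_label node above some desc_label node.

    The match is 'embedded': the ancestor need not be the direct parent.
    """
    count = 0
    for tree in forest:
        scopes = compute_scopes(tree)
        if any(
            la == anc_label and lb == desc_label and is_ancestor(sa, sb)
            for la, sa in scopes
            for lb, sb in scopes
        ):
            count += 1
    return count


# Two small trees: A(B(C), D) and A(D).
t1 = ("A", [("B", [("C", [])]), ("D", [])])
t2 = ("A", [("D", [])])
forest = [t1, t2]
```

With this forest, `support(forest, "A", "C")` counts only the first tree, where C sits two levels below A, while `support(forest, "A", "D")` counts both trees.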

Original language: English
Pages: 71-80
Number of pages: 10
Publication status: Published - 1 Dec 2002
Event: KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Edmonton, Alta, Canada
Duration: 23 Jul 2002 to 26 Jul 2002

Other

Other: KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: Canada
City: Edmonton, Alta
Period: 23/7/02 to 26/7/02

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zaki, M. J. (2002). Efficiently mining frequent trees in a forest. 71-80. Paper presented at KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alta, Canada.