Efficiently mining frequent trees in a forest

Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

363 Citations (Scopus)

Abstract

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: D. Hand, D. Keim, R. Ng
Pages: 71-80
Number of pages: 10
Publication status: Published - 1 Dec 2002
Externally published: Yes
Event: KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Edmonton, Alta, Canada
Duration: 23 Jul 2002 to 26 Jul 2002

Other

Other: KDD - 2002 Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: Canada
City: Edmonton, Alta
Period: 23/7/02 to 26/7/02

Fingerprint

Pattern matching
Bioinformatics
Data structures
Scalability
Experiments

ASJC Scopus subject areas

  • Information Systems

Cite this

Zaki, M. J. (2002). Efficiently mining frequent trees in a forest. In D. Hand, D. Keim, & R. Ng (Eds.), Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 71-80)

@inproceedings{af34bc43e51848b0b16b275c7d6bb046,
title = "Efficiently mining frequent trees in a forest",
abstract = "Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.",
author = "Zaki, {Mohammed J.}",
year = "2002",
month = "12",
day = "1",
language = "English",
pages = "71--80",
editor = "D. Hand and D. Keim and R. Ng",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Efficiently mining frequent trees in a forest

AU - Zaki, Mohammed J.

PY - 2002/12/1

Y1 - 2002/12/1

N2 - Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.

UR - http://www.scopus.com/inward/record.url?scp=0242709382&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0242709382&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0242709382

SP - 71

EP - 80

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

A2 - Hand, D.

A2 - Keim, D.

A2 - Ng, R.

ER -