Carpenter: Finding closed patterns in long biological datasets

Feng Pan, Gao Cong, Anthony K H Tung, Jiong Yang, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

131 Citations (Scopus)

Abstract

The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000-100,000 columns but only 100-1,000 rows. Such datasets pose a great challenge for existing (closed) frequent pattern discovery algorithms, since they have an exponential dependence on the average row length. In this paper, we describe a new algorithm called CARPENTER that is specially designed to handle datasets having a large number of attributes and a relatively small number of rows. Several experiments on real bioinformatics datasets show that CARPENTER is orders of magnitude better than previous closed pattern mining algorithms like CLOSET and CHARM.
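
The abstract does not spell out how CARPENTER works, but the "Row enumeration" keyword points at the underlying idea: when rows are few and columns are many, it can be cheaper to search over sets of rows than over sets of columns. The Python sketch below is a brute-force illustration of that idea only, not the CARPENTER algorithm itself. It relies on the standard fact that every closed pattern is the intersection of the rows that contain it. The function name `closed_patterns_by_rows` and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def closed_patterns_by_rows(rows, min_support):
    """Enumerate closed frequent patterns by intersecting sets of rows.

    rows: list of sets of items (one set per row).
    min_support: minimum number of rows a pattern must occur in (>= 1).
    Returns a dict mapping each closed pattern (frozenset) to its support.

    Brute force: the cost grows with the number of row subsets, not with
    the row length. A real algorithm would prune this search heavily.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")

    frozen = [frozenset(r) for r in rows]
    closed = {}
    for k in range(min_support, len(frozen) + 1):
        for row_ids in combinations(range(len(frozen)), k):
            # Items shared by every row in this subset.
            pattern = frozenset.intersection(*(frozen[i] for i in row_ids))
            if not pattern:
                continue
            # Support counts all rows containing the pattern, which may be
            # more rows than the subset that produced it.
            closed[pattern] = sum(1 for r in frozen if pattern <= r)
    return closed


if __name__ == "__main__":
    toy_rows = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for pattern, support in sorted(
        closed_patterns_by_rows(toy_rows, 2).items(),
        key=lambda kv: (-kv[1], sorted(kv[0])),
    ):
        print(sorted(pattern), support)
```

With only a handful of rows, the number of row subsets stays manageable even if each row holds tens of thousands of items, which is the regime the abstract describes.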

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 637-642
Number of pages: 6
DOIs: https://doi.org/10.1145/956750.956832
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: 24 Aug 2003 to 27 Aug 2003

Other

Other: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 24/8/03 to 27/8/03

Fingerprint

Bioinformatics
Gene expression
Experiments

Keywords

  • Closed pattern
  • Frequent pattern
  • Row enumeration

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Pan, F., Cong, G., Tung, A. K. H., Yang, J., & Zaki, M. J. (2003). Carpenter: Finding closed patterns in long biological datasets. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 637-642) https://doi.org/10.1145/956750.956832

@inproceedings{9da7cdac0ad54a6ab33040b570ef37d8,
title = "Carpenter: Finding closed patterns in long biological datasets",
abstract = "The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000-100,000 columns but only 100-1,000 rows. Such datasets pose a great challenge for existing (closed) frequent pattern discovery algorithms, since they have an exponential dependence on the average row length. In this paper, we describe a new algorithm called CARPENTER that is specially designed to handle datasets having a large number of attributes and a relatively small number of rows. Several experiments on real bioinformatics datasets show that CARPENTER is orders of magnitude better than previous closed pattern mining algorithms like CLOSET and CHARM.",
keywords = "Closed pattern, Frequent pattern, Row enumeration",
author = "Feng Pan and Gao Cong and Tung, {Anthony K H} and Jiong Yang and Zaki, {Mohammed J.}",
year = "2003",
month = "12",
day = "1",
doi = "10.1145/956750.956832",
language = "English",
pages = "637--642",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Carpenter

T2 - Finding closed patterns in long biological datasets

AU - Pan, Feng

AU - Cong, Gao

AU - Tung, Anthony K H

AU - Yang, Jiong

AU - Zaki, Mohammed J.

PY - 2003/12/1

Y1 - 2003/12/1

N2 - The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000-100,000 columns but only 100-1,000 rows. Such datasets pose a great challenge for existing (closed) frequent pattern discovery algorithms, since they have an exponential dependence on the average row length. In this paper, we describe a new algorithm called CARPENTER that is specially designed to handle datasets having a large number of attributes and a relatively small number of rows. Several experiments on real bioinformatics datasets show that CARPENTER is orders of magnitude better than previous closed pattern mining algorithms like CLOSET and CHARM.

KW - Closed pattern

KW - Frequent pattern

KW - Row enumeration

UR - http://www.scopus.com/inward/record.url?scp=77952367051&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952367051&partnerID=8YFLogxK

U2 - 10.1145/956750.956832

DO - 10.1145/956750.956832

M3 - Conference contribution

SP - 637

EP - 642

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ER -