Geometrically inspired itemset mining

Florian Verhein, Sanjay Chawla

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

In our geometric view, an itemset is a vector (Itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (o) and measures (f and F). For Frequent Itemset Mining (FIM), g and F are identity transformations, o is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FPGrowth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments).
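The instantiation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' single-pass, linear-space algorithm; it is a plain depth-first enumeration, assuming each itemvector is stored as the set of ids of the transactions that contain the itemset. The names `item_vectors`, `mine`, `combine` and `threshold` are hypothetical, and treating an itemset as interesting when `F(f(vector)) >= threshold` is an assumption made for illustration. The defaults correspond to FIM as stated above: g and F are identities, o is intersection and f is cardinality.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# Assumption: an itemvector is represented as the set of transaction ids
# in which the itemset occurs.
Vector = FrozenSet[int]


def item_vectors(transactions: Iterable[Set[str]]) -> Dict[str, Vector]:
    """Build one itemvector per single item (1-itemset)."""
    vectors: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vectors.setdefault(item, set()).add(tid)
    return {item: frozenset(tids) for item, tids in vectors.items()}


def mine(
    transactions: List[Set[str]],
    threshold: float,
    g: Callable[[Vector], Vector] = lambda v: v,              # transformation (identity for FIM)
    combine: Callable[[Vector, Vector], Vector] = lambda a, b: a & b,  # operator o (intersection for FIM)
    f: Callable[[Vector], float] = len,                       # measure f (cardinality for FIM)
    F: Callable[[float], float] = lambda x: x,                # measure F (identity for FIM)
) -> Dict[Tuple[str, ...], float]:
    """Depth-first enumeration of itemsets whose score F(f(vector)) meets the threshold.

    Pruning a branch when the score drops below the threshold assumes the
    score never increases as items are added, which holds for support.
    """
    vectors = {item: g(vec) for item, vec in item_vectors(transactions).items()}
    items = sorted(vectors)
    results: Dict[Tuple[str, ...], float] = {}

    def extend(prefix: Tuple[str, ...], prefix_vec: Optional[Vector], start: int) -> None:
        for i in range(start, len(items)):
            item = items[i]
            # The vector of prefix + item is the aggregation of the two vectors.
            vec = vectors[item] if prefix_vec is None else combine(prefix_vec, vectors[item])
            score = F(f(vec))
            if score >= threshold:
                itemset = prefix + (item,)
                results[itemset] = score
                extend(itemset, vec, i + 1)

    extend((), None, 0)
    return results
```

With the default arguments, `mine(transactions, threshold=3)` returns every itemset contained in at least three transactions, together with its support count. Other measures could be tried by passing different `g`, `combine`, `f` and `F`, provided the pruning assumption noted in the docstring still holds.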

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Pages: 655-666
Number of pages: 12
DOIs: https://doi.org/10.1109/ICDM.2006.75
Publication status: Published - 2006
Externally published: Yes
Event: 6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China
Duration: 18 Dec 2006 to 22 Dec 2006

Other

Other: 6th International Conference on Data Mining, ICDM 2006
Country: China
City: Hong Kong
Period: 18/12/06 to 22/12/06

Fingerprint

Agglomeration
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Verhein, F., & Chawla, S. (2006). Geometrically inspired itemset mining. In Proceedings - IEEE International Conference on Data Mining, ICDM (pp. 655-666). [4053091] https://doi.org/10.1109/ICDM.2006.75

@inproceedings{7f96bfd563e14f8a9878ad1f6817dced,
title = "Geometrically inspired itemset mining",
abstract = "In our geometric view, an itemset is a vector (Itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (o) and measures (f and F). For Frequent Itemset Mining (FIM), g and F are identity transformations, o is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FPGrowth on realistic datasets above a small support threshold (0.29{\%} and 1.2{\%} in our experiments).",
author = "Florian Verhein and Sanjay Chawla",
year = "2006",
doi = "10.1109/ICDM.2006.75",
language = "English",
isbn = "0769527019",
pages = "655--666",
booktitle = "Proceedings - IEEE International Conference on Data Mining, ICDM",

}

TY - GEN

T1 - Geometrically inspired itemset mining

AU - Verhein, Florian

AU - Chawla, Sanjay

PY - 2006

Y1 - 2006

AB - In our geometric view, an itemset is a vector (Itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (o) and measures (f and F). For Frequent Itemset Mining (FIM), g and F are identity transformations, o is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FPGrowth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments).

UR - http://www.scopus.com/inward/record.url?scp=49549103257&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49549103257&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2006.75

DO - 10.1109/ICDM.2006.75

M3 - Conference contribution

SN - 0769527019

SN - 9780769527017

SP - 655

EP - 666

BT - Proceedings - IEEE International Conference on Data Mining, ICDM

ER -