Quantification trees

Letizia Milli, Anna Monreale, Giulio Rossetti, Fosca Giannotti, Dino Pedreschi, Fabrizio Sebastiani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

In many applications there is a need to monitor how a population is distributed across different classes, and to track the changes in this distribution that derive from varying circumstances. An example of such an application is monitoring the percentage (or "prevalence") of unemployed people in a given region, in a given age range, or at different time periods. When the membership of an individual in a class cannot be established deterministically, this monitoring activity requires classification. However, in the above applications the final goal is not determining which class each individual belongs to, but simply estimating the prevalence of each class in the unlabeled data. This task is called quantification. In a supervised learning framework we may estimate the distribution across the classes in a test set from a training set of labeled individuals. However, this may be suboptimal, since the distribution in the test set may be substantially different from that in the training set (a phenomenon called distribution drift). So far, quantification has mostly been addressed by learning a classifier optimized for individual classification and later adjusting the distribution it computes to compensate for its tendency to either under- or over-estimate the prevalence of the class. In this paper we propose instead to use a type of decision tree (quantification trees) optimized not for individual classification, but directly for quantification. Our experiments show that quantification trees are more accurate than existing state-of-the-art quantification methods, while retaining at the same time the simplicity and understandability of the decision tree framework.
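The "adjust a classifier's counts" strategy that the abstract contrasts with can be made concrete. The sketch below illustrates the standard adjusted-classify-and-count correction (a common baseline in the quantification literature, not the tree method proposed in the paper): a classifier's raw prevalence estimate is corrected using its true- and false-positive rates as estimated on held-out data. All numeric values are hypothetical.

```python
def adjusted_count(raw_prevalence, tpr, fpr):
    """Correct a classify-and-count prevalence estimate for classifier bias.

    raw_prevalence: fraction of unlabeled items the classifier labels positive
    tpr, fpr: the classifier's true/false positive rates, estimated on
              held-out validation data (hypothetical inputs here)
    """
    # The expected raw count satisfies E[raw] = tpr * p + fpr * (1 - p);
    # solving for the true prevalence p gives the correction below.
    p = (raw_prevalence - fpr) / (tpr - fpr)
    # Clip to the valid prevalence range [0, 1].
    return min(1.0, max(0.0, p))

# Hypothetical example: the classifier labels 40% of the test set positive,
# but with tpr = 0.8 and fpr = 0.1 it systematically under-counts positives,
# so the corrected estimate is higher than the raw 0.40.
corrected = adjusted_count(0.40, tpr=0.8, fpr=0.1)
print(round(corrected, 4))  # → 0.4286
```

This correction works well when the classifier's error rates transfer from validation to test data, which is exactly the assumption that distribution drift can violate; that motivates learning a model optimized for quantification directly, as the paper proposes.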

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Pages: 528-536
Number of pages: 9
DOI: 10.1109/ICDM.2013.122
Publication status: Published - 2013
Externally published: Yes
Event: 13th IEEE International Conference on Data Mining, ICDM 2013 - Dallas, TX, United States
Duration: 7 Dec 2013 - 10 Dec 2013



ASJC Scopus subject areas

  • Engineering (all)

Cite this

Milli, L., Monreale, A., Rossetti, G., Giannotti, F., Pedreschi, D., & Sebastiani, F. (2013). Quantification trees. In Proceedings - IEEE International Conference on Data Mining, ICDM (pp. 528-536). [6729537] https://doi.org/10.1109/ICDM.2013.122
