Efficient construction of approximate Ad-Hoc ML models through materialization and reuse

Sona Hasani, Saravanan Thirumuruganathan, Abolfazl Asudeh, Nick Koudas, Gautam Das

Research output: Contribution to journal › Conference article

3 Citations (Scopus)

Abstract

Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.

Original language: English
Pages (from-to): 1468-1481
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 11
Issue number: 11
DOI: 10.14778/3236187.3236199
Publication status: Published - 1 Jan 2017
Event: 44th International Conference on Very Large Data Bases, VLDB 2018 - Rio de Janeiro, Brazil
Duration: 27 Aug 2017 → 31 Aug 2017

Fingerprint

Data warehouses
Learning systems
Processing
Costs
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

Efficient construction of approximate Ad-Hoc ML models through materialization and reuse. / Hasani, Sona; Thirumuruganathan, Saravanan; Asudeh, Abolfazl; Koudas, Nick; Das, Gautam.

In: Proceedings of the VLDB Endowment, Vol. 11, No. 11, 01.01.2017, p. 1468-1481.

Hasani, Sona; Thirumuruganathan, Saravanan; Asudeh, Abolfazl; Koudas, Nick; Das, Gautam. / Efficient construction of approximate Ad-Hoc ML models through materialization and reuse. In: Proceedings of the VLDB Endowment. 2017; Vol. 11, No. 11. pp. 1468-1481.
@article{0a6445639e394bf19ff3b939dab12e2c,
title = "Efficient construction of approximate Ad-Hoc {ML} models through materialization and reuse",
abstract = "Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.",
author = "Sona Hasani and Saravanan Thirumuruganathan and Abolfazl Asudeh and Nick Koudas and Gautam Das",
year = "2017",
month = "1",
day = "1",
doi = "10.14778/3236187.3236199",
language = "English",
volume = "11",
pages = "1468--1481",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "11",
}

TY  - JOUR
T1  - Efficient construction of approximate Ad-Hoc ML models through materialization and reuse
AU  - Hasani, Sona
AU  - Thirumuruganathan, Saravanan
AU  - Asudeh, Abolfazl
AU  - Koudas, Nick
AU  - Das, Gautam
PY  - 2017/1/1
Y1  - 2017/1/1
N2  - Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.
AB  - Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.
UR  - http://www.scopus.com/inward/record.url?scp=85058900008&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85058900008&partnerID=8YFLogxK
U2  - 10.14778/3236187.3236199
DO  - 10.14778/3236187.3236199
M3  - Conference article
AN  - SCOPUS:85058900008
VL  - 11
SP  - 1468
EP  - 1481
JO  - Proceedings of the VLDB Endowment
JF  - Proceedings of the VLDB Endowment
SN  - 2150-8097
IS  - 11
ER  - 