Building data civilizer pipelines with an advanced workflow engine

Essam Mansour, Dong Deng, Raul Castro Fernandez, Abdulhakim Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Abstract

In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-to-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute, and retrofit data preparation pipelines built from different data discovery and cleaning services. Our end-to-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets.

Original language: English
Title of host publication: Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1593-1596
Number of pages: 4
ISBN (Electronic): 9781538655207
DOI: https://doi.org/10.1109/ICDE.2018.00184
Publication status: Published - 24 Oct 2018
Event: 34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France
Duration: 16 Apr 2018 to 19 Apr 2018



Keywords

  • Data Cleaning
  • Data Discovery
  • Data Integration

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Hardware and Architecture

Cite this

Mansour, E., Deng, D., Fernandez, R. C., Qahtan, A., Tao, W., Abedjan, Z., ... Tang, N. (2018). Building data civilizer pipelines with an advanced workflow engine. In Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018 (pp. 1593-1596). [8509405] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDE.2018.00184

@inproceedings{2af44dabb8b340fc9bbf202650ded11e,
title = "Building data civilizer pipelines with an advanced workflow engine",
abstract = "In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-to-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute, and retrofit data preparation pipelines built from different data discovery and cleaning services. Our end-to-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets.",
keywords = "Data Cleaning, Data Discovery, Data Integration",
author = "Essam Mansour and Dong Deng and Fernandez, {Raul Castro} and Abdulhakim Qahtan and Wenbo Tao and Ziawasch Abedjan and Ahmed Elmagarmid and Ilyas, {Ihab F.} and Samuel Madden and Mourad Ouzzani and Michael Stonebraker and Nan Tang",
year = "2018",
month = "10",
day = "24",
doi = "10.1109/ICDE.2018.00184",
language = "English",
pages = "1593--1596",
booktitle = "Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
