Building data civilizer pipelines with an advanced workflow engine

Essam Mansour, Dong Deng, Raul Castro Fernandez, Abdulhakim Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-to-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute, and retrofit data preparation pipelines of different data discovery and cleaning services. Our end-to-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets.

Original language: English
Title of host publication: Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1593-1596
Number of pages: 4
ISBN (Electronic): 9781538655207
DOIs: https://doi.org/10.1109/ICDE.2018.00184
Publication status: Published - 24 Oct 2018
Event: 34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France
Duration: 16 Apr 2018 to 19 Apr 2018

Other

Other: 34th IEEE International Conference on Data Engineering, ICDE 2018
Country: France
City: Paris
Period: 16/4/18 to 19/4/18

Keywords

  • Data Cleaning
  • Data Discovery
  • Data Integration

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Hardware and Architecture

Cite this

Mansour, E., Deng, D., Fernandez, R. C., Qahtan, A., Tao, W., Abedjan, Z., Elmagarmid, A., Ilyas, I. F., Madden, S., Ouzzani, M., Stonebraker, M., & Tang, N. (2018). Building data civilizer pipelines with an advanced workflow engine. In Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018 (pp. 1593-1596). [8509405] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDE.2018.00184