A demo of the data civilizer system

Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

Finding relevant data for a specific task from the numerous data sources available in any organization is a daunting task. This is not only because of the number of possible data sources where the data of interest resides, but also due to the data being scattered all over the enterprise and being typically dirty and inconsistent. In practice, data scientists are routinely reporting that the majority (more than 80%) of their effort is spent finding, cleaning, integrating, and accessing data of interest to a task at hand. We propose to demonstrate Data Civilizer to ease the pain faced in analyzing data "in the wild". Data Civilizer is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines, running in large enterprises.

Original language: English
Title of host publication: SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1639-1642
Number of pages: 4
Volume: Part F127746
ISBN (Electronic): 9781450341974
DOIs: https://doi.org/10.1145/3035918.3058740
Publication status: Published - 9 May 2017
Event: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Chicago, United States
Duration: 14 May 2017 to 19 May 2017

Other

Other: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country: United States
City: Chicago
Period: 14/5/17 to 19/5/17

Fingerprint

Cleaning
Data integration
Information management
Industry
Engines
Big data

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Fernandez, R. C., Deng, D., Mansour, E., Qahtan, A., Tao, W., Abedjan, Z., ... Tang, N. (2017). A demo of the data civilizer system. In SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data (Vol. Part F127746, pp. 1639-1642). Association for Computing Machinery. https://doi.org/10.1145/3035918.3058740

A demo of the data civilizer system. / Fernandez, Raul Castro; Deng, Dong; Mansour, Essam; Qahtan, Abdulhakim; Tao, Wenbo; Abedjan, Ziawasch; Elmagarmid, Ahmed; Ilyas, Ihab F.; Madden, Samuel; Ouzzani, Mourad; Stonebraker, Michael; Tang, Nan.

SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Vol. Part F127746 Association for Computing Machinery, 2017. p. 1639-1642.

Fernandez, RC, Deng, D, Mansour, E, Qahtan, A, Tao, W, Abedjan, Z, Elmagarmid, A, Ilyas, IF, Madden, S, Ouzzani, M, Stonebraker, M & Tang, N 2017, A demo of the data civilizer system. in SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. vol. Part F127746, Association for Computing Machinery, pp. 1639-1642, 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017, Chicago, United States, 14/5/17. https://doi.org/10.1145/3035918.3058740

Fernandez RC, Deng D, Mansour E, Qahtan A, Tao W, Abedjan Z et al. A demo of the data civilizer system. In SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Vol. Part F127746. Association for Computing Machinery. 2017. p. 1639-1642 https://doi.org/10.1145/3035918.3058740

Fernandez, Raul Castro ; Deng, Dong ; Mansour, Essam ; Qahtan, Abdulhakim ; Tao, Wenbo ; Abedjan, Ziawasch ; Elmagarmid, Ahmed ; Ilyas, Ihab F. ; Madden, Samuel ; Ouzzani, Mourad ; Stonebraker, Michael ; Tang, Nan. / A demo of the data civilizer system. SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Vol. Part F127746 Association for Computing Machinery, 2017. pp. 1639-1642
@inproceedings{ebe3fad08aab4c278ccd6d3e3f6a7b9b,
title = "A demo of the data civilizer system",
abstract = "Finding relevant data for a specific task from the numerous data sources available in any organization is a daunting task. This is not only because of the number of possible data sources where the data of interest resides, but also due to the data being scattered all over the enterprise and being typically dirty and inconsistent. In practice, data scientists are routinely reporting that the majority (more than 80{\%}) of their effort is spent finding, cleaning, integrating, and accessing data of interest to a task at hand. We propose to demonstrate Data Civilizer to ease the pain faced in analyzing data {"}in the wild{"}. Data Civilizer is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines, running in large enterprises.",
author = "Fernandez, {Raul Castro} and Dong Deng and Essam Mansour and Abdulhakim Qahtan and Wenbo Tao and Ziawasch Abedjan and Ahmed Elmagarmid and Ilyas, {Ihab F.} and Samuel Madden and Mourad Ouzzani and Michael Stonebraker and Nan Tang",
year = "2017",
month = "5",
day = "9",
doi = "10.1145/3035918.3058740",
language = "English",
volume = "Part F127746",
pages = "1639--1642",
booktitle = "SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - A demo of the data civilizer system

AU - Fernandez, Raul Castro

AU - Deng, Dong

AU - Mansour, Essam

AU - Qahtan, Abdulhakim

AU - Tao, Wenbo

AU - Abedjan, Ziawasch

AU - Elmagarmid, Ahmed

AU - Ilyas, Ihab F.

AU - Madden, Samuel

AU - Ouzzani, Mourad

AU - Stonebraker, Michael

AU - Tang, Nan

PY - 2017/5/9

Y1 - 2017/5/9

N2 - Finding relevant data for a specific task from the numerous data sources available in any organization is a daunting task. This is not only because of the number of possible data sources where the data of interest resides, but also due to the data being scattered all over the enterprise and being typically dirty and inconsistent. In practice, data scientists are routinely reporting that the majority (more than 80%) of their effort is spent finding, cleaning, integrating, and accessing data of interest to a task at hand. We propose to demonstrate Data Civilizer to ease the pain faced in analyzing data "in the wild". Data Civilizer is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines, running in large enterprises.

AB - Finding relevant data for a specific task from the numerous data sources available in any organization is a daunting task. This is not only because of the number of possible data sources where the data of interest resides, but also due to the data being scattered all over the enterprise and being typically dirty and inconsistent. In practice, data scientists are routinely reporting that the majority (more than 80%) of their effort is spent finding, cleaning, integrating, and accessing data of interest to a task at hand. We propose to demonstrate Data Civilizer to ease the pain faced in analyzing data "in the wild". Data Civilizer is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines, running in large enterprises.

UR - http://www.scopus.com/inward/record.url?scp=85021185176&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021185176&partnerID=8YFLogxK

U2 - 10.1145/3035918.3058740

DO - 10.1145/3035918.3058740

M3 - Conference contribution

VL - Part F127746

SP - 1639

EP - 1642

BT - SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data

PB - Association for Computing Machinery

ER -