ANMAT: Automatic knowledge discovery and error detection through pattern functional dependencies

Abdulhakim Qahtan, Nan Tang, Mourad Ouzzani, Yang Cao, Michael Stonebraker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Knowledge discovery is critical to successful data analytics. We propose a new type of meta-knowledge, namely pattern functional dependencies (PFDs), which combine patterns (regex-like rules) with integrity constraints (ICs) to model dependencies between partial values (patterns) across different attributes in a table. PFDs go beyond classical functional dependencies and their extensions. For instance, in an employee table, the leading “F” of the ID “F-9-107” determines the finance department. A key application of PFDs is identifying erroneous data, that is, tuples that violate some PFDs. In this demonstration, attendees will experience the following features: (i) PFD discovery: automatically discovering PFDs from (dirty) data in different domains; and (ii) error detection with PFDs: we will show errors that are detected by PFDs but cannot be captured by existing approaches.
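
The two features named in the abstract can be illustrated with a minimal sketch. It is not the ANMAT algorithm: the sample rows, the ID regex, the `min_support` and `min_confidence` thresholds, and the function names `discover_rules` and `find_violations` are all assumptions made for illustration. It only shows the general idea that part of a value (here the ID prefix) can determine another attribute, and that rows disagreeing with a well-supported rule can be flagged as likely errors.

```python
import re
from collections import Counter, defaultdict

# Hypothetical employee rows (ID, department); not data from the paper.
ROWS = [
    ("F-9-107", "Finance"),
    ("F-3-220", "Finance"),
    ("F-1-015", "Finance"),
    ("F-7-301", "Marketing"),  # suspicious: prefix F usually means Finance
    ("M-2-118", "Marketing"),
    ("M-5-042", "Marketing"),
]

# Assumed pattern: the leading letter of the ID is the part that carries meaning.
ID_PATTERN = re.compile(r"^([A-Z])-\d+-\d+$")


def discover_rules(rows, min_support=2, min_confidence=0.7):
    """Map each ID prefix to its dominant department if the evidence is strong enough."""
    counts = defaultdict(Counter)
    for emp_id, dept in rows:
        match = ID_PATTERN.match(emp_id)
        if match:
            counts[match.group(1)][dept] += 1
    rules = {}
    for prefix, dept_counts in counts.items():
        dept, hits = dept_counts.most_common(1)[0]
        total = sum(dept_counts.values())
        # Tolerate some dirty rows: keep the rule when most rows agree with it.
        if total >= min_support and hits / total >= min_confidence:
            rules[prefix] = dept
    return rules


def find_violations(rows, rules):
    """Return rows whose department disagrees with the rule for their ID prefix."""
    violations = []
    for emp_id, dept in rows:
        match = ID_PATTERN.match(emp_id)
        if match and rules.get(match.group(1), dept) != dept:
            violations.append((emp_id, dept))
    return violations


rules = discover_rules(ROWS)
violations = find_violations(ROWS, rules)
print(rules)
print(violations)
```

A classical functional dependency over whole ID values would not flag the row “F-7-301”, since every ID in the sample is unique; the rule only becomes visible once the ID is split by a pattern.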

Original language: English
Title of host publication: SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1977-1980
Number of pages: 4
ISBN (Electronic): 9781450356435
DOI: https://doi.org/10.1145/3299869.3320209
Publication status: Published - 25 Jun 2019
Event: 2019 International Conference on Management of Data, SIGMOD 2019 - Amsterdam, Netherlands
Duration: 30 Jun 2019 to 5 Jul 2019

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2019 International Conference on Management of Data, SIGMOD 2019
Country: Netherlands
City: Amsterdam
Period: 30/6/19 to 5/7/19

Fingerprint

Error detection
Finance
Data mining
Demonstrations
Personnel

Keywords

  • Constrained Patterns
  • Error Detection
  • Knowledge Discovery
  • Pattern Functional Dependencies

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Qahtan, A., Tang, N., Ouzzani, M., Cao, Y., & Stonebraker, M. (2019). ANMAT: Automatic knowledge discovery and error detection through pattern functional dependencies. In SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data (pp. 1977-1980). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3299869.3320209

@inproceedings{b49a3042143947259a453b88b57746e8,
title = "ANMAT: Automatic knowledge discovery and error detection through pattern functional dependencies",
abstract = "Knowledge discovery is critical to successful data analytics. We propose a new type of meta-knowledge, namely pattern functional dependencies (PFDs), which combine patterns (regex-like rules) with integrity constraints (ICs) to model dependencies between partial values (patterns) across different attributes in a table. PFDs go beyond classical functional dependencies and their extensions. For instance, in an employee table, the leading “F” of the ID “F-9-107” determines the finance department. A key application of PFDs is identifying erroneous data, that is, tuples that violate some PFDs. In this demonstration, attendees will experience the following features: (i) PFD discovery: automatically discovering PFDs from (dirty) data in different domains; and (ii) error detection with PFDs: we will show errors that are detected by PFDs but cannot be captured by existing approaches.",
keywords = "Constrained Patterns, Error Detection, Knowledge Discovery, Pattern Functional Dependencies",
author = "Abdulhakim Qahtan and Nan Tang and Mourad Ouzzani and Yang Cao and Michael Stonebraker",
year = "2019",
month = "6",
day = "25",
doi = "10.1145/3299869.3320209",
language = "English",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",
pages = "1977--1980",
booktitle = "SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data",

}

TY - GEN

T1 - ANMAT

T2 - Automatic knowledge discovery and error detection through pattern functional dependencies

AU - Qahtan, Abdulhakim

AU - Tang, Nan

AU - Ouzzani, Mourad

AU - Cao, Yang

AU - Stonebraker, Michael

PY - 2019/6/25

Y1 - 2019/6/25

N2 - Knowledge discovery is critical to successful data analytics. We propose a new type of meta-knowledge, namely pattern functional dependencies (PFDs), which combine patterns (regex-like rules) with integrity constraints (ICs) to model dependencies between partial values (patterns) across different attributes in a table. PFDs go beyond classical functional dependencies and their extensions. For instance, in an employee table, the leading “F” of the ID “F-9-107” determines the finance department. A key application of PFDs is identifying erroneous data, that is, tuples that violate some PFDs. In this demonstration, attendees will experience the following features: (i) PFD discovery: automatically discovering PFDs from (dirty) data in different domains; and (ii) error detection with PFDs: we will show errors that are detected by PFDs but cannot be captured by existing approaches.

KW - Constrained Patterns

KW - Error Detection

KW - Knowledge Discovery

KW - Pattern Functional Dependencies

UR - http://www.scopus.com/inward/record.url?scp=85069516123&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069516123&partnerID=8YFLogxK

U2 - 10.1145/3299869.3320209

DO - 10.1145/3299869.3320209

M3 - Conference contribution

AN - SCOPUS:85069516123

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 1977

EP - 1980

BT - SIGMOD 2019 - Proceedings of the 2019 International Conference on Management of Data

PB - Association for Computing Machinery

ER -