Tracing data pollution in large business applications

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In large business applications, various data processing activities can be done locally or outsourced, split or combined, and the resulting data flows have to be exchanged, shared, or integrated across multiple data processing units. There are thus various alternative paths for data processing and data consolidation, but some data flows and data processing applications are more likely than others to generate and propagate data errors, and some are also more critical. In practice, the impact of data errors in large and complex business applications is usually ignored because: 1) it is often very difficult to systematically audit data and to detect and trace data errors in such large applications; 2) we usually do not have a complete picture of all the data processing units involved in every data processing path, as they are viewed as black boxes; and 3) we usually ignore the total cost of detecting and eliminating data anomalies and, surprisingly, we also ignore the cost of "doing nothing" to resolve them. In this paper, the objectives of our ongoing research are the following: to propose a probabilistic model reflecting data error propagation in large business applications, to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality, to advocate adequate locations for data quality checkpoints, and to predict the cost of doing nothing versus the cost of data cleaning activities.
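The error-propagation and cost trade-off sketched in the abstract can be illustrated with a toy model. This is not the paper's actual probabilistic model: the unit names, per-unit error probabilities, record count, and per-error cost below are all invented for illustration. The assumption is simply that each processing unit corrupts a passing record with some probability, so a record leaves a path clean only if every unit on the path leaves it intact.

```python
# Hypothetical per-unit probabilities of introducing a data error.
UNIT_ERROR_PROB = {
    "extract": 0.01,
    "transform_a": 0.05,
    "transform_b": 0.02,
    "consolidate": 0.03,
}

# Two alternative processing/consolidation paths through the units.
PATHS = [
    ["extract", "transform_a", "consolidate"],
    ["extract", "transform_b", "consolidate"],
]

def path_error_prob(path):
    """Probability that a record emerging from `path` carries an error:
    1 - product over the path of (1 - p_unit)."""
    clean = 1.0
    for unit in path:
        clean *= 1.0 - UNIT_ERROR_PROB[unit]
    return 1.0 - clean

def most_critical(paths):
    """The path most exposed to generating and propagating data errors."""
    return max(paths, key=path_error_prob)

def cost_of_doing_nothing(path, n_records, cost_per_error):
    """Expected downstream cost if no checkpoint cleans the path's output."""
    return n_records * path_error_prob(path) * cost_per_error

critical = most_critical(PATHS)
print(critical, round(path_error_prob(critical), 6))
print(cost_of_doing_nothing(critical, 10_000, 5.0))
```

Placing a data quality checkpoint after the most critical path then amounts to comparing `cost_of_doing_nothing` against the cost of auditing and cleaning at that point.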

Original language: English
Title of host publication: Proceedings of the 2008 International Conference on Information Quality, ICIQ 2008
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 13th International Conference on Information Quality, ICIQ 2008 - Cambridge, MA, United States
Duration: 14 Nov 2008 - 16 Nov 2008

Other

Other: 13th International Conference on Information Quality, ICIQ 2008
Country: United States
City: Cambridge, MA
Period: 14/11/08 - 16/11/08

Fingerprint

  • Pollution
  • Industry
  • Costs
  • Consolidation
  • Cleaning

ASJC Scopus subject areas

  • Information Systems
  • Safety, Risk, Reliability and Quality

Cite this

Berti-Equille, L. (2008). Tracing data pollution in large business applications. In Proceedings of the 2008 International Conference on Information Quality, ICIQ 2008

@inproceedings{3078f22959cf4ded84d074f36db9fc4d,
title = "Tracing data pollution in large business applications",
abstract = "In large business applications, various data processing activities can be done locally or outsourced, split or combined and the resulting data flows have to be exchanged, shared or integrated from multiple data processing units. There are indeed various alternative paths for data processing and data consolidation. But some data flows and data processing applications are most likely exposed to generating and propagating data errors; some of them are more critical too. Actually, we usually ignore the impact of data errors in large and complex business applications because: 1) it is often very difficult to systematically audit data, detect and trace data errors in such large applications, 2) we usually don't have the complete picture of all the data processing units involved in every data processing paths; they are viewed as black-boxes, and 3) we usually ignore the total cost of detecting and eliminating data anomalies and surprisingly we also ignore the cost of {"}doing nothing{"} to resolve them. In this paper, the objectives of our ongoing research are the following: to propose a probabilistic model reflecting data error propagation in large business applications, to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality, to advocate adequate locations for data quality checkpoints, and to predict the cost of doing-nothing versus the cost of data cleaning activities.",
author = "Laure Berti-Equille",
year = "2008",
month = "12",
day = "1",
language = "English",
booktitle = "Proceedings of the 2008 International Conference on Information Quality, ICIQ 2008",

}

TY - GEN

T1 - Tracing data pollution in large business applications

AU - Berti-Equille, Laure

PY - 2008/12/1

Y1 - 2008/12/1

N2 - In large business applications, various data processing activities can be done locally or outsourced, split or combined and the resulting data flows have to be exchanged, shared or integrated from multiple data processing units. There are indeed various alternative paths for data processing and data consolidation. But some data flows and data processing applications are most likely exposed to generating and propagating data errors; some of them are more critical too. Actually, we usually ignore the impact of data errors in large and complex business applications because: 1) it is often very difficult to systematically audit data, detect and trace data errors in such large applications, 2) we usually don't have the complete picture of all the data processing units involved in every data processing paths; they are viewed as black-boxes, and 3) we usually ignore the total cost of detecting and eliminating data anomalies and surprisingly we also ignore the cost of "doing nothing" to resolve them. In this paper, the objectives of our ongoing research are the following: to propose a probabilistic model reflecting data error propagation in large business applications, to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality, to advocate adequate locations for data quality checkpoints, and to predict the cost of doing-nothing versus the cost of data cleaning activities.

AB - In large business applications, various data processing activities can be done locally or outsourced, split or combined and the resulting data flows have to be exchanged, shared or integrated from multiple data processing units. There are indeed various alternative paths for data processing and data consolidation. But some data flows and data processing applications are most likely exposed to generating and propagating data errors; some of them are more critical too. Actually, we usually ignore the impact of data errors in large and complex business applications because: 1) it is often very difficult to systematically audit data, detect and trace data errors in such large applications, 2) we usually don't have the complete picture of all the data processing units involved in every data processing paths; they are viewed as black-boxes, and 3) we usually ignore the total cost of detecting and eliminating data anomalies and surprisingly we also ignore the cost of "doing nothing" to resolve them. In this paper, the objectives of our ongoing research are the following: to propose a probabilistic model reflecting data error propagation in large business applications, to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality, to advocate adequate locations for data quality checkpoints, and to predict the cost of doing-nothing versus the cost of data cleaning activities.

UR - http://www.scopus.com/inward/record.url?scp=84871579330&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84871579330&partnerID=8YFLogxK

M3 - Conference contribution

BT - Proceedings of the 2008 International Conference on Information Quality, ICIQ 2008

ER -