Tracing data pollution in large business applications

Research output: Contribution to conference, Paper



In large business applications, data processing activities can be performed locally or outsourced, split or combined, and the resulting data flows have to be exchanged, shared, or integrated across multiple data processing units. There are thus many alternative paths for data processing and data consolidation. Some data flows and data processing applications, however, are particularly prone to generating and propagating data errors, and some of them are also more critical than others. In practice, we usually ignore the impact of data errors in large and complex business applications because: 1) it is often very difficult to systematically audit data and to detect and trace data errors in such large applications; 2) we usually do not have a complete picture of all the data processing units involved in every data processing path, so they are viewed as black boxes; and 3) we usually ignore the total cost of detecting and eliminating data anomalies and, surprisingly, we also ignore the cost of "doing nothing" to resolve them. The objectives of our ongoing research, presented in this paper, are the following: to propose a probabilistic model reflecting data error propagation in large business applications; to determine the most critical or impacted data processing paths and their weak points or vulnerabilities in terms of data quality; to advocate adequate locations for data quality checkpoints; and to predict the cost of doing nothing versus the cost of data cleaning activities.

Original language: English
Publication status: Published - 1 Dec 2008
Event: 13th International Conference on Information Quality, ICIQ 2008 - Cambridge, MA, United States
Duration: 14 Nov 2008 to 16 Nov 2008




ASJC Scopus subject areas

  • Information Systems
  • Safety, Risk, Reliability and Quality

Cite this

Berti-Équille, L. (2008). Tracing data pollution in large business applications. Paper presented at 13th International Conference on Information Quality, ICIQ 2008, Cambridge, MA, United States.