Data veracity estimation with ensembling truth discovery methods

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and to return, as outputs, a veracity label and a confidence score for each data value, along with a trustworthiness score for each source claiming it. Although a plethora of methods has been proposed, it is unlikely that any single technique dominates all others across all data sets. Furthermore, the performance evaluation of these methods depends entirely on the availability of labeled ground truth data (i.e., data whose veracity has been manually checked). In the context of big data, acquiring complete ground truth data is out of reach. In this paper, we propose an ensembling method that mitigates the two problems of method selection and ground truth data sparsity. Our approach combines the results of a set of truth discovery methods, and preliminary experiments suggest that it improves result quality over the individual methods when samples of ground truth data are used.
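The abstract does not specify how the results are combined, so the following is only a minimal sketch of one plausible ensembling scheme, not the paper's method. It assumes each truth discovery method outputs a boolean veracity label per claim, weights each method by its accuracy on a small labeled sample of ground truth, and takes a weighted vote. The names `accuracy_on_sample`, `ensemble_veracity`, `method_outputs`, and `labeled_sample` are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict


def accuracy_on_sample(predictions, labeled_sample):
    """Fraction of labeled claims that one method labels correctly."""
    scored = [claim for claim in labeled_sample if claim in predictions]
    if not scored:
        return 0.0
    hits = sum(predictions[claim] == labeled_sample[claim] for claim in scored)
    return hits / len(scored)


def ensemble_veracity(method_outputs, labeled_sample):
    """Weighted vote over methods; returns {claim: (label, confidence)}.

    Assumption: weights are sample accuracies. This is an illustrative
    choice, not the combination rule used in the paper.
    """
    weights = {
        name: accuracy_on_sample(preds, labeled_sample)
        for name, preds in method_outputs.items()
    }
    # votes[claim] = [total weight voting False, total weight voting True]
    votes = defaultdict(lambda: [0.0, 0.0])
    for name, preds in method_outputs.items():
        for claim, label in preds.items():
            votes[claim][int(label)] += weights[name]

    result = {}
    for claim, (w_false, w_true) in votes.items():
        total = w_false + w_true
        label = w_true > w_false  # ties resolve to False
        confidence = max(w_true, w_false) / total if total else 0.5
        result[claim] = (label, confidence)
    return result


# Hypothetical outputs of three methods on three claims.
method_outputs = {
    "method_a": {"c1": True, "c2": False, "c3": True},
    "method_b": {"c1": True, "c2": True, "c3": False},
    "method_c": {"c1": False, "c2": False, "c3": False},
}
# Small manually checked sample (ground truth for two claims only).
labeled_sample = {"c1": True, "c2": False}

result = ensemble_veracity(method_outputs, labeled_sample)
```

In this toy setup only two of the three claims have ground truth, which mirrors the sparsity problem the abstract describes: the sample is used to weight the methods, and the weighted vote then labels the unlabeled claim `c3` as well.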

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2628-2636
Number of pages: 9
ISBN (Print): 9781479999255
DOIs: https://doi.org/10.1109/BigData.2015.7364062
Publication status: Published - 22 Dec 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: 29 Oct 2015 to 1 Nov 2015

Other

Other: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country: United States
City: Santa Clara
Period: 29/10/15 to 1/11/15

Fingerprint

Labels
Availability
Experiments
Big data

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Berti-Equille, L. (2015). Data veracity estimation with ensembling truth discovery methods. In Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 (pp. 2628-2636). [7364062] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2015.7364062

@inproceedings{fa48e4dacf204539b1b9d79d45fda4d6,
title = "Data veracity estimation with ensembling truth discovery methods",
abstract = "Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and return, as outputs, a veracity label and a confidence score for each data value, along with the trustworthiness score of each source claiming it. Although a plethora of methods has been proposed, it is unlikely a technique dominates all others across all data sets. Furthermore, the performance evaluation of the methods entirely depends on the availability of labeled ground truth data (i.e., data whose veracity has been manually checked). In the context of Big Data, acquiring the complete ground truth data is out-of-reach. In this paper, we propose an ensembling method that mitigates the two problems of method selection and ground truth data sparsity. Our approach combines the results of a set of truth discovery methods and preliminary experiments suggest that it improves the quality performance over the single methods when samples of ground truth data are used.",
author = "Laure Berti-Equille",
year = "2015",
month = "12",
day = "22",
doi = "10.1109/BigData.2015.7364062",
language = "English",
isbn = "9781479999255",
pages = "2628--2636",
booktitle = "Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Data veracity estimation with ensembling truth discovery methods

AU - Berti-Equille, Laure

PY - 2015/12/22

Y1 - 2015/12/22

N2 - Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and return, as outputs, a veracity label and a confidence score for each data value, along with the trustworthiness score of each source claiming it. Although a plethora of methods has been proposed, it is unlikely a technique dominates all others across all data sets. Furthermore, the performance evaluation of the methods entirely depends on the availability of labeled ground truth data (i.e., data whose veracity has been manually checked). In the context of Big Data, acquiring the complete ground truth data is out-of-reach. In this paper, we propose an ensembling method that mitigates the two problems of method selection and ground truth data sparsity. Our approach combines the results of a set of truth discovery methods and preliminary experiments suggest that it improves the quality performance over the single methods when samples of ground truth data are used.

UR - http://www.scopus.com/inward/record.url?scp=84963767154&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963767154&partnerID=8YFLogxK

U2 - 10.1109/BigData.2015.7364062

DO - 10.1109/BigData.2015.7364062

M3 - Conference contribution

SN - 9781479999255

SP - 2628

EP - 2636

BT - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -