Explaining entity resolution predictions: Where are we and what needs to be done?

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Entity resolution (ER) seeks to identify the set of tuples in a dataset that refer to the same real-world entity. It is one of the fundamental and well studied problems in data integration with applications in diverse domains such as banking, insurance, e-commerce, and so on. Machine Learning and Deep Learning based methods provide the state-of-the-art results. For practitioners, it is often challenging to understand why the classifier made a particular prediction. While there has been extensive work in the ML community on explaining classifier predictions, we found that a direct application of those techniques is not appropriate for ER. There is a huge gap between the needs of lay ER practitioners and the explanation community. In this paper, we provide a comprehensive taxonomy of these challenges, discuss research opportunities and propose preliminary solutions.

Original language: English
Title of host publication: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450367912
DOIs: https://doi.org/10.1145/3328519.3329130
Publication status: Published - 5 Jul 2019
Event: 2019 Workshop on Human-In-the-Loop Data Analytics, HILDA 2019, co-located with SIGMOD 2019 - Amsterdam, Netherlands
Duration: 5 Jul 2019 → …

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2019 Workshop on Human-In-the-Loop Data Analytics, HILDA 2019, co-located with SIGMOD 2019
Country: Netherlands
City: Amsterdam
Period: 5/7/19 → …

Fingerprint

Classifiers
Data integration
Insurance
Taxonomies
Learning systems
Deep learning

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Thirumuruganathan, S., Ouzzani, M., & Tang, N. (2019). Explaining entity resolution predictions: Where are we and what needs to be done? In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019 [a10] (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3328519.3329130

Explaining entity resolution predictions: Where are we and what needs to be done? / Thirumuruganathan, Saravanan; Ouzzani, Mourad; Tang, Nan.

Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019. Association for Computing Machinery, 2019. a10 (Proceedings of the ACM SIGMOD International Conference on Management of Data).

Thirumuruganathan, S, Ouzzani, M & Tang, N 2019, Explaining entity resolution predictions: Where are we and what needs to be done? in Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019, a10, Proceedings of the ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery, 2019 Workshop on Human-In-the-Loop Data Analytics, HILDA 2019, co-located with SIGMOD 2019, Amsterdam, Netherlands, 5/7/19. https://doi.org/10.1145/3328519.3329130
Thirumuruganathan S, Ouzzani M, Tang N. Explaining entity resolution predictions: Where are we and what needs to be done? In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019. Association for Computing Machinery. 2019. a10. (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/3328519.3329130
Thirumuruganathan, Saravanan; Ouzzani, Mourad; Tang, Nan. / Explaining entity resolution predictions: Where are we and what needs to be done? Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019. Association for Computing Machinery, 2019. (Proceedings of the ACM SIGMOD International Conference on Management of Data).
@inproceedings{423524ea65a5406bb126225b28edf4f1,
title = "Explaining entity resolution predictions: Where are we and what needs to be done?",
abstract = "Entity resolution (ER) seeks to identify the set of tuples in a dataset that refer to the same real-world entity. It is one of the fundamental and well studied problems in data integration with applications in diverse domains such as banking, insurance, e-commerce, and so on. Machine Learning and Deep Learning based methods provide the state-of-the-art results. For practitioners, it is often challenging to understand why the classifier made a particular prediction. While there has been extensive work in the ML community on explaining classifier predictions, we found that a direct application of those techniques is not appropriate for ER. There is a huge gap between the needs of lay ER practitioners and the explanation community. In this paper, we provide a comprehensive taxonomy of these challenges, discuss research opportunities and propose preliminary solutions.",
author = "Saravanan Thirumuruganathan and Mourad Ouzzani and Nan Tang",
year = "2019",
month = "7",
day = "5",
doi = "10.1145/3328519.3329130",
language = "English",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
publisher = "Association for Computing Machinery",
booktitle = "Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019",
}

TY  - GEN
T1  - Explaining entity resolution predictions
T2  - Where are we and what needs to be done?
AU  - Thirumuruganathan, Saravanan
AU  - Ouzzani, Mourad
AU  - Tang, Nan
PY  - 2019/7/5
Y1  - 2019/7/5
N2  - Entity resolution (ER) seeks to identify the set of tuples in a dataset that refer to the same real-world entity. It is one of the fundamental and well studied problems in data integration with applications in diverse domains such as banking, insurance, e-commerce, and so on. Machine Learning and Deep Learning based methods provide the state-of-the-art results. For practitioners, it is often challenging to understand why the classifier made a particular prediction. While there has been extensive work in the ML community on explaining classifier predictions, we found that a direct application of those techniques is not appropriate for ER. There is a huge gap between the needs of lay ER practitioners and the explanation community. In this paper, we provide a comprehensive taxonomy of these challenges, discuss research opportunities and propose preliminary solutions.
AB  - Entity resolution (ER) seeks to identify the set of tuples in a dataset that refer to the same real-world entity. It is one of the fundamental and well studied problems in data integration with applications in diverse domains such as banking, insurance, e-commerce, and so on. Machine Learning and Deep Learning based methods provide the state-of-the-art results. For practitioners, it is often challenging to understand why the classifier made a particular prediction. While there has been extensive work in the ML community on explaining classifier predictions, we found that a direct application of those techniques is not appropriate for ER. There is a huge gap between the needs of lay ER practitioners and the explanation community. In this paper, we provide a comprehensive taxonomy of these challenges, discuss research opportunities and propose preliminary solutions.
UR  - http://www.scopus.com/inward/record.url?scp=85072805005&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85072805005&partnerID=8YFLogxK
U2  - 10.1145/3328519.3329130
DO  - 10.1145/3328519.3329130
M3  - Conference contribution
AN  - SCOPUS:85072805005
T3  - Proceedings of the ACM SIGMOD International Conference on Management of Data
BT  - Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2019
PB  - Association for Computing Machinery
ER  -