DataXFormer: Leveraging the web for semantic transformations

Ziawasch Abedjan, John Morcos, Michael Gubanov, Ihab F. Ilyas, Michael Stonebraker, Paolo Papotti, Mourad Ouzzani

Research output: Contribution to conference › Paper

9 Citations (Scopus)

Abstract

Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program on the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unresolved. The Web is home to millions of tables with many containing explicit value mappings. This is in addition to value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80% of transformation queries formulated by real-world users.
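
The two kinds of transformation contrasted in the abstract can be illustrated with a small sketch. It is not DataXFormer's implementation: the toy `CORPUS`, the `supporting_tables` filter, and the `min_support` threshold are hypothetical names and data chosen only to show how a formula-based transformation differs from one that needs a reference table discovered from a few example pairs.

```python
from typing import Dict, List, Optional, Tuple

LITERS_PER_US_GALLON = 3.785411784  # exact by definition of the US gallon


def liters_to_gallons(liters: float) -> float:
    """Formula-based transformation: no reference data is needed."""
    return liters / LITERS_PER_US_GALLON


# Toy stand-in for a corpus of two-column Web tables (hypothetical data).
Table = List[Tuple[str, str]]
CORPUS: List[Table] = [
    [("10001", "New York"), ("60601", "Chicago"), ("94105", "San Francisco")],
    [("10001", "NY"), ("60601", "IL"), ("94105", "CA")],
    [("apple", "fruit"), ("carrot", "vegetable")],
]


def supporting_tables(examples: Dict[str, str], corpus: List[Table],
                      min_support: int = 2) -> List[Dict[str, str]]:
    """Keep tables that agree with at least `min_support` example pairs
    and contradict none of them (an assumed, simplified filter)."""
    kept = []
    for table in corpus:
        mapping = dict(table)
        agree = sum(1 for x, y in examples.items() if mapping.get(x) == y)
        conflict = any(x in mapping and mapping[x] != y
                       for x, y in examples.items())
        if agree >= min_support and not conflict:
            kept.append(mapping)
    return kept


def lookup_transform(values: List[str], examples: Dict[str, str],
                     corpus: List[Table]) -> List[Optional[str]]:
    """Mapping-based transformation: answer each value from the first
    supporting table that contains it, or None if no table covers it."""
    tables = supporting_tables(examples, corpus)
    return [next((t[v] for t in tables if v in t), None) for v in values]


if __name__ == "__main__":
    examples = {"10001": "New York", "60601": "Chicago"}
    print(liters_to_gallons(3.785411784))
    print(lookup_transform(["94105", "00000"], examples, CORPUS))
```

Given the two zip-to-city examples, only the first toy table survives the filter, so `94105` resolves to `San Francisco` while the unknown `00000` yields `None`. The zip-to-state table is rejected because it contradicts the examples.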

Original language: English
Publication status: Published - 1 Jan 2015
Event: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States
Duration: 4 Jan 2015 to 7 Jan 2015

Conference

Conference: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015
Country: United States
City: Asilomar
Period: 4/1/15 - 7/1/15

Fingerprint

Semantics
Data integration
Engines
World Wide Web
Experiments

ASJC Scopus subject areas

  • Information Systems and Management
  • Hardware and Architecture
  • Artificial Intelligence
  • Information Systems

Cite this

Abedjan, Z., Morcos, J., Gubanov, M., Ilyas, I. F., Stonebraker, M., Papotti, P., & Ouzzani, M. (2015). DataXFormer: Leveraging the web for semantic transformations. Paper presented at 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, United States.

DataXFormer: Leveraging the web for semantic transformations. / Abedjan, Ziawasch; Morcos, John; Gubanov, Michael; Ilyas, Ihab F.; Stonebraker, Michael; Papotti, Paolo; Ouzzani, Mourad.

2015. Paper presented at 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, United States.

Abedjan, Z, Morcos, J, Gubanov, M, Ilyas, IF, Stonebraker, M, Papotti, P & Ouzzani, M 2015, 'DataXFormer: Leveraging the web for semantic transformations', Paper presented at 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, United States, 4/1/15 - 7/1/15.
Abedjan Z, Morcos J, Gubanov M, Ilyas IF, Stonebraker M, Papotti P et al. DataXFormer: Leveraging the web for semantic transformations. 2015. Paper presented at 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, United States.
Abedjan, Ziawasch; Morcos, John; Gubanov, Michael; Ilyas, Ihab F.; Stonebraker, Michael; Papotti, Paolo; Ouzzani, Mourad. / DataXFormer: Leveraging the web for semantic transformations. Paper presented at 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, United States.
@conference{e95f0c6e6c62451bba66eed155e0acea,
title = "DataXFormer: Leveraging the web for semantic transformations",
abstract = "Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program on the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unresolved. The Web is home to millions of tables with many containing explicit value mappings. This is in addition to value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80{\%} of transformation queries formulated by real-world users.",
author = "Ziawasch Abedjan and John Morcos and Michael Gubanov and Ilyas, {Ihab F.} and Michael Stonebraker and Paolo Papotti and Mourad Ouzzani",
year = "2015",
month = "1",
day = "1",
language = "English",
note = "7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 ; Conference date: 04-01-2015 Through 07-01-2015",

}

TY - CONF

T1 - DataXFormer: Leveraging the web for semantic transformations

T2 - 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015

AU - Abedjan, Ziawasch

AU - Morcos, John

AU - Gubanov, Michael

AU - Ilyas, Ihab F.

AU - Stonebraker, Michael

AU - Papotti, Paolo

AU - Ouzzani, Mourad

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program on the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unresolved. The Web is home to millions of tables with many containing explicit value mappings. This is in addition to value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80% of transformation queries formulated by real-world users.

UR - http://www.scopus.com/inward/record.url?scp=84979529167&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979529167&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84979529167

ER -