DataXFormer: Leveraging the web for semantic transformations

Ziawasch Abedjan, John Morcos, Michael Gubanov, Ihab F. Ilyas, Michael Stonebraker, Paolo Papotti, Mourad Ouzzani

Research output: Contribution to conference (Paper)


Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program to the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unresolved. The Web is home to millions of tables, many of which contain explicit value mappings, in addition to the value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80% of transformation queries formulated by real-world users.
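The contrast the abstract draws between formula-based and lookup-based transformations can be illustrated with a small sketch. This is not the DataXFormer implementation: the toy corpus `TABLES`, the functions `find_column_pairs` and `transform`, and the `min_support` parameter are all hypothetical names chosen for illustration. The sketch does a single filter-then-refine pass over in-memory tables and makes no attempt to reproduce the paper's inductive iteration or its Web-form wrapping.

```python
from collections import Counter

LITERS_PER_US_GALLON = 3.785411784


def liters_to_gallons(liters):
    """Formula-based transformation: no reference data is needed."""
    return liters / LITERS_PER_US_GALLON


# Hypothetical toy corpus standing in for Web tables.
# Each table is a list of equal-width rows (tuples of strings).
TABLES = [
    [("02139", "Cambridge"), ("10001", "New York"), ("94105", "San Francisco")],
    [("02139", "Cambridge"), ("60601", "Chicago")],
    [("Cambridge", "UK"), ("Chicago", "US")],
]


def find_column_pairs(table, examples, min_support):
    """Filter step: yield column pairs (i, j) such that at least
    min_support example pairs occur as (row[i], row[j]) in the table."""
    if not table:
        return
    width = len(table[0])
    for i in range(width):
        for j in range(width):
            if i == j:
                continue
            pairs = {(row[i], row[j]) for row in table}
            if sum(1 for ex in examples if ex in pairs) >= min_support:
                yield i, j


def transform(tables, examples, queries, min_support=2):
    """Refine step: for each query value, collect candidate outputs from
    the tables that passed the filter and keep the most frequent one."""
    votes = {q: Counter() for q in queries}
    for table in tables:
        for i, j in find_column_pairs(table, examples, min_support):
            for row in table:
                if row[i] in votes:
                    votes[row[i]][row[j]] += 1
    return {q: (c.most_common(1)[0][0] if c else None) for q, c in votes.items()}


examples = [("02139", "Cambridge"), ("10001", "New York")]
result = transform(TABLES, examples, ["94105", "60601"])
```

With the default `min_support=2`, only the first table contains both example pairs, so `"94105"` resolves to `"San Francisco"` while `"60601"` stays unresolved. Lowering `min_support` to 1 also admits the second table, which trades precision for coverage.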

Original language: English
Publication status: Published - 1 Jan 2015
Event: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States
Duration: 4 Jan 2015 - 7 Jan 2015


Conference: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015
Country: United States


ASJC Scopus subject areas

  • Information Systems and Management
  • Hardware and Architecture
  • Artificial Intelligence
  • Information Systems

Cite this

Abedjan, Z., Morcos, J., Gubanov, M., Ilyas, I. F., Stonebraker, M., Papotti, P., & Ouzzani, M. (2015). DataXFormer: Leveraging the web for semantic transformations. Paper presented at 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, United States.