Schema-as-you-go

On probabilistic tagging and querying of wide tables

Meiyu Lu, Divyakant Agrawal, Bing Tian Dai, Anthony K H Tung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

The emergence of Web 2.0 has resulted in a huge amount of heterogeneous data that are contributed by a large number of users, engendering new challenges for data management and query processing. Given that the data are unified from various sources and accessed by numerous users, providing users with a unified mediated schema as data integration is insufficient. On one hand, a deterministic mediated schema restricts users' freedom to express queries in their preferred vocabulary; on the other hand, it is not realistic for users to remember the numerous attribute names that arise from integrating various data sources. As such, a user-oriented data management and query interface is required. In this paper, we propose an out-of-the-box approach that separates users' actions from database operations. This separating layer deals with the challenges from a semantic perspective. It interprets the semantics of each data value through tags that are provided by users, and then inserts the value into the database together with these tags. When querying the database, this layer also serves as a platform for retrieving data by interpreting the semantics of the queried tags from the users. Experiments are conducted to illustrate both the effectiveness and efficiency of our approach.
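
The tag-mediated insert and query flow described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the class name `TaggedWideTable`, the overlap-count rule for choosing a column, and the frequency-based probabilities are all assumptions made only for illustration.

```python
from collections import Counter, defaultdict


class TaggedWideTable:
    """Toy layer between user tags and a wide table (hypothetical, for illustration)."""

    def __init__(self):
        self.columns = defaultdict(list)        # column id -> stored values
        self.tag_counts = defaultdict(Counter)  # column id -> how often each tag was used

    def _best_column(self, tags):
        # Assumed rule: pick the existing column whose past tags overlap most with `tags`.
        best, best_score = None, 0
        for col, counts in self.tag_counts.items():
            score = sum(counts[t] for t in tags)
            if score > best_score:
                best, best_score = col, score
        return best

    def insert(self, value, tags):
        # Store the value together with its tags; open a new column if nothing overlaps.
        col = self._best_column(tags)
        if col is None:
            col = len(self.columns)
        self.columns[col].append(value)
        self.tag_counts[col].update(tags)
        return col

    def query(self, tag):
        # Assumed rule: P(column | tag) is proportional to how often the tag was used there.
        weights = {c: cnt[tag] for c, cnt in self.tag_counts.items() if cnt[tag] > 0}
        total = sum(weights.values())
        return sorted(
            ((self.columns[c], w / total) for c, w in weights.items()),
            key=lambda pair: -pair[1],
        )


table = TaggedWideTable()
table.insert("Athens", ["city", "location"])
table.insert("Paris", ["city"])
table.insert("2011", ["year"])
print(table.query("city"))
```

In this sketch users never name a column: inserts and queries both go through tags, and the layer decides which column a tag most likely refers to.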

Original language: English
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Pages: 181-192
Number of pages: 12
DOI: https://doi.org/10.1145/1989323.1989343
Publication status: Published - 11 Jul 2011
Externally published: Yes
Event: 2011 ACM SIGMOD and 30th PODS 2011 Conference - Athens, Greece
Duration: 12 Jun 2011 to 16 Jun 2011

Fingerprint

Semantics
Information management
Data integration
Query processing
Experiments

Keywords

  • dynamic instantiation
  • probabilistic tagging
  • wide table

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Lu, M., Agrawal, D., Dai, B. T., & Tung, A. K. H. (2011). Schema-as-you-go: On probabilistic tagging and querying of wide tables. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 181-192) https://doi.org/10.1145/1989323.1989343

@inproceedings{407e3cea4b1440829c3b069c98aa84da,
title = "Schema-as-you-go: On probabilistic tagging and querying of wide tables",
keywords = "dynamic instantiation, probabilistic tagging, wide table",
author = "Lu, Meiyu and Agrawal, Divyakant and Dai, {Bing Tian} and Tung, {Anthony K H}",
year = "2011",
month = "7",
day = "11",
doi = "10.1145/1989323.1989343",
language = "English",
isbn = "9781450306614",
pages = "181--192",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

}

TY - GEN

T1 - Schema-as-you-go

T2 - On probabilistic tagging and querying of wide tables

AU - Lu, Meiyu

AU - Agrawal, Divyakant

AU - Dai, Bing Tian

AU - Tung, Anthony K H

PY - 2011/7/11

Y1 - 2011/7/11

KW - dynamic instantiation

KW - probabilistic tagging

KW - wide table

UR - http://www.scopus.com/inward/record.url?scp=79959929548&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959929548&partnerID=8YFLogxK

U2 - 10.1145/1989323.1989343

DO - 10.1145/1989323.1989343

M3 - Conference contribution

SN - 9781450306614

SP - 181

EP - 192

BT - Proceedings of the ACM SIGMOD International Conference on Management of Data

ER -