Query relaxation by structure and semantics for retrieval of logical Web documents

Wen Syan Li, K. Selçuk Candan, Quoc Vu, Divyakant Agrawal

Research output: Contribution to journal (Article)

23 Citations (Scopus)

Abstract

Because the Web encourages hypertext and hypermedia document authoring (e.g., in HTML or XML), Web authors tend to create documents composed of multiple pages connected by hyperlinks. A Web document may be authored in multiple ways, such as 1) placing all information in one physical page, or 2) using a main page with the related information in separate linked pages. Existing Web search engines, however, return only physical pages containing keywords. In this paper, we introduce the concept of an information unit, which can be viewed as a logical Web document consisting of multiple physical pages treated as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can perform progressive query processing. These functionalities are essential for information retrieval on the Web and in large XML databases. We also present experimental results on synthetic graphs and real Web data.
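To make the idea of an information unit concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, which the abstract does not describe; it is a brute-force toy that assumes hyperlinks can be treated as undirected and that an information unit is a small connected set of pages whose keywords jointly cover the query. The function name `information_units`, its parameters, and the sample page names are all hypothetical. The generator yields smaller units first, which loosely mirrors the progressive processing mentioned above.

```python
from typing import Dict, FrozenSet, Iterator, List, Set


def information_units(
    links: Dict[str, Set[str]],
    keywords: Dict[str, Set[str]],
    query: Set[str],
    max_pages: int = 3,
) -> Iterator[FrozenSet[str]]:
    """Yield connected page sets that jointly cover `query`, smallest first.

    Toy illustration only (exhaustive search); not the paper's algorithm.
    """
    # Assumption: treat hyperlinks as undirected edges between pages.
    neighbours: Dict[str, Set[str]] = {page: set() for page in keywords}
    for src, dsts in links.items():
        for dst in dsts:
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)

    def covers(pages: FrozenSet[str]) -> bool:
        found: Set[str] = set()
        for page in pages:
            found |= keywords.get(page, set())
        return query <= found

    hits: List[FrozenSet[str]] = []
    # `level` holds the candidate connected sets of the current size.
    level = {frozenset([page]) for page in neighbours}
    for _ in range(max_pages):
        next_level = set()
        for pages in sorted(level, key=sorted):  # sorted for determinism
            if any(hit <= pages for hit in hits):
                continue  # contains a smaller unit already reported
            if covers(pages):
                hits.append(pages)
                yield pages  # report progressively, before larger sets
                continue
            # Grow the set by one linked page to keep it connected.
            for page in pages:
                for other in neighbours[page] - pages:
                    next_level.add(pages | {other})
        level = next_level


# Hypothetical three-page site: a main page linking to two sub-pages.
sample_links = {"index": {"staff", "courses"}}
sample_keywords = {
    "index": {"database"},
    "staff": {"candan"},
    "courses": {"xml"},
}
units = list(information_units(sample_links, sample_keywords, {"database", "candan"}))
```

In this sample no single page contains both query keywords, so a page-level search would return nothing, while the sketch reports the linked pair of pages as one unit.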

Original language: English
Pages (from-to): 768-791
Number of pages: 24
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 14
Issue number: 4
DOIs: 10.1109/TKDE.2002.1019213
Publication status: Published - 1 Jul 2002
Externally published: Yes

Fingerprint

XML
World Wide Web
Semantics
HTML
Query processing
Search engines
Information retrieval

Keywords

  • Link structures
  • Progressive processing
  • Query relaxation
  • Web proximity search

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Query relaxation by structure and semantics for retrieval of logical Web documents. / Li, Wen Syan; Candan, K. Selçuk; Vu, Quoc; Agrawal, Divyakant.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, 01.07.2002, p. 768-791.

Li, Wen Syan ; Candan, K. Selçuk ; Vu, Quoc ; Agrawal, Divyakant. / Query relaxation by structure and semantics for retrieval of logical Web documents. In: IEEE Transactions on Knowledge and Data Engineering. 2002 ; Vol. 14, No. 4. pp. 768-791.
@article{81333f0a5c2740a38c49c4265273cdb0,
title = "Query relaxation by structure and semantics for retrieval of logical Web documents",
abstract = "Since the Web encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as, 1) all information in one physical page, or 2) a main page and the related information in separate linked pages. Existing Web search engines, however, return only physical pages containing keywords. In this paper, we introduce the concept of information unit, which can be viewed as a logical Web document consisting of multiple physical pages as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can perform progressive query processing. These functionalities are essential for information retrieval on the Web and large XML databases. We also present experimental results on synthetic graphs and real Web data.",
keywords = "Link structures, Progressive processing, Query relaxation, Web proximity search",
author = "Li, {Wen Syan} and Candan, {K. Sel{\c c}uk} and Quoc Vu and Divyakant Agrawal",
year = "2002",
month = "7",
day = "1",
doi = "10.1109/TKDE.2002.1019213",
language = "English",
volume = "14",
pages = "768--791",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "4",

}

TY - JOUR

T1 - Query relaxation by structure and semantics for retrieval of logical Web documents

AU - Li, Wen Syan

AU - Candan, K. Selçuk

AU - Vu, Quoc

AU - Agrawal, Divyakant

PY - 2002/7/1

Y1 - 2002/7/1

N2 - Since the Web encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as, 1) all information in one physical page, or 2) a main page and the related information in separate linked pages. Existing Web search engines, however, return only physical pages containing keywords. In this paper, we introduce the concept of information unit, which can be viewed as a logical Web document consisting of multiple physical pages as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can perform progressive query processing. These functionalities are essential for information retrieval on the Web and large XML databases. We also present experimental results on synthetic graphs and real Web data.

KW - Link structures

KW - Progressive processing

KW - Query relaxation

KW - Web proximity search

UR - http://www.scopus.com/inward/record.url?scp=0036648582&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036648582&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2002.1019213

DO - 10.1109/TKDE.2002.1019213

M3 - Article

AN - SCOPUS:0036648582

VL - 14

SP - 768

EP - 791

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 4

ER -