The Complete search engine

Interactive, efficient, and towards ir&db integration

Holger Bast, Ingmar Weber

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

51 Citations (Scopus)

Abstract

We describe CompleteSearch, an interactive search engine that offers the user a variety of complex features, which at first glance have little in common, yet are all provided via one and the same highly optimized core mechanism. This mechanism answers queries for what we call context-sensitive prefix search and completion: given a set of documents and a word range, compute all words from that range which are contained in one of the given documents, as well as those of the given documents which contain a word from the given range. Among the supported features are: (i) automatic query completion, for example, find all completions of the prefix "seman" that occur in the context of the word "ontology", as well as the best hits for any such completion; (ii) semi-structured (XML) retrieval, for example, find all email messages with "dbworld" in the subject line; (iii) semantic search, for example, find all politicians who had a private audience with the pope; (iv) DB-style joins and grouping, for example, find the most prolific authors with at least one paper in both "SIGMOD" and "SIGIR"; and (v) arbitrary combinations of these. The prefix search and completion mechanism of CompleteSearch is realized via a novel kind of index data structure, which enables subsecond query processing times for collections up to a terabyte of data, on a single PC. We report on a number of lessons learned in the process of building the system and on our experience with a number of publicly used deployments.
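
The query type defined in the abstract can be illustrated with a small sketch. This is not the index data structure of the paper, which the abstract does not detail; it is a naive baseline built on a plain inverted index and a sorted vocabulary, and the function names (`build_index`, `prefix_range`, `complete`) and the toy documents are assumptions made for illustration only.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    # In a sorted vocabulary, all words sharing a prefix form one contiguous range.
    vocabulary = sorted(postings)
    return vocabulary, postings


def prefix_range(vocabulary, prefix):
    """Return the half-open index range [lo, hi) of words starting with prefix."""
    lo = bisect_left(vocabulary, prefix)
    hi = lo
    while hi < len(vocabulary) and vocabulary[hi].startswith(prefix):
        hi += 1
    return lo, hi


def complete(vocabulary, postings, doc_ids, prefix):
    """Naive context-sensitive prefix search and completion.

    Given a set of documents and a word range (here: a prefix), return
    (1) the words from the range that occur in at least one given document,
    (2) the given documents that contain at least one word from the range.
    """
    lo, hi = prefix_range(vocabulary, prefix)
    completions, matching_docs = [], set()
    for word in vocabulary[lo:hi]:
        hits = postings[word] & doc_ids
        if hits:
            completions.append(word)
            matching_docs |= hits
    return completions, matching_docs


# Hypothetical toy collection, mirroring example (i) from the abstract.
docs = {
    1: "ontology semantic web",
    2: "ontology semantics of queries",
    3: "seminar on databases",
    4: "semantic search engine",
}
vocabulary, postings = build_index(docs)

# The context is the set of documents matching the earlier query word "ontology".
context = postings["ontology"]
completions, matching_docs = complete(vocabulary, postings, context, "seman")
print(completions)            # ['semantic', 'semantics']
print(sorted(matching_docs))  # [1, 2]
```

This baseline scans every word in the prefix range and intersects its posting set with the context, so its cost grows with the size of the range; it is meant only to make the input and output of the query concrete, not to reflect the subsecond performance reported for CompleteSearch.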

Original language: English
Title of host publication: CIDR 2007 - 3rd Biennial Conference on Innovative Data Systems Research
Pages: 88-95
Number of pages: 8
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 3rd Biennial Conference on Innovative Data Systems Research, CIDR 2007 - Asilomar, CA, United States
Duration: 7 Jan 2007 - 10 Jan 2007

Other

Other: 3rd Biennial Conference on Innovative Data Systems Research, CIDR 2007
Country: United States
City: Asilomar, CA
Period: 7/1/07 - 10/1/07

Fingerprint

Query processing
Search engines
XML
Ontology
Data structures
Semantics

ASJC Scopus subject areas

  • Information Systems

Cite this

Bast, H., & Weber, I. (2007). The Complete search engine: Interactive, efficient, and towards ir&db integration. In CIDR 2007 - 3rd Biennial Conference on Innovative Data Systems Research (pp. 88-95)

@inproceedings{e4880d67c9da475da0404815848256db,
title = "The Complete search engine: Interactive, efficient, and towards ir&db integration",
author = "Holger Bast and Ingmar Weber",
year = "2007",
month = "12",
day = "1",
language = "English",
pages = "88--95",
booktitle = "CIDR 2007 - 3rd Biennial Conference on Innovative Data Systems Research",

}

TY - GEN

T1 - The Complete search engine

T2 - Interactive, efficient, and towards ir&db integration

AU - Bast, Holger

AU - Weber, Ingmar

PY - 2007/12/1

Y1 - 2007/12/1

UR - http://www.scopus.com/inward/record.url?scp=70849111170&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70849111170&partnerID=8YFLogxK

M3 - Conference contribution

SP - 88

EP - 95

BT - CIDR 2007 - 3rd Biennial Conference on Innovative Data Systems Research

ER -