High dimensional nearest neighbor searching

Hakan Ferhatosmanoglu, Ertem Tuncel, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal › Article

22 Citations (Scopus)


As databases increasingly integrate different types of information, such as time-series, multimedia and scientific data, it becomes necessary to support efficient retrieval of multi-dimensional data. Both the dimensionality and the amount of data that needs to be processed are increasing rapidly. Given the scale and high dimensional nature of such data, traditional techniques have proven inadequate. In this paper, we propose search techniques that are especially effective for large high dimensional data sets. We first propose the VA+-file technique, which is based on scalar quantization of the data. The VA+-file is especially useful for searching exact nearest neighbors (NN) in non-uniform high dimensional data sets. We then discuss how to improve the search and make it progressive by allowing some approximation in the query result. We develop a general framework for approximate NN queries, discuss various approaches for progressive processing of similarity queries, and develop a metric for evaluating such techniques. Finally, we propose a new clustering-based technique that merges the benefits of various approaches for progressive similarity searching. Extensive experimental evaluation is performed on several real-life data sets. The evaluation establishes the superiority of the proposed techniques over existing techniques for high dimensional similarity searching. The techniques proposed in this paper are effective for real-life data sets, which are typically non-uniform, and they scale with respect to both the dimensionality and the size of the data set.
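To make the idea of searching exact nearest neighbors over scalar-quantized data concrete, the following is a minimal sketch of a generic filter-and-refine search over per-dimension quantization cells. It is not the VA+-file itself: the abstract does not spell out how the VA+-file allocates bits or places cell boundaries, so the uniform grid, the `bits` parameter, and the function names `build_approximations`, `lower_bound`, `euclidean` and `nearest_neighbor` are all assumptions made for illustration.

```python
import bisect
import math


def build_approximations(points, bits=3):
    """Quantize every coordinate onto a per-dimension grid (assumed uniform)."""
    dims = len(points[0])
    cells = 1 << bits  # 2**bits cells per dimension
    edges, codes = [], [[] for _ in points]
    for d in range(dims):
        lo = min(p[d] for p in points)
        hi = max(p[d] for p in points)
        step = (hi - lo) / cells
        # cells + 1 boundaries; the last one is pinned to the exact maximum
        dim_edges = [lo + k * step for k in range(cells)] + [hi]
        edges.append(dim_edges)
        for i, p in enumerate(points):
            cell = min(cells - 1, bisect.bisect_right(dim_edges, p[d]) - 1)
            codes[i].append(cell)
    return edges, codes


def lower_bound(query, code, edges):
    """Smallest possible distance from the query to any point in the cell."""
    total = 0.0
    for d, cell in enumerate(code):
        cell_lo, cell_hi = edges[d][cell], edges[d][cell + 1]
        if query[d] < cell_lo:
            total += (cell_lo - query[d]) ** 2
        elif query[d] > cell_hi:
            total += (query[d] - cell_hi) ** 2
    return math.sqrt(total)


def euclidean(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, points, edges, codes):
    """Exact NN: filter on the approximations, refine on the full vectors."""
    bounds = [lower_bound(query, code, edges) for code in codes]
    best_i, best_dist, fetched = -1, math.inf, 0
    # Visit candidates in increasing order of their lower bound.
    for i in sorted(range(len(points)), key=bounds.__getitem__):
        if bounds[i] > best_dist:
            break  # no remaining candidate can be closer than the best so far
        fetched += 1  # counts how many full vectors had to be read
        dist = euclidean(query, points[i])
        if dist < best_dist:
            best_i, best_dist = i, dist
    return best_i, best_dist, fetched


points = [(0.0, 0.0), (1.0, 5.0), (4.0, 4.0), (8.0, 8.0)]
edges, codes = build_approximations(points, bits=2)
index, dist, fetched = nearest_neighbor((3.5, 4.5), points, edges, codes)
print(index, round(dist, 3), fetched)
```

The result stays exact because a point can never be closer to the query than the lower bound of its cell, so pruning on that bound cannot discard the true nearest neighbor. The `fetched` counter is only there to show how many full vectors the filter step leaves to be examined.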

Original language: English
Pages (from-to): 512-540
Number of pages: 29
Journal: Information Systems
Issue number: 6
Publication status: Published - 1 Sep 2006
Externally published: Yes



Keywords

  • Approximate and progressive search
  • High dimensional data
  • Indexing
  • Nearest neighbor queries
  • Non-uniform data
  • Performance
  • Scalability
  • Similarity search

ASJC Scopus subject areas

  • Management Information Systems
  • Management of Technology and Innovation
  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

Ferhatosmanoglu, H., Tuncel, E., Agrawal, D., & El Abbadi, A. (2006). High dimensional nearest neighbor searching. Information Systems, 31(6), 512-540. https://doi.org/10.1016/j.is.2005.01.001