Trading Quality for Time with Nearest-Neighbor Search.
|Title||Trading Quality for Time with Nearest-Neighbor Search.|
|Author(s)||R. Weber, K. Böhm|
|Booktitle||Proc. of the 7th Conf. on Extending Database Technology (EDBT 2000), Konstanz, Germany|
|Organization||Institute of Information Systems, ETH Zürich|
Abstract: In many situations, users would readily accept an approximate query result if the query is evaluated faster. This holds in particular for nearest-neighbor search (NN-search), a typical implementation of similarity search. In this article, we investigate approximate NN-query evaluation techniques based on the VA-File, a data structure that efficiently supports NN-query evaluation in high dimensions. The VA-File contains an approximation of each point. VA-File based NN-query evaluation first computes bounds on the distance between each point and the query in order to filter out the vast majority of points. A second phase then identifies the nearest neighbors by computing the exact distances of all remaining points. To develop approximate query-evaluation techniques, we proceed in two steps. First, we derive an analytic model for VA-File based NN-search in order to investigate the relationship between approximation granularity, effectiveness of the filtering step, and search performance. In more detail, we develop formulae for the distribution of the error of the bounds and for the duration of the different phases of query evaluation. Second, based on these results, we develop different approximate query-evaluation techniques: the first adapts the bounds to filter more strictly, and the second skips the computation of the exact distances. Experiments show that these techniques have the desired effect: for instance, when allowing for a small but specific reduction of result quality, we observed a speedup by a factor of 7 in 50-NN search.
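The two-phase evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes points in the unit hypercube, a uniform grid with the same number of bits per dimension, and Euclidean distance. The names `quantize`, `bounds`, and `va_knn` and the parameters `shrink` and `skip_exact` are hypothetical. In particular, `shrink` is only a simple stand-in for the paper's bound adaptation, which is derived from its error model and is not reproduced here.

```python
import heapq
import math
import random


def quantize(point, bits):
    """Approximate a point in [0, 1)^d by its grid-cell index per dimension."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in point)


def bounds(query, approx, bits):
    """Lower/upper bound on the distance from query to any point in the cell."""
    width = 1.0 / (1 << bits)
    lo2 = hi2 = 0.0
    for q, c in zip(query, approx):
        left, right = c * width, (c + 1) * width
        # Distance to the nearest cell edge (0 if q lies inside the cell).
        near = left - q if q < left else q - right if q > right else 0.0
        far = max(abs(q - left), abs(q - right))
        lo2 += near * near
        hi2 += far * far
    return math.sqrt(lo2), math.sqrt(hi2)


def va_knn(points, approxs, query, k, bits, shrink=0.0, skip_exact=False):
    """k-NN search over approximations; shrink and skip_exact are assumptions.

    shrink > 0 raises each lower bound toward the upper bound, so the filter
    discards more points (approximate result). skip_exact=True returns the k
    candidates with the smallest lower bounds without reading exact points.
    """
    # Phase 1: scan the approximations and filter by bounds.
    upper = []        # negated max-heap holding the k smallest upper bounds
    candidates = []
    for i, a in enumerate(approxs):
        lo, hi = bounds(query, a, bits)
        lo += shrink * (hi - lo)
        if len(upper) < k or lo <= -upper[0]:
            candidates.append((lo, i))
            if len(upper) < k:
                heapq.heappush(upper, -hi)
            elif hi < -upper[0]:
                heapq.heapreplace(upper, -hi)
    candidates.sort()
    if skip_exact:
        return [i for _, i in candidates[:k]]

    # Phase 2: visit candidates by increasing lower bound, compute exact
    # distances, and stop once no remaining candidate can improve the result.
    best = []         # negated max-heap of (distance, index)
    for lo, i in candidates:
        if len(best) == k and lo > -best[0][0]:
            break
        d = math.dist(query, points[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return [i for _, i in sorted((-nd, i) for nd, i in best)]


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(1000)]
    approximations = [quantize(p, 4) for p in data]
    q = [random.random() for _ in range(8)]
    print(va_knn(data, approximations, q, k=5, bits=4))
    print(va_knn(data, approximations, q, k=5, bits=4, skip_exact=True))
```

With `shrink=0.0` and `skip_exact=False`, the sketch returns the exact k nearest neighbors, because a point whose lower bound exceeds the k-th smallest upper bound cannot belong to the result. Either approximate option trades some result quality for fewer exact distance computations, which is the trade-off the abstract describes.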