On the Complexity of Reverse Similarity Search

26 April 2008 - updated 12 May 2008
[Site traffic Strip-O-Meter]

Click to censor the Strip-O-Meter.

This page gathers together resources and links on my paper On the Complexity of Reverse Similarity Search, which appeared in SISAP '08. First, the abstract:

Two decision problems are presented that arise from reversing the operation of a distance-based indexing tree. Whereas similarity search finds points in the tree given a query point, reverse similarity search begins with a set of constraints like those defining a leaf, and generates a point meeting the constraints. These problems derive from robust hashing, a technique used in similarity search and security applications.

The problems are analysed for spaces of strings and vectors with a variety of metrics: strings with Hamming distance; the usual (Levenshtein) edit distance; an edit distance we introduce called Superghost distance; arbitrary weighted tree metrics; and real vectors with Minkowski $L_p$ metrics (of which the Euclidean distance is a special case). They are found to inhabit different complexity classes depending on the metric. In particular, the reverse similarity search problem derived from a VP- or GH-tree is NP-complete for any $L_p$ metric except that it is in P for a GH-tree with the Euclidean metric.

The actual paper is available through the publishers, but requires a payment. If you're located at a university you may be covered by a site license allowing you to download it at no further charge. Here are the slides from my talk; and here's my BIBTeX database of references (not all of which are necessarily used in the actual paper).

Comments

No comments yet.

New comments are disabled, pending transition to new site code.
Copyright 2008 Matthew Skala
Updates to this site: [RSS syndication file]