This page gathers together resources and links on my paper On the Complexity of Reverse Similarity Search, which appeared in SISAP '08. First, the abstract:
Two decision problems are presented that arise from reversing the operation of a distance-based indexing tree. Whereas similarity search finds points in the tree given a query point, reverse similarity search begins with a set of constraints like those defining a leaf, and generates a point meeting the constraints. These problems derive from robust hashing, a technique used in similarity search and security applications.
The problems are analysed for spaces of strings and vectors with a variety of metrics: strings with Hamming distance; the usual (Levenshtein) edit distance; an edit distance we introduce called Superghost distance; arbitrary weighted tree metrics; and real vectors with Minkowski $L_p$ metrics (of which the Euclidean distance is a special case). They are found to inhabit different complexity classes depending on the metric. In particular, the reverse similarity search problem derived from a VP- or GH-tree is NP-complete for any $L_p$ metric except that it is in P for a GH-tree with the Euclidean metric.
The actual paper is available through the publishers, but requires a payment. If you're located at a university you may be covered by a site license allowing you to download it at no further charge. Here are the slides from my talk; and here's my BIBTeX database of references (not all of which are necessarily used in the actual paper).