Manuscript:
Samuel R. Buss, Peter N. Yianilos.
"A Bipartite Matching Approach to Approximate String Comparison
and Search."
Unpublished manuscript.
Download article: postscript or PDF.
Abstract: Approximate string comparison and search
is an important part of applications that range from natural language to the
interpretation of DNA. This paper presents a bipartite weighted graph matching approach to
these problems, based on the authors' linear time matching algorithms Our approach's
tolerance to permutation of symbols or blocks, distinguishes it from the widely used edit
distance and finite state machine methods. A close relationship with the earlier related
`proximity comparison' method is established.
Under the linear cost model, a simple O(1) time per position online
algorithm is presented for comparing two strings given a fixed alignment. Heuristics are
given for optimal alignment. In the approximate string search problem, one string advances
in a fixed direction relative to the other with each time step. We introduce a new online
algorithm for this setting which dynamically maintains an optimal bipartite weighted
matching.
We discuss the application of our algorithms to natural language text
search, including prefilters to improve efficiency, and the use of polygraphic symbols to
improve search quality. Our approach is used in the LikeIt text
search utility now under development. Its overall design and objectives are summarized.
Back to Sam Buss's publications page.