public class EditDistance extends Object implements Distance<CharSequence>, Proximity<CharSequence>
EditDistanceclass implements the standard notion of edit distance, with or without transposition. The distance without transposition is known as Levenshtein distance, and with transposition as Damerau-Levenstein distance (see below about distance-like metric properties).
The edit distance between two strings is defined to be the minimum number of non-matching substring edits that is required to turn one string into the other. The available edits and their corresponding input and output substrings are:
Examples of minimal edit sequences are as follows, with distances as indicated
Operation Input Output Notation Match "a" "a" m(a) Insert "" "a" i(a) Delete "a" "" d(a) Substitute "a" "b" s(b,a) Transpose "ab" "ba" t(ab)
Note that, in general, there will be more than one way to edit a string into another string. For instance, a delete and insert may replace a transposition, so that "hte" becomes "the" through edits "d(h) m(t) i(h) m(e)" or "i(t) m(h) d(t) m(e)", as well as many many more, such as "d(h) d(t) d(e) i(t) i(h) i(e)".
Input Output Edit Sequence (w. Transp) Dist (w. Transp) it it m(i) m(t) 0 gage gauge m(g) m(a) i(u) m(g) m(e) 1 thhe the m(t) d(h) m(h) m(e) 1 tenser tensor m(t) m(e) m(n) m(s) s(o,e) m(r) 1 hte the d(h) m(t) i(h) m(e) [t(ht) m(e)] 2  htne then d(h) m(t) s(h,n) m(e) i(n) [t(ht) t(ne)] 3  pwn4g ownage s(o,p) m(w) m(n) s(a,4) m(g) i(e) 3
Edit distance without transposition defines a proper metric.
Recall that a distance measure
d(x,y) forms a metric
if for all
x, y, z, we have (1)
d(x,y) = d(y,x), (3)
0, and (4)
d(x,y) + d(y,z) >= d(x,z). All
of these properties are easy to verify. But with transposition, we
have strings such as
ACB. With transposition,
d(AB,BCA)= 3 >= d(AB,BA) +
d(BA,BCA) = 1 + 1 = 2.
Implementation Note: This class implements edit distance
using dynamic programming in time
m are the length of the sequences
being compared. Using a sliding window of three lattice slices rather
than allocating the entire lattice at once, the space required is
that for three arrays of integers as long as the shorter of the two
character sequences being compared. For details, see section
|Modifier and Type||Field and Description|
Edit distance disallowing transposition.
Edit distance allowing transposition.
|Constructor and Description|
Construct an edit distance with or without transposition based on the specified flag.
|Modifier and Type||Method and Description|
Returns the edit distance between the specified character sequences.
Returns the edit distance between the character sequences with or without transpositions as specified.
Returns the proximity between the character sequences.
Returns a string representation of this edit distance.
public static final Distance<CharSequence> TRANSPOSING
public EditDistance(boolean allowTransposition)
allowTransposition- Set to
trueto allow transposition edits in the constructed distance.
public double distance(CharSequence cSeq1, CharSequence cSeq2)
public double proximity(CharSequence cSeq1, CharSequence cSeq2)
and thus proximities will all be negative or zero.proximity(cs1,cs2) = -distance(cs1,cs2)
public String toString()
public static int editDistance(CharSequence cSeq1, CharSequence cSeq2, boolean allowTransposition)
cSeq1- First character sequence.
cSeq2- Second character sequence.
allowTransposition- Set to
trueto allow transposition edits.