com.aliasi.spell

## Class JaccardDistance

• All Implemented Interfaces:
Distance<CharSequence>, Proximity<CharSequence>

```java
public class JaccardDistance
extends TokenizedDistance
```
The `JaccardDistance` class implements a notion of distance based on token overlap. The tokens are generated from the character sequences being compared by a tokenizer factory that is supplied at construction time. A distance of zero (`0`) is a perfect match, and a distance of one (`1`) is a perfect mismatch.

Suppose `termSet(cs)` is the set of distinct tokens extracted from the character sequence `cs`. The proximity underlying Jaccard distance is then the fraction of all distinct terms, drawn from either sequence, that appear in both character sequences:

```
proximity(cs1,cs2)
  = size(termSet(cs1) INTERSECT termSet(cs2))
    / size(termSet(cs1) UNION termSet(cs2))
```
Proximities run between 0 and 1. A proximity of 0 means the character sequences share no terms, and a proximity of 1 means they have exactly the same set of terms.
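A minimal Java sketch of the proximity formula, assuming plain whitespace tokenization as a stand-in for the tokenizer factory. The class `JaccardProximitySketch` and its methods are illustrative only and are not part of LingPipe. Because `termSet` builds a set, a token repeated within one sequence counts only once.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of Jaccard proximity over token sets.
// Assumption: whitespace-separated tokens stand in for the tokenizer
// factory that JaccardDistance receives at construction time.
class JaccardProximitySketch {

    // termSet(cs): the distinct tokens of cs (duplicates collapse).
    static Set<String> termSet(CharSequence cs) {
        Set<String> terms = new HashSet<>();
        for (String tok : cs.toString().split("\\s+")) {
            if (!tok.isEmpty()) {
                terms.add(tok);
            }
        }
        return terms;
    }

    // size(termSet(cs1) INTERSECT termSet(cs2)) / size(termSet(cs1) UNION termSet(cs2))
    static double proximity(CharSequence cs1, CharSequence cs2) {
        Set<String> s1 = termSet(cs1);
        Set<String> s2 = termSet(cs2);
        Set<String> intersection = new HashSet<>(s1);
        intersection.retainAll(s2);
        Set<String> union = new HashSet<>(s1);
        union.addAll(s2);
        // Assumption: two token-free inputs count as identical; the formula
        // alone is 0/0 here and the description above does not cover it.
        if (union.isEmpty()) {
            return 1.0;
        }
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Shared {the, cat}; union {the, cat, sat, ran}: 2 / 4
        System.out.println(proximity("the cat sat", "the cat ran")); // 0.5
        // Both term sets are {a, b}, so the repeated "a" does not matter
        System.out.println(proximity("a a b", "b a"));               // 1.0
    }
}
```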

Distance is then defined in terms of proximity by subtraction.

```
distance(cs1,cs2) = 1 - proximity(cs1,cs2)
```
Distances also run between 0 and 1. A distance of 0 means the character sequences have exactly the same set of terms, whereas a distance of 1 means they have no terms in common.
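The distance can be sketched the same way, again assuming whitespace tokenization and using a hypothetical class name. This version counts the shared terms directly and obtains the union size from the identity size(A UNION B) = size(A) + size(B) - size(A INTERSECT B), so no union set has to be built. How two inputs with no tokens at all should be scored is not specified above; the sketch's choice of `0.0` is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of Jaccard distance = 1 - proximity over token sets.
// Assumption: whitespace tokenization stands in for the tokenizer factory.
class JaccardDistanceSketch {

    static Set<String> termSet(CharSequence cs) {
        Set<String> terms = new HashSet<>();
        for (String tok : cs.toString().split("\\s+")) {
            if (!tok.isEmpty()) {
                terms.add(tok);
            }
        }
        return terms;
    }

    static double distance(CharSequence cs1, CharSequence cs2) {
        Set<String> s1 = termSet(cs1);
        Set<String> s2 = termSet(cs2);
        int shared = 0;
        for (String term : s1) {
            if (s2.contains(term)) {
                shared++;
            }
        }
        // Inclusion-exclusion: size(s1 UNION s2) = size(s1) + size(s2) - shared
        int unionSize = s1.size() + s2.size() - shared;
        // Assumption: two token-free inputs are treated as a perfect match.
        if (unionSize == 0) {
            return 0.0;
        }
        return 1.0 - (double) shared / unionSize;
    }

    public static void main(String[] args) {
        System.out.println(distance("red green blue", "blue green red")); // 0.0: same term set
        System.out.println(distance("red green", "cyan magenta"));      // 1.0: nothing shared
        System.out.println(distance("red green blue", "red"));          // 1 - 1/3, roughly 0.667
    }
}
```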
Since:
LingPipe 2.4
Version:
3.8
Author:
Bob Carpenter
• ### Constructor Summary

| Constructor | Description |
|---|---|
| `JaccardDistance(TokenizerFactory factory)` | Construct an instance of Jaccard string distance using the specified tokenizer factory. |
• ### Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| `double` | `distance(CharSequence cSeq1, CharSequence cSeq2)` | Returns the Jaccard distance between the specified character sequences. |
| `double` | `proximity(CharSequence cSeq1, CharSequence cSeq2)` | Returns the proximity between the specified character sequences. |
• ### Methods inherited from class com.aliasi.spell.TokenizedDistance

`termFrequencyVector, tokenizerFactory, tokenSet, tokenSet`
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Constructor Detail

• #### JaccardDistance

`public JaccardDistance(TokenizerFactory factory)`
Construct an instance of Jaccard string distance using the specified tokenizer factory.
Parameters:
`factory` - Tokenizer factory for distance.
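Because the tokenizer factory is fixed when the object is constructed, the same pair of strings can be near or far depending on how it is tokenized. The sketch below illustrates this with a hypothetical `TokenizedJaccard` class that takes a `Function<CharSequence, List<String>>` in place of LingPipe's `TokenizerFactory`; neither the class nor its two tokenizers come from the library.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: the tokenizer is fixed at construction, mirroring how
// JaccardDistance takes a TokenizerFactory. The Function-based tokenizer is
// a stand-in, not LingPipe's TokenizerFactory API.
class TokenizedJaccard {
    private final Function<CharSequence, List<String>> tokenizer;

    TokenizedJaccard(Function<CharSequence, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    double distance(CharSequence cs1, CharSequence cs2) {
        Set<String> s1 = new HashSet<>(tokenizer.apply(cs1));
        Set<String> s2 = new HashSet<>(tokenizer.apply(cs2));
        Set<String> union = new HashSet<>(s1);
        union.addAll(s2);
        if (union.isEmpty()) {
            return 0.0; // assumption: no tokens on either side counts as a match
        }
        s1.retainAll(s2); // s1 now holds the intersection
        return 1.0 - (double) s1.size() / union.size();
    }

    // Whitespace-separated words.
    static List<String> words(CharSequence cs) {
        List<String> out = new ArrayList<>();
        for (String tok : cs.toString().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    // Overlapping character bigrams, e.g. "abc" yields [ab, bc].
    static List<String> charBigrams(CharSequence cs) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= cs.length(); i++) {
            out.add(cs.subSequence(i, i + 2).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        TokenizedJaccard byWord = new TokenizedJaccard(TokenizedJaccard::words);
        TokenizedJaccard byBigram = new TokenizedJaccard(TokenizedJaccard::charBigrams);
        System.out.println(byWord.distance("color", "colour"));   // 1.0
        System.out.println(byBigram.distance("color", "colour")); // 0.5
    }
}
```

Under word tokenization, `color` and `colour` share no token and sit at distance 1.0; under character bigrams they share `co`, `ol`, and `lo` out of six distinct bigrams, giving distance 0.5.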
• ### Method Detail

• #### distance

```java
public double distance(CharSequence cSeq1,
                       CharSequence cSeq2)
```
Returns the Jaccard distance between the specified character sequences. See the class description above for the definition.
Parameters:
`cSeq1` - First character sequence.
`cSeq2` - Second character sequence.
Returns:
Jaccard distance between the sequences.
• #### proximity

```java
public double proximity(CharSequence cSeq1,
                        CharSequence cSeq2)
```
Returns the proximity between the specified character sequences.
Parameters:
`cSeq1` - First character sequence.
`cSeq2` - Second character sequence.
Returns:
Jaccard proximity between the sequences.
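Since `JaccardDistance` implements both `Distance<CharSequence>` and `Proximity<CharSequence>`, one object answers both questions, and the two values for any pair sum to 1. The sketch below mirrors that arrangement with locally declared `Distance` and `Proximity` interfaces (stand-ins with the method shapes listed above, not LingPipe's own types) and a hypothetical `WordJaccard` class that derives `distance` from `proximity`; whitespace tokenization is again an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-ins with the single-method shapes listed above;
// these are not LingPipe's Distance and Proximity interfaces.
interface Distance<E> {
    double distance(E e1, E e2);
}

interface Proximity<E> {
    double proximity(E e1, E e2);
}

// Hypothetical class: proximity does the work and distance derives from it,
// so distance(a, b) + proximity(a, b) == 1 for every pair.
class WordJaccard implements Distance<CharSequence>, Proximity<CharSequence> {

    private static Set<String> terms(CharSequence cs) {
        Set<String> out = new HashSet<>();
        for (String tok : cs.toString().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    @Override
    public double proximity(CharSequence cs1, CharSequence cs2) {
        Set<String> s1 = terms(cs1);
        Set<String> s2 = terms(cs2);
        Set<String> union = new HashSet<>(s1);
        union.addAll(s2);
        if (union.isEmpty()) {
            return 1.0; // assumption for two token-free inputs
        }
        s1.retainAll(s2); // s1 now holds the intersection
        return (double) s1.size() / union.size();
    }

    @Override
    public double distance(CharSequence cs1, CharSequence cs2) {
        return 1.0 - proximity(cs1, cs2);
    }
}
```

Deriving one method from the other keeps the two views consistent by construction, and both are symmetric because set intersection and union do not depend on argument order.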