public class AutoCompleter extends Object implements Serializable
AutoCompletermaintains a dictionary of phrases with counts and provides suggested completions based on prefix matching by weighted edit distance and phrase likelihood.
The maximum number of results cannot be changed dynamically,
because it is used during construction to set up all of the
compiled trie structures used for decoding. To change the
maximum number of results, simply construct a new auto completer
using the same phrase counts as edit distance as this completer;
these may be retrieved using the methods
For a given input, the
complete(String) method returns
a sorted set of scored objects, containing the most likely phrase
completions and their score, up to the maximum number of results
specified during construction.
Scores for the phrases are defined by the log (base 2) of their probability estimates:
where probabilities are estimated using maximum likelihood:score(phrase) = log2 p'(phrase)
p'(phrase) = count(phrase) / Σphrase' count(phrase')
Additive smoothing may be easily carried out on the inputs, so it is not carried out by this class.
The score for a prefix matching a phrase is given by:
In words, the score for a prefix matching a phrase is the sum of log probability of the phrase plus the edit distance between the prefix and the best matching prefix of the phrase. The edit distances should thus be scaled as log probabilities in order to combine with the phrase probabilities properly. See the class documentation forscore(prefix,phrase) = MAXphrase.startsWith(prefix') editDistance.distance(prefix,prefix') + log2 p'(phrase)
TrainSpellCheckerfor general advice on combining the tuning of edit distance with that of phrase probabilities.
AutoCompleteris completely thread safe. Setting the maximum search queue size is safe because integer writes are atomic, but it is not declared volate, and hence may not be visible to other threads without synchronization.
AutoCompletermay be serialized if and only if its weighted edit distance is serializable. If so, the result of serializing and reconstituting an auto-completer will produce an auto-completer with the same behavior as the one serialized, modulo the edit distance serialization, which is under control of the edit distance implementation.
|Constructor and Description|
Construct an automatic completer from the specified phrases, phrase counts, edit distance, and search parameters.
|Modifier and Type||Method and Description|
Returns a set of scored phrases sorted into decreasing order of score.
Returns the weighted edit distance for this auto-completer.
Returns the maximum number of results returned by this auto completer for each input prefix.
Returns the maximum number of elements on the search queue.
Returns the phrase counter for this auto completer.
Sets the maximum search queue size to the specified value.
public AutoCompleter(Map<String,? extends Number> phraseCounts, WeightedEditDistance editDistance, int maxResultsPerPrefix, int maxSearchQueueSize, double minScore)
phraseCounts- Map from phrases to counts.
editDistance- Distance used to compare mismatched suggestions.
maxResultsPerPrefix- The maximum number of results that can be returned.
maxSearchQueueSize- The beam size for searching for matches.
minScore- Minimum score for outcome to be retained in results.
IllegalArgumentException- If any of the counts is not finite or negative, or if the max results or max queue sizes are not positive, or if the minimum score is not finite and negative.
public int maxResultsPerPrefix()
public WeightedEditDistance editDistance()
Floatvalues calculated from compiled probability estimates. Changes to the returned count map will not affect this class.
public int maxSearchQueueSize()
public void setMaxSearchQueueSize(int size)
This operation is thread safe because integer sets are atomic. But changes may not be visible to other threads if not synchronized.
size- The new search queue size.
IllegalArgumentException- If the size is zero or negative.
public SortedSet<ScoredObject<String>> complete(String in)
To print out all the matches in descending order of scores, use:
for (ScoredObject<String> so : complete(String)) println("phrase=" + so.getObject() + " score=" + so.score());
in- The string to complete.