java.lang.Object
  com.aliasi.tokenizer.NGramTokenizerFactory
public class NGramTokenizerFactory
An NGramTokenizerFactory creates n-gram tokenizers
of a specified minimum and maximum length.
An NGramTokenizer is a tokenizer that returns the
character n-grams from a specified sequence between a minimum
and maximum length. Whitespace takes the default behavior from
Tokenizer.nextWhitespace(), returning a string consisting of
a single space character.
For example, the result of
new NGramTokenizer("abcd".toCharArray(),0,4,2,3).tokenize()
is the string array:
{ "ab", "bc", "cd", "abc", "bcd" }
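The ordering in the example above (all n-grams of the shortest length first, left to right, then the next length) can be reproduced with a minimal plain-Java sketch. The class `NGramOrderSketch` and its `nGrams` method are hypothetical names used only for illustration; this is not LingPipe's implementation, only a stand-in that takes the same five arguments as the constructor call shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of LingPipe: enumerates character n-grams
// of the slice cs[start .. start+length) in the order shown in the example.
final class NGramOrderSketch {

    static String[] nGrams(char[] cs, int start, int length,
                           int minNGram, int maxNGram) {
        List<String> out = new ArrayList<>();
        int end = start + length;
        // Outer loop: n-gram length, shortest first.
        for (int n = minNGram; n <= maxNGram; n++) {
            // Inner loop: every window of length n that fits in the slice.
            for (int i = start; i + n <= end; i++) {
                out.add(new String(cs, i, n));
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] grams = nGrams("abcd".toCharArray(), 0, 4, 2, 3);
        System.out.println(String.join(", ", grams)); // ab, bc, cd, abc, bcd
    }
}
```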
N-gram tokenizer factories are serializable and compilable. Both operations write the n-gram bounds to the output stream; reading the result back in produces an instance of this class with those bounds.
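The round trip described above can be sketched with the standard `java.io` object streams. The class `BoundsRoundTripSketch` and its methods are hypothetical, and the wire format (two `int` values, minimum then maximum) is an assumption made for illustration; the format LingPipe actually writes is not specified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the factory's state; not LingPipe code.
final class BoundsRoundTripSketch {
    final int minNGram;
    final int maxNGram;

    BoundsRoundTripSketch(int minNGram, int maxNGram) {
        this.minNGram = minNGram;
        this.maxNGram = maxNGram;
    }

    // Assumed format: the two bounds as ints, minimum first.
    void writeBounds(ObjectOutput out) throws IOException {
        out.writeInt(minNGram);
        out.writeInt(maxNGram);
    }

    // Reads the bounds in the same order and rebuilds an instance.
    static BoundsRoundTripSketch readBounds(ObjectInput in) throws IOException {
        int min = in.readInt();
        int max = in.readInt();
        return new BoundsRoundTripSketch(min, max);
    }

    // Writes to an in-memory buffer and reads the copy back.
    static BoundsRoundTripSketch roundTrip(BoundsRoundTripSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                original.writeBounds(out);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return readBounds(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BoundsRoundTripSketch copy = roundTrip(new BoundsRoundTripSketch(2, 3));
        System.out.println(copy.minNGram + ".." + copy.maxNGram); // 2..3
    }
}
```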
| Constructor Summary | |
|---|---|
| NGramTokenizerFactory(int minNGram, int maxNGram) | Create an n-gram tokenizer factory with the specified minimum and maximum n-gram lengths. |

| Method Summary | |
|---|---|
| void | compileTo(ObjectOutput objOut): Compiles this n-gram tokenizer factory to the specified object output stream. |
| Tokenizer | tokenizer(char[] cs, int start, int length): Returns an n-gram tokenizer for the specified characters with the minimum and maximum n-gram lengths as specified in the constructor. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|
public NGramTokenizerFactory(int minNGram,
int maxNGram)
Parameters:
minNGram - Minimum n-gram length.
maxNGram - Maximum n-gram length.
Throws:
IllegalArgumentException - If the minimum is greater than the maximum or if the maximum is less than one.

| Method Detail |
|---|
public void compileTo(ObjectOutput objOut)
throws IOException
Specified by:
compileTo in interface Compilable
Parameters:
objOut - Output stream to which to write the tokenizer factory.
Throws:
IOException - If there is an exception writing the parameters.

public Tokenizer tokenizer(char[] cs,
int start,
int length)
Specified by:
tokenizer in interface TokenizerFactory
Parameters:
cs - Underlying character array.
start - Index of first character in array to tokenize.
length - Number of characters to tokenize.
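
The constructor contract and the `cs`, `start`, `length` parameters documented above can be illustrated together in a small plain-Java sketch. `NGramFactorySketch` and its `tokens` method are hypothetical names that do not exist in LingPipe; the sketch returns a list of strings rather than a `Tokenizer`, and starting the n-gram length at no less than one is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the documented contract; not LingPipe code.
final class NGramFactorySketch {
    private final int minNGram;
    private final int maxNGram;

    NGramFactorySketch(int minNGram, int maxNGram) {
        // Same conditions as the documented IllegalArgumentException.
        if (minNGram > maxNGram || maxNGram < 1) {
            throw new IllegalArgumentException(
                "Require minNGram <= maxNGram and maxNGram >= 1; got "
                + minNGram + ", " + maxNGram);
        }
        this.minNGram = minNGram;
        this.maxNGram = maxNGram;
    }

    // N-grams of the slice cs[start .. start+length), shortest length first.
    List<String> tokens(char[] cs, int start, int length) {
        List<String> out = new ArrayList<>();
        int end = start + length;
        // Assumption of this sketch: skip lengths below one.
        for (int n = Math.max(1, minNGram); n <= maxNGram; n++) {
            for (int i = start; i + n <= end; i++) {
                out.add(new String(cs, i, n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NGramFactorySketch factory = new NGramFactorySketch(2, 3);
        // Only the four characters "abcd" inside the array are tokenized.
        System.out.println(factory.tokens("xabcdx".toCharArray(), 1, 4));
    }
}
```

Passing `start` and `length` lets a caller tokenize part of a larger buffer without copying it; characters outside the slice never appear in any n-gram.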