| Interface Summary | |
|---|---|
| TokenCategorizer | A TokenCategorizer supplies a string-based category for string-based tokens. |
| TokenizerFactory | A TokenizerFactory constructs tokenizers from subsequences of character arrays. |
| Class Summary | |
|---|---|
| CharacterTokenCategorizer | Returns a category for tokens made up out of a single character. |
| CharacterTokenizerFactory | A CharacterTokenizerFactory considers each non-whitespace character in the input to be a distinct token. |
| EnglishStopListFilterTokenizer | An EnglishStopListFilterTokenizer filters its input by removing words on the English stop list. |
| FilterTokenizer | A FilterTokenizer contains a tokenizer to which it delegates the tokenizer methods. |
| IndoEuropeanTokenCategorizer | An IndoEuropeanTokenCategorizer is a generic token categorizer for Indo-European languages that is based on character "shape". |
| IndoEuropeanTokenizerFactory | An IndoEuropeanTokenizerFactory creates tokenizers for subsequences of character arrays. |
| LengthStopFilterTokenizer | A LengthStopFilterTokenizer removes tokens that exceed a specified length. |
| LineTokenizerFactory | A LineTokenizerFactory treats each line of an input as a token. |
| LowerCaseFilterTokenizer | A LowerCaseFilterTokenizer renders all of its tokens in lower case as defined by String.toLowerCase(). |
| NGramTokenizerFactory | An NGramTokenizerFactory creates n-gram tokenizers of a specified minimum and maximum length. |
| NormalizeWhiteSpaceFilterTokenizer | A NormalizeWhiteSpaceFilterTokenizer reduces each non-empty sequence of whitespace to a single space. |
| PorterStemmer | The PorterStemmer class is Martin Porter's Java implementation of his English stemmer. |
| PorterStemmerFilterTokenizer | A PorterStemmerFilterTokenizer returns the stemmed version of each token, as produced by PorterStemmer.stem(String). |
| PunctuationStopListTokenizer | A PunctuationStopListTokenizer removes tokens consisting entirely of punctuation. |
| RegExTokenizerFactory | A RegExTokenizerFactory creates a tokenizer factory out of a regular expression. |
| SoundexFilterTokenizer | The SoundexFilterTokenizer replaces each token with its Soundex encoding. |
| StopFilterTokenizer | A StopFilterTokenizer removes tokens from the token stream if they meet conditions specified by concrete subclasses. |
| StopListFilterTokenizer | A StopListFilterTokenizer is a stop-list-based stop filter tokenizer that removes tokens from a tokenizer stream if they are on a specified list of so-called "stop" tokens. |
| TokenFeatureExtractor | A TokenFeatureExtractor produces feature vectors representing token counts from character sequences. |
| TokenFilterTokenizer | A TokenFilterTokenizer allows a sequence of tokens to be filtered a token at a time. |
| Tokenizer | Abstract base class for tokenizers. |
Classes for tokenizing character sequences.
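
The summaries above describe a common pattern: a factory builds a tokenizer over a slice of a character array, and filter tokenizers wrap another tokenizer and delegate to it. The following is a minimal, self-contained sketch of that pattern only. Every name in it (`TokenizerSketch`, `SketchTokenizer`, `SketchTokenizerFactory`, `WHITESPACE_FACTORY`, `lowerCase`, `stopList`, `demo`) is hypothetical and is not part of this package; the method signatures are assumptions chosen for illustration, so consult the individual class pages for the actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TokenizerSketch {

    /** Hypothetical stand-in for an abstract tokenizer; nextToken() returns null when exhausted. */
    abstract static class SketchTokenizer {
        abstract String nextToken();

        /** Drains the remaining tokens into a list. */
        List<String> tokenize() {
            List<String> out = new ArrayList<>();
            for (String t = nextToken(); t != null; t = nextToken()) {
                out.add(t);
            }
            return out;
        }
    }

    /** Hypothetical factory: builds a tokenizer over cs[start] .. cs[start + length - 1]. */
    interface SketchTokenizerFactory {
        SketchTokenizer tokenizer(char[] cs, int start, int length);
    }

    /** Assumed simple rule: tokens are maximal runs of non-whitespace characters. */
    static final SketchTokenizerFactory WHITESPACE_FACTORY = (cs, start, length) ->
        new SketchTokenizer() {
            private int pos = start;
            private final int end = start + length;

            @Override
            String nextToken() {
                // Skip leading whitespace, then consume one run of non-whitespace.
                while (pos < end && Character.isWhitespace(cs[pos])) pos++;
                if (pos >= end) return null;
                int begin = pos;
                while (pos < end && !Character.isWhitespace(cs[pos])) pos++;
                return new String(cs, begin, pos - begin);
            }
        };

    /** Filter that delegates to the wrapped tokenizer and lowercases each token. */
    static SketchTokenizer lowerCase(SketchTokenizer inner) {
        return new SketchTokenizer() {
            @Override
            String nextToken() {
                String t = inner.nextToken();
                // Locale.ROOT keeps the sketch deterministic across default locales.
                return t == null ? null : t.toLowerCase(Locale.ROOT);
            }
        };
    }

    /** Filter that delegates to the wrapped tokenizer and skips tokens on the stop list. */
    static SketchTokenizer stopList(SketchTokenizer inner, Set<String> stops) {
        return new SketchTokenizer() {
            @Override
            String nextToken() {
                String t = inner.nextToken();
                while (t != null && stops.contains(t)) {
                    t = inner.nextToken();
                }
                return t;
            }
        };
    }

    /** Chains the filters: tokenize, lowercase, then drop a small illustrative stop list. */
    static List<String> demo(String text) {
        char[] cs = text.toCharArray();
        SketchTokenizer base = WHITESPACE_FACTORY.tokenizer(cs, 0, cs.length);
        return stopList(lowerCase(base), Set.of("the", "of")).tokenize();
    }

    public static void main(String[] args) {
        System.out.println(demo("The Art of Tokenizing")); // prints [art, tokenizing]
    }
}
```

Filter order matters in this sketch: lowercasing runs before the stop list, so `The` is normalized to `the` and then removed. Reversing the two wrappers would let capitalized stop words through.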