public class PorterStemmerTokenizerFactory extends ModifyTokenTokenizerFactory implements Serializable
A PorterStemmerTokenizerFactory applies Porter's stemmer to each token of the tokenizers produced by a base tokenizer factory. Porter's stemmer approximates the conversion of words to their morphological base form. This class also provides a single top-level static method, stem(String), which returns the stemmed form of an input string.
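The wrapping behavior described above can be sketched without the library. The example below is only an illustration under stated assumptions: the class TokenModifierSketch, the whitespace splitter baseTokens, and the toyStem modifier are all hypothetical stand-ins, and toyStem is not Porter's algorithm. It shows a base tokenizer producing tokens and a per-token modifier being applied to each one, which is the role modifyToken(String) plays in this class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class TokenModifierSketch {

    // Hypothetical stand-in for a base tokenizer: splits on whitespace.
    public static List<String> baseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Applies the given modifier to every token of the base tokenizer,
    // mirroring how a token-modifying factory wraps a base factory.
    public static List<String> modifiedTokens(String text,
                                              UnaryOperator<String> modifyToken) {
        List<String> out = new ArrayList<>();
        for (String token : baseTokens(text)) {
            out.add(modifyToken.apply(token));
        }
        return out;
    }

    // Toy modifier (NOT Porter's algorithm): strips a single trailing "s",
    // leaving very short words and words ending in "ss" unchanged.
    public static String toyStem(String token) {
        boolean strip = token.length() > 2
                && token.endsWith("s")
                && !token.endsWith("ss");
        return strip ? token.substring(0, token.length() - 1) : token;
    }

    public static void main(String[] args) {
        // prints [cat, chase, bird]
        System.out.println(
                modifiedTokens("cats chase birds", TokenModifierSketch::toyStem));
    }
}
```

Passing the modifier as a function keeps the tokenization and the token rewriting separate, so the same wrapper would work with any other per-token transformation.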
The underlying stemming code is Martin Porter's own public domain Java port of his original C implementation of stemming. More information can be found at:
Porter Stemmer Home Page
The original paper describing Porter's stemmer is:
Porter, Martin. 1980. An algorithm for suffix stripping. Program. 14:3. 130--137.
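To give a flavor of the suffix stripping the paper describes, the following sketch implements only step 1a of the algorithm, which handles plural endings (SSES to SS, IES to I, SS unchanged, S removed). It is a partial illustration, not the code behind stem(String): the class name Step1aSketch is hypothetical, lowercase input is assumed, and the remaining steps and their measure conditions are omitted.

```java
public class Step1aSketch {

    // Step 1a of Porter (1980). Rules are tried in this order and the
    // first matching suffix wins:
    //   SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    public static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // caresses -> caress
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // ponies -> poni
        }
        if (word.endsWith("ss")) {
            return word;                                  // caress -> caress
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // cats -> cat
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

The result "poni" shows why the output is described as an approximation: a stem is a normalized key shared by related word forms, not necessarily a dictionary word.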
Constructor and Description |
---|
PorterStemmerTokenizerFactory(TokenizerFactory factory): Construct a tokenizer factory that applies Porter stemming to the tokenizers produced by the specified base factory. |
Modifier and Type | Method and Description |
---|---|
String | modifyToken(String token): Returns the Porter-stemmed version of the specified token. |
static String | stem(String in): Returns the stem of the specified input string using the Porter stemmer. |
String | toString() |
Methods inherited from superclasses:
modify, modifyWhitespace
baseTokenizerFactory, tokenizer
public PorterStemmerTokenizerFactory(TokenizerFactory factory)
Parameters:
factory - Base tokenizer factory.

public String modifyToken(String token)
Overrides:
modifyToken in class ModifyTokenTokenizerFactory
Parameters:
token - Token to stem.

public static String stem(String in)
Parameters:
in - String to stem.

public String toString()
Overrides:
toString in class ModifyTokenTokenizerFactory