public class EnglishStopTokenizerFactory extends StopTokenizerFactory implements Serializable
An EnglishStopTokenizerFactory applies an English stop list to a contained base tokenizer factory.
The built-in stoplist consists of the following words:
a, be, had, it, only, she, was, about, because, has, its, of, some, we, after, been, have, last, on, such, were, all, but, he, more, one, than, when, also, by, her, most, or, that, which, an, can, his, mr, other, the, who, any, co, if, mrs, out, their, will, and, corp, in, ms, over, there, with, are, could, inc, mz, s, they, would, as, for, into, no, so, this, up, at, from, is, not, says, to
Note that the stoplist entries are all lowercase. Thus the input should probably first be filtered by a lowercasing tokenizer factory.
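Because the stoplist entries are all lowercase, matching is sensitive to case. The following is a minimal plain-Java sketch of that effect, not the library's implementation: the class `StopFilterSketch`, its methods, and the six-word subset of the stoplist are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of the documented API.
public class StopFilterSketch {

    // A small subset of the stoplist, for illustration only.
    public static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "the", "of", "is", "and", "to"));

    // Drops tokens that exactly match a stoplist entry (case-sensitive).
    public static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Lowercases every token, standing in for a lowercasing filter.
    public static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ENGLISH));
        }
        return lowered;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "price", "of", "THE", "stock");
        // Without lowercasing, "The" and "THE" slip past the stoplist.
        System.out.println(removeStopWords(tokens));            // [The, price, THE, stock]
        // With lowercasing first, both are removed.
        System.out.println(removeStopWords(lowerCase(tokens))); // [price, stock]
    }
}
```

In this sketch, capitalized stop words survive unless the tokens are lowercased first, which is the reason for the ordering recommended above.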
EnglishStopTokenizerFactory is serializable if its
base tokenizer factory is serializable.
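The dependence on the base factory can be illustrated with standard Java serialization. This is only a sketch under assumptions: `Wrapper`, `SerializableBase`, `PlainBase`, and `canSerialize` are hypothetical stand-ins, and the sketch uses default field serialization, which may differ from the mechanism the class itself uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration class; not part of the documented API.
public class SerializableWrapperSketch {

    // Stand-in for a base factory that is serializable.
    public static class SerializableBase implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Stand-in for a base factory that is not serializable.
    public static class PlainBase { }

    // The wrapper declares Serializable but holds its base as a field,
    // so writing it succeeds only if the base can also be written.
    public static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object base;
        public Wrapper(Object base) { this.base = base; }
    }

    // Returns true if the object can be written with Java serialization.
    public static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Wrapper(new SerializableBase()))); // true
        System.out.println(canSerialize(new Wrapper(new PlainBase())));        // false
    }
}
```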
|Constructor and Description|
Construct an English stop tokenizer factory with the specified base factory.
public EnglishStopTokenizerFactory(TokenizerFactory factory)
factory - Base tokenizer factory.