E - Type of the tokens being tagged.
public class ClassifierTagger<E> extends Object implements Tagger<E>, Compilable, Serializable
ClassifierTagger implements the first-best tagger interface with a classifier that operates left-to-right over the tokens, classifying one token at a time.
The state of the tagging up to the position currently
being tagged is represented by the static nested class
ClassifierTagger.State. The state
contains all of the input tokens, an integer input position,
and the tags for all of the tokens earlier in the sequence.
An advantage of the classifier tagger over a more complex tagger such as conditional random fields (CRF) is that it is able to use longer-distance information about tags that have already been assigned. Another advantage is that the classifier tagger will use much less memory and tag much more quickly. Depending on the base classifier used, a classifier tagger will likely be more efficient to train in terms of time and space than a CRF.
The implementation of the decoder is the obvious one. It walks along the input tokens from left to right, constructing a state for the current position from the tokens and the tags assigned so far, then feeds the state into the classifier, whose output classification supplies the tag for that position and thereby determines the next state.
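The decoding loop described above can be sketched as follows. This is a minimal, self-contained illustration and not LingPipe's source: `SketchState`, `GreedyTaggerSketch`, the use of `Function` as a stand-in for the base classifier, and the toy rule in `toyClassify` are all assumptions made for the example. The toy rule also shows how a state gives access to tags assigned arbitrarily far back in the sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for ClassifierTagger.State: all input tokens,
// the position being tagged, and the tags assigned to earlier positions.
final class SketchState<E> {
    final List<E> tokens;
    final int position;
    final List<String> previousTags;

    SketchState(List<E> tokens, int position, List<String> previousTags) {
        this.tokens = tokens;
        this.position = position;
        // Copy so the state is a snapshot of the tagging so far.
        this.previousTags =
            Collections.unmodifiableList(new ArrayList<>(previousTags));
    }
}

final class GreedyTaggerSketch {

    // Greedy left-to-right decoding: classify one position at a time and
    // append the predicted tag so that later states can see it.
    static <E> List<String> tag(List<E> tokens,
                                Function<SketchState<E>, String> classifier) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < tokens.size(); ++i) {
            SketchState<E> state = new SketchState<>(tokens, i, tags);
            tags.add(classifier.apply(state));
        }
        return tags;
    }

    // Toy classifier (illustration only): capitalized tokens are "NAME";
    // a token is also "NAME" if any earlier token with the same spelling
    // (ignoring case) was already tagged "NAME"; everything else is "O".
    static String toyClassify(SketchState<String> state) {
        String token = state.tokens.get(state.position);
        if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
            return "NAME";
        }
        for (int j = 0; j < state.position; ++j) {
            if (state.previousTags.get(j).equals("NAME")
                    && state.tokens.get(j).equalsIgnoreCase(token)) {
                return "NAME";
            }
        }
        return "O";
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("Smith", "said", "that", "smith", "left");
        // prints [NAME, O, O, NAME, O]
        System.out.println(tag(tokens, GreedyTaggerSketch::toyClassify));
    }
}
```

In this sketch the fourth token is tagged "NAME" only because of a tag assigned three positions earlier, which is the kind of longer-distance dependency a state-based classifier can consult.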
toClassifiedCorpus() converts a tagging corpus to a classifier corpus, which may then be used to train a classifier. The resulting trained classifier may then be plugged into a classifier tagger, which may be serialized or compiled, depending on the serializability and compilability of the underlying classifier.
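A classifier tagger is serializable if the underlying classifier is serializable. The deserialized classifier tagger will be an instance of ClassifierTagger<E> with the deserialized classifier as its base classifier.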
A classifier tagger is compilable if the underlying
classifier is compilable. The deserialized classifier tagger
will be an instance of
ClassifierTagger<E> with the
deserialized compiled classifier as its base classifier.
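The dependence of the tagger's serializability on its base classifier can be illustrated with standard Java serialization. The sketch below is an assumption-laden illustration, not LingPipe's mechanism: `WrapperTaggerSketch`, `SerializableBaseSketch`, `PlainBaseSketch`, and `SerializationSketch` are hypothetical names, and `Function` stands in for the base classifier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical base classifier that is serializable.
final class SerializableBaseSketch
        implements Function<String, String>, Serializable {
    private static final long serialVersionUID = 1L;
    @Override public String apply(String token) {
        return token.isEmpty() ? "EMPTY" : "WORD";
    }
}

// Hypothetical base classifier that is NOT serializable.
final class PlainBaseSketch implements Function<String, String> {
    @Override public String apply(String token) {
        return "WORD";
    }
}

// Hypothetical tagger wrapper: declared Serializable, but writing it
// succeeds only if the wrapped base classifier is itself serializable.
final class WrapperTaggerSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final Function<String, String> base;
    WrapperTaggerSketch(Function<String, String> base) {
        this.base = base;
    }
}

final class SerializationSketch {

    // Write the tagger to an in-memory stream and read it back.
    static WrapperTaggerSketch roundTrip(WrapperTaggerSketch tagger)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tagger);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (WrapperTaggerSketch) in.readObject();
        }
    }

    // True if the round trip succeeds, false if some component
    // of the tagger turns out not to be serializable.
    static boolean canSerialize(WrapperTaggerSketch tagger) {
        try {
            roundTrip(tagger);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints true, then false
        System.out.println(canSerialize(
            new WrapperTaggerSketch(new SerializableBaseSketch())));
        System.out.println(canSerialize(
            new WrapperTaggerSketch(new PlainBaseSketch())));
    }
}
```

The same reasoning would apply to compilation: a wrapper can only write a compiled form of itself if the classifier it wraps knows how to write one.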
|Constructor and Description|
|ClassifierTagger(BaseClassifier<ClassifierTagger.State<E>> classifier): Construct a classifier tagger based on the specified base classifier over states.|
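|Modifier and Type||Method and Description|
|BaseClassifier<ClassifierTagger.State<E>>||classifier(): Returns the underlying classifier for this classifier tagger.|
|void||compileTo(ObjectOutput out): Compile this classifier tagger to the specified object output stream.|
|Tagging<E>||tag(List<E> tokens): Return the tagging for the specified list of tokens.|
|static <F> Corpus<ObjectHandler<Classified<ClassifierTagger.State<F>>>>||toClassifiedCorpus(Corpus<ObjectHandler<Tagging<F>>> taggingCorpus): Return a corpus consisting of classified tagger states derived from the specified corpus of taggings.|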
public BaseClassifier<ClassifierTagger.State<E>> classifier()
public void compileTo(ObjectOutput out) throws IOException
public static <F> Corpus<ObjectHandler<Classified<ClassifierTagger.State<F>>>> toClassifiedCorpus(Corpus<ObjectHandler<Tagging<F>>> taggingCorpus)
The resulting corpus is implemented as a lightweight wrapper around the tagging corpus. This makes visiting it slightly slower than visiting an explicitly converted corpus, but it uses much less memory because the classified states are generated on demand rather than stored.
The returned corpus will be serializable if the specified corpus is serializable.
F - Type of the tokens being tagged.
taggingCorpus - Corpus of taggings.
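The lightweight-wrapper idea behind the converted corpus can be sketched as below. This is only an illustration under stated assumptions: `TaggingSketch`, `ClassifiedStateSketch`, `CorpusSketch`, and `CorpusAdapterSketch` are hypothetical simplifications, a `Consumer` stands in for an object handler, and the one-example-per-token conversion is an assumed reading of "classified tagger states" rather than a confirmed detail of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical tagging: parallel lists of tokens and reference tags.
final class TaggingSketch<F> {
    final List<F> tokens;
    final List<String> tags;
    TaggingSketch(List<F> tokens, List<String> tags) {
        this.tokens = tokens;
        this.tags = tags;
    }
}

// Hypothetical classified state: the state at one position together
// with the reference tag for that position.
final class ClassifiedStateSketch<F> {
    final List<F> tokens;
    final int position;
    final List<String> previousTags;
    final String tag;
    ClassifiedStateSketch(List<F> tokens, int position,
                          List<String> previousTags, String tag) {
        this.tokens = tokens;
        this.position = position;
        this.previousTags = previousTags;
        this.tag = tag;
    }
}

// Hypothetical corpus: pushes each of its items to a handler.
interface CorpusSketch<T> {
    void visit(Consumer<T> handler);
}

final class CorpusAdapterSketch {

    // Wrap the tagging corpus without copying it. Each visit re-derives
    // the classified states, trading some time for memory.
    static <F> CorpusSketch<ClassifiedStateSketch<F>> toClassified(
            CorpusSketch<TaggingSketch<F>> taggings) {
        return handler -> taggings.visit(tagging -> {
            for (int i = 0; i < tagging.tokens.size(); ++i) {
                handler.accept(new ClassifiedStateSketch<>(
                    tagging.tokens, i,
                    tagging.tags.subList(0, i),   // view, not a copy
                    tagging.tags.get(i)));
            }
        });
    }

    public static void main(String[] args) {
        TaggingSketch<String> t = new TaggingSketch<>(
            Arrays.asList("John", "runs"), Arrays.asList("N", "V"));
        CorpusSketch<TaggingSketch<String>> corpus = h -> h.accept(t);

        List<String> seen = new ArrayList<>();
        toClassified(corpus).visit(ex ->
            seen.add(ex.position + ":" + ex.previousTags + "->" + ex.tag));
        // prints [0:[]->N, 1:[N]->V]
        System.out.println(seen);
    }
}
```

Because the adapter holds only a reference to the underlying corpus, nothing is materialized until a handler visits it, which matches the time-for-memory trade-off described above.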