public class ChainCrf<E> extends Object implements Tagger<E>, NBestTagger<E>, MarginalTagger<E>, Serializable

Type Parameters:
 E - Type of tokens in the tagging.

The ChainCrf<E> class implements linear chain conditional random field (CRF) decoding and estimation for input token sequences of type E.
The cliques of the graphical model are typically called nodes and edges in a chain CRF. A node consists of a token position together with the full sequence of input tokens, whereas an edge additionally includes the previous tag. A CRF feature extractor converts nodes and edges to feature maps, which are then converted to vectors for use in CRFs.
First, we suppose a fixed, finite set of K tags, tags[0],...,tags[K-1], which may be assigned to tokens as part of a tagging. The coefficient matrix β consists of K coefficient vectors, β[0],...,β[K-1], one for each tag.
Second, we assume a pair of feature extractors, one for nodes in the graphical model and one for edges. The node feature extractor φ_{0} maps a sequence xs of input tokens and a position n in that sequence (0 <= n < xs.length) to a feature map φ_{0}(xs,n). The edge feature extractor φ_{1} maps an input token sequence xs, a position n in that sequence, and a previous tag tags[k] to a feature map φ_{1}(xs,n,tags[k]).
The combined features extracted for input token sequence xs, position n in that sequence, and tag tags[k] are defined by:

φ(xs,0,tags[k]) = φ_{0}(xs,0)
φ(xs,n,tags[k]) = φ_{0}(xs,n) + φ_{1}(xs,n,tags[k]) for n > 0

The special case for position 0 arises because there is no previous tag. Note that boundary features (begin-of-sequence, end-of-sequence) may be defined in φ_{0}, because it knows the position and length of the input.
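As an illustration (a minimal self-contained sketch, not LingPipe's API; the class name and the map-based feature representation are hypothetical), the combination above amounts to merging the node and edge feature maps, summing the values of any features they share:

```java
import java.util.HashMap;
import java.util.Map;

public class CombinedFeatures {

    // phi(xs,n,tag): node features at every position, edge features only
    // for n > 0, since position 0 has no previous tag. Feature maps from
    // the two extractors are combined by adding values for shared keys.
    public static Map<String, Double> combine(Map<String, Double> nodeFeats,
                                              Map<String, Double> edgeFeats,
                                              int n) {
        Map<String, Double> phi = new HashMap<>(nodeFeats);
        if (n > 0) {
            for (Map.Entry<String, Double> e : edgeFeats.entrySet())
                phi.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return phi;
    }
}
```

At position 0 only the node features survive, matching the special case in the definition above.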
Tag Sequence Probability

The probability model assumes a fixed sequence of input tokens xs[0],...,xs[N-1] of length N. The model assigns a conditional probability to a sequence of tag indices ys[0],...,ys[N-1] (where 0 <= ys[n] < tags.length) by:

p(ys|xs) ∝ exp(Σ_{n < N} φ(xs,n,tags[ys[n-1]])⋅β[ys[n]])

where the sum is over the dot product of the feature vector φ(xs,n,tags[ys[n-1]]), extracted for the input token sequence xs at position n and previous tag tags[ys[n-1]] with index ys[n-1], with the coefficient vector for output tag tags[ys[n]], namely β[ys[n]].
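The unnormalized log score inside the exp above is just a sum of dot products. A minimal sketch with dense arrays (the class and method names are hypothetical, not part of LingPipe):

```java
public class CrfScore {

    // Unnormalized log score of a tag-index sequence ys, given dense
    // feature vectors phi[n] (already combined for position n and the
    // previous tag ys[n-1]) and per-tag coefficient vectors beta.
    public static double logScore(double[][] phi, double[][] beta, int[] ys) {
        double score = 0.0;
        for (int n = 0; n < ys.length; ++n) {
            double[] b = beta[ys[n]]; // coefficients for output tag ys[n]
            for (int i = 0; i < b.length; ++i)
                score += phi[n][i] * b[i]; // dot product with phi(xs,n,.)
        }
        return score;
    }
}
```

Exponentiating this score and dividing by the normalizing constant Z yields the conditional probability p(ys|xs).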
Note on Factored Coefficients

Note that this presentation of CRFs is a bit non-standard. In particular, LingPipe has factored what is usually a single large coefficient vector into one vector per output tag. This somewhat restricts the kinds of features that may be extracted, so that each feature depends on a specific output tag. The reason this is done is for efficiency. Note that the node feature extractor φ_{0}(xs,n) could be rolled into the more general edge feature extractor φ_{1}(xs,n,t); the node feature extractor exists only for efficiency, allowing features shared across all previous tags t to be extracted only once.
Legal Tags: Structural Zeros

The more verbose constructor for chain CRFs takes three array arguments which indicate which tags are legal in which positions. The shorter convenience constructor implicitly assumes any tag may occur anywhere. The legal start tag array indicates whether a tag may start a tagging. The legal end tag array indicates whether a tag may be the last tag in a tagging. The legal transition array indicates, for a pair of tags, whether one may follow the other.

These are typically used when chunkers are encoded as taggers and some sequences of tags are structurally illegal. These are called structural zeros in statistical modeling terms.
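For example, under a BIO chunk encoding with a single chunk type and tags B, I, and O, the tag I may neither start a tagging nor follow O. The three constraint arrays for that case might look like this (a hypothetical illustration, not LingPipe code):

```java
public class BioConstraints {

    // Tags for a single-chunk-type BIO encoding.
    static final String[] TAGS = { "B", "I", "O" };

    // I may not start a tagging; any tag may end one.
    static final boolean[] LEGAL_STARTS = { true, false, true };
    static final boolean[] LEGAL_ENDS   = { true, true,  true };

    // LEGAL_TRANSITIONS[i][j] is true if TAGS[j] may follow TAGS[i];
    // I may only follow B or I, never O.
    static final boolean[][] LEGAL_TRANSITIONS = {
        /* from B */ { true, true,  true },
        /* from I */ { true, true,  true },
        /* from O */ { true, false, true },
    };
}
```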
During estimation, these arrays may be computed automatically from the training corpus unless unseen transitions are explicitly allowed (see the allowUnseenTransitions flag on the estimate method).
First-Best Output: Viterbi

First-best output is calculated very efficiently, in linear time in the size of the input, using the Viterbi algorithm, a form of dynamic programming applicable to sequence tagging. Note that the normalizing constant does not need to be computed. This functionality is available through the method tag(List) of the Tagger interface.
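A stripped-down sketch of Viterbi decoding over position-wise tag scores and tag-transition scores (a simplification of the CRF potentials, which in general also depend on position; names are illustrative assumptions, not LingPipe's implementation):

```java
public class Viterbi {

    // First-best tag-index sequence, given nodeScore[n][k] (log score of
    // tag k at position n) and transScore[kp][k] (log score of tag k
    // following tag kp). Runs in time linear in the input length.
    public static int[] decode(double[][] nodeScore, double[][] transScore) {
        int N = nodeScore.length, K = nodeScore[0].length;
        double[][] best = new double[N][K]; // best score ending in tag k at n
        int[][] back = new int[N][K];       // backpointer to previous tag
        best[0] = nodeScore[0].clone();
        for (int n = 1; n < N; ++n) {
            for (int k = 0; k < K; ++k) {
                double max = Double.NEGATIVE_INFINITY;
                for (int kp = 0; kp < K; ++kp) {
                    double s = best[n - 1][kp] + transScore[kp][k];
                    if (s > max) { max = s; back[n][k] = kp; }
                }
                best[n][k] = max + nodeScore[n][k];
            }
        }
        // pick the best final tag, then follow backpointers
        int[] ys = new int[N];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < K; ++k)
            if (best[N - 1][k] > max) { max = best[N - 1][k]; ys[N - 1] = k; }
        for (int n = N - 1; n > 0; --n)
            ys[n - 1] = back[n][ys[n]];
        return ys;
    }
}
```

Because only relative scores are compared, the normalizing constant never enters the computation, as noted above.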
N-Best Sequence Output: Viterbi Forward, A* Backward

N-best output may be calculated in one linear forward pass and an efficient backward pass. The forward pass computes and stores the full Viterbi lattice, consisting of the best analysis and score up to a given token for a given tag. The backward pass uses the A* algorithm with the Viterbi scores as exact completion estimates. This allows n-best analyses to be returned in order, with at most a constant amount of work per result.

N-best functionality is available through the method tagNBest(List,int) of the NBestTagger interface. This returns an iterator over scored tagging instances. The scores are the unnormalized products defined above.

N-best functionality with scores normalized to conditional probabilities may be computed using tagNBestConditional(List,int). This requires computing the normalizing constant using the forward-backward algorithm described in the next section.
Marginal Tag Probability Output: Forward-Backward Algorithm

The forward-backward algorithm may be applied to conditional random fields. The generalized undirected graphical model form of this algorithm is known as the sum-product algorithm. The result is returned as an instance of TagLattice.

The forward-backward algorithm efficiently computes unnormalized forward, backward, and normalizing probabilities. As defined in the TagLattice class documentation, these may be used to define the probability of a tag or tag sequence starting at a specified position given the input tokens.

The unnormalized total probability Z may be used with n-best output to convert n-best sequence scores into conditional probabilities of the tagging given the input tokens.
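As a sketch of how Z is obtained, the forward pass below sums over previous tags where Viterbi maximizes. It works directly with unnormalized potentials (exponentiated scores); a production implementation would rescale or work in log space to avoid underflow. Names are illustrative assumptions, not LingPipe's implementation:

```java
public class ForwardBackward {

    // Forward pass in the unnormalized probability domain. nodePot[n][k]
    // and transPot[kp][k] are exponentiated CRF scores. Returns the
    // normalizing constant Z: the sum, over all tag sequences, of their
    // unnormalized probabilities.
    public static double z(double[][] nodePot, double[][] transPot) {
        int N = nodePot.length, K = nodePot[0].length;
        double[] alpha = nodePot[0].clone(); // forward values at position 0
        for (int n = 1; n < N; ++n) {
            double[] next = new double[K];
            for (int k = 0; k < K; ++k)
                for (int kp = 0; kp < K; ++kp)
                    next[k] += alpha[kp] * transPot[kp][k] * nodePot[n][k];
            alpha = next;
        }
        double z = 0.0;
        for (double a : alpha) z += a; // sum over final tags
        return z;
    }
}
```

Dividing an unnormalized sequence score by Z turns it into the conditional probability of that tagging given the input.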
Stochastic Gradient Descent Training

Training is carried out for conditional random fields the same way as for logistic regression classifiers (see LogisticRegressionClassifier and LogisticRegression). It is mathematically equivalent to think of CRFs as multiway classifiers whose categories are sequences of tags. Under that analogy, training for conditional random fields exactly mirrors that for logistic regression. The only difference for conditional random fields is that the expectations at each step are computed using the forward-backward algorithm and efficiently applied over the lattice.
See the logistic regression classes for a description of the prior (also see RegressionPrior), and of the learning and annealing rates (also see AnnealingSchedule).
Reporting is exactly the same for conditional random fields as for logistic regression, with each epoch reporting the learning rate (lr), log likelihood (ll), log prior probability (lp), the combined objective being optimized, consisting of the sum of the log likelihood and log prior probability (llp), and the best combined objective so far (llp*).
Unlike in logistic regression, the prior is applied according to a blocking strategy rather than through lazy updates. The block size is a parameter that determines how often the prior gradient is applied; in effect, it treats updates as blocks, with the weight of the prior adjusted accordingly. After all of the instances in an epoch have been visited, the prior is applied one more time to ensure that the coefficients are in a consistent state at the end of each epoch.
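A rough sketch of that blocking strategy for a Gaussian (L2) prior; the exact weighting and all names here are illustrative assumptions, not LingPipe's implementation:

```java
public class BlockedPrior {

    // Instead of shrinking coefficients after every training instance,
    // the prior gradient is applied once per block of instances, weighted
    // by the block size. After the last (possibly partial) block, the
    // prior is applied once more so the epoch ends in a consistent state.
    public static void train(double[] beta, double[][] grads,
                             double rate, double variance, int blockSize) {
        int sincePrior = 0;
        for (double[] g : grads) { // one likelihood gradient per instance
            for (int i = 0; i < beta.length; ++i)
                beta[i] += rate * g[i];
            if (++sincePrior == blockSize) {
                applyPrior(beta, rate * blockSize, variance);
                sincePrior = 0;
            }
        }
        if (sincePrior > 0) // finish the epoch with a final prior update
            applyPrior(beta, rate * sincePrior, variance);
    }

    // Gaussian prior gradient: -beta / variance, scaled by the block weight.
    static void applyPrior(double[] beta, double weight, double variance) {
        for (int i = 0; i < beta.length; ++i)
            beta[i] -= weight * beta[i] / variance;
    }
}
```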
Significant speedups can be achieved by setting the feature
caching flag in the estimator method to true
. This may
require a large amount of memory, because it stores the entire
lattice of vectors for each training instance.
Serialization
A chain CRF may be serialized if its parts are serializable: the
node and edge feature extractor, the feature symbol table, and the
vector coefficients.
The CRF read back in will be identical to the one that was
serialized to the extent that the parts are equivalent: feature
extractors, symbol table, and coefficients.
For CRFs estimated using the gradient descent estimator method,
the symbol table and vector coefficients are serializable in such a
way as to reproduce equivalent objects to the ones being
serialized.
Thread Safety

A CRF is thread safe to the extent that its symbol table and feature extractors are thread safe. Calling the tagging methods concurrently will result in concurrent calls to the feature extractors and symbol tables. The standard symbol table implementations are all thread safe, and almost all well-implemented feature extractors will be thread safe.
 Since:
 LingPipe 3.9
 Version:
 3.9.2
 Author:
 Bob Carpenter
 See Also:
 Serialized Form


Constructor Summary
Constructors
Constructor and Description
ChainCrf(String[] tags,
boolean[] legalTagStarts,
boolean[] legalTagEnds,
boolean[][] legalTagTransitions,
Vector[] coefficients,
SymbolTable featureSymbolTable,
ChainCrfFeatureExtractor<E> featureExtractor,
boolean addInterceptFeature)
Construct a conditional random field from the specified tags, feature vector coefficients, feature symbol table, feature extractor, and a flag indicating whether or not to add an intercept feature.
ChainCrf(String[] tags,
Vector[] coefficients,
SymbolTable featureSymbolTable,
ChainCrfFeatureExtractor<E> featureExtractor,
boolean addInterceptFeature)
Construct a conditional random field from the specified tags, feature vector coefficients, feature symbol table, feature extractor, and a flag indicating whether or not to add an intercept feature.

Method Summary
Modifier and Type
Method and Description
boolean
addInterceptFeature()
Returns true
if this CRF adds an intercept feature
with value 1.0 at index 0 to all feature vectors.
Vector[]
coefficients()
Return the coefficient vectors for this CRF.
static <F> ChainCrf<F>
estimate(Corpus<ObjectHandler<Tagging<F>>> corpus,
ChainCrfFeatureExtractor<F> featureExtractor,
boolean addInterceptFeature,
int minFeatureCount,
boolean cacheFeatureVectors,
boolean allowUnseenTransitions,
RegressionPrior prior,
int priorBlockSize,
AnnealingSchedule annealingSchedule,
double minImprovement,
int minEpochs,
int maxEpochs,
Reporter reporter)
Return the CRF estimated using stochastic gradient descent with the specified prior from the specified corpus of taggings of type F, pruned to the specified minimum feature count, using the specified feature extractor, automatically adding an intercept feature if the flag is true, allowing unseen tag transitions as specified, and using the specified training parameters for annealing, measuring convergence, and reporting incremental results to the specified reporter.
ChainCrfFeatureExtractor<E>
featureExtractor()
Return the feature extractor for this CRF.
SymbolTable
featureSymbolTable()
Returns an unmodifiable view of the symbol table for features for
this CRF.
String
tag(int k)
Returns the tag for the specified tag index.
Tagging<E>
tag(List<E> tokens)
Return the tagging for the specified list of tokens.
TagLattice<E>
tagMarginal(List<E> tokens)
Return the marginal tagging for the specified list of
input tokens.
Iterator<ScoredTagging<E>>
tagNBest(List<E> tokens,
int maxResults)
Return an iterator over the n
best scored taggings for
the specified input tokens up to a specified maximum n
.
Iterator<ScoredTagging<E>>
tagNBestConditional(List<E> tokens,
int maxResults)
Return an iterator over the n
best scored taggings for
the specified input tokens up to a specified maximum n
,
with scores normalized to conditional probabilities.
List<String>
tags()
Returns an unmodifiable view of the array of tags underlying this CRF.
String
toString()
Return a string-based representation of this chain CRF.


Constructor Detail

ChainCrf
public ChainCrf(String[] tags,
Vector[] coefficients,
SymbolTable featureSymbolTable,
ChainCrfFeatureExtractor<E> featureExtractor,
boolean addInterceptFeature)
Construct a conditional random field from the specified tags, feature vector coefficients, feature symbol table, feature extractor, and a flag indicating whether or not to add an intercept feature.
 Parameters:
tags
 Array of output tags.
coefficients
 Array of coefficient vectors parallel to tags.
featureSymbolTable
 Symbol table for feature extraction
to vectors.
featureExtractor
 CRF feature extractor.
addInterceptFeature
 true
if an intercept feature
at position 0 with value 1 is added to all feature vectors.
 Throws:
IllegalArgumentException
 If the tag and coefficient
vector arrays are not nonempty and the same length, or if the
coefficient vectors are not all of the same number of dimensions.

ChainCrf
public ChainCrf(String[] tags,
boolean[] legalTagStarts,
boolean[] legalTagEnds,
boolean[][] legalTagTransitions,
Vector[] coefficients,
SymbolTable featureSymbolTable,
ChainCrfFeatureExtractor<E> featureExtractor,
boolean addInterceptFeature)
Construct a conditional random field from the specified tags, feature vector coefficients, feature symbol table, feature extractor, and a flag indicating whether or not to add an intercept feature.
 Parameters:
tags
 Array of output tags
legalTagStarts
 Array of flags indicating if tag may be
first tag for a tagging.
legalTagEnds
 Array of flags indicating if tag may be
last tag for a tagging.
legalTagTransitions
 Two dimensional array of flags indicating
if the first tag may be followed by the second tag.
coefficients
 Array of coefficient vectors parallel to tags.
featureSymbolTable
 Symbol table for feature extraction
to vectors.
featureExtractor
 CRF feature extractor.
addInterceptFeature
 true
if an intercept feature
at position 0 with value 1 is added to all feature vectors.
 Throws:
IllegalArgumentException
 If the tag and coefficient
vector arrays are not nonempty and the same length, or if the
coefficient vectors are not all of the same number of dimensions.

Method Detail

tags
public List<String> tags()
Returns an unmodifiable view of the array of tags underlying this CRF.
The array of coefficient vectors is parallel to the array of tags returned by tags(), so the coefficient vector coefficients()[n] is for output tag tags()[n].
 Returns:
 View of the output tags.

tag
public String tag(int k)
Returns the tag for the specified tag index. This uses the
underlying tags, so that tag(k) == tags()[k]
.
 Parameters:
k
 Position of tag.
 Returns:
 Tag for the specified position.
 Throws:
ArrayIndexOutOfBoundsException
 If the specified index
is out of bounds for the tag array (k < 0
or k
>= tags().length
).

coefficients
public Vector[] coefficients()
Return the coefficient vectors for this CRF.
The array of coefficient vectors is parallel to the array of tags returned by tags(), so the coefficient vector coefficients()[n] is for output tag tags()[n].
 Returns:
 The coefficient vectors.

featureSymbolTable
public SymbolTable featureSymbolTable()
Returns an unmodifiable view of the symbol table for features for
this CRF.
 Returns:
 A view of the symbol table for features.

featureExtractor
public ChainCrfFeatureExtractor<E> featureExtractor()
Return the feature extractor for this CRF.
 Returns:
 The feature extractor.

addInterceptFeature
public boolean addInterceptFeature()
Returns true
if this CRF adds an intercept feature
with value 1.0 at index 0 to all feature vectors.
 Returns:
 Whether this CRF adds an intercept feature.

tag
public Tagging<E> tag(List<E> tokens)
Description copied from interface: Tagger
Return the tagging for the specified list of tokens.

tagNBest
public Iterator<ScoredTagging<E>> tagNBest(List<E> tokens,
int maxResults)
Description copied from interface: NBestTagger
Return an iterator over the n
best scored taggings for
the specified input tokens up to a specified maximum n
.
 Specified by:
tagNBest
in interface NBestTagger<E>
 Parameters:
tokens
 Input tokens to tag.
maxResults
 Maximum number of results to return.
 Returns:
 Iterator over the nbest scored taggings for the
specified tokens.

tagNBestConditional
public Iterator<ScoredTagging<E>> tagNBestConditional(List<E> tokens,
int maxResults)
Description copied from interface: NBestTagger
Return an iterator over the n
best scored taggings for
the specified input tokens up to a specified maximum n
,
with scores normalized to conditional probabilities.
Optional operation.
 Specified by:
tagNBestConditional
in interface NBestTagger<E>
 Parameters:
tokens
 Input tokens to tag.
maxResults
 Maximum number of results to return.
 Returns:
 Iterator over the nbest scored taggings for the
specified tokens.

tagMarginal
public TagLattice<E> tagMarginal(List<E> tokens)
Description copied from interface: MarginalTagger
Return the marginal tagging for the specified list of
input tokens.
 Specified by:
tagMarginal
in interface MarginalTagger<E>
 Parameters:
tokens
 Input tokens to tag.
 Returns:
 The lattice of tags for the specified tokens.

toString
public String toString()
Return a string-based representation of this chain CRF.
All information returned in this string representation is
available programmatically.
Warning: The output is very verbose, including
symbolic representations of all the coefficients.

estimate
public static <F> ChainCrf<F> estimate(Corpus<ObjectHandler<Tagging<F>>> corpus,
ChainCrfFeatureExtractor<F> featureExtractor,
boolean addInterceptFeature,
int minFeatureCount,
boolean cacheFeatureVectors,
boolean allowUnseenTransitions,
RegressionPrior prior,
int priorBlockSize,
AnnealingSchedule annealingSchedule,
double minImprovement,
int minEpochs,
int maxEpochs,
Reporter reporter)
throws IOException
Return the CRF estimated using stochastic gradient descent with the specified prior from the specified corpus of taggings of type F, pruned to the specified minimum feature count, using the specified feature extractor, automatically adding an intercept feature if the flag is true, allowing unseen tag transitions as specified, and using the specified training parameters for annealing, measuring convergence, and reporting incremental results to the specified reporter.

Reporting at the info level provides parameter and epoch-level reports. At the debug level, it reports epoch-by-epoch likelihoods.
 Parameters:
corpus
 Corpus from which to estimate.
featureExtractor
 Feature extractor for the CRF.
addInterceptFeature
 Set to true
if an intercept
feature with index 0 is automatically added to all feature
vectors with value 1.0.
minFeatureCount
 Minimum number of instances of a feature
to keep it.
cacheFeatureVectors
 Flag indicating whether or not to
keep the computed feature vectors in memory.
allowUnseenTransitions
 Flag indicating whether to allow
tags to start a tagging, end a tagging, or follow another tag
if there was not an example of that in the corpus.
prior
 Prior for coefficients to use during estimation.
priorBlockSize
 Number of training instances between applications of the prior gradient, as described in the blocking strategy above.
annealingSchedule
 Schedule for annealing the learning
rate during gradient descent.
minImprovement
 Minimum relative improvement in the objective (log likelihood plus log prior), computed as a 10-epoch rolling average, used to signal convergence.
minEpochs
 Minimum number of epochs for which to run
gradient descent estimation.
maxEpochs
 Maximum number of epochs for which to run
gradient descent estimation.
reporter
 Reporter to which results are written, or
null
for no reporting of intermediate results.
 Throws:
IOException
 If there is an underlying I/O error reading the taggings from the corpus.