com.aliasi.tag

## Class TagLattice<E>

• Type Parameters:
`E` - Type of tokens in the lattice.
Direct Known Subclasses:
ForwardBackwardTagLattice

```public abstract class TagLattice<E>
extends Object```
The abstract base class `TagLattice` provides an interface for the output of a marginal tagger, based on forward, backward, and transition log potentials. The allowance of general potentials makes this interface suitable for the output of either hidden Markov models (HMM) or linear chain conditional random fields (CRF).

### Marginal Tag Probabilities

The forward, backward, and normalizing terms are used to define the marginal probability of a single tag at a specified position given the inputs:

``` p(tag[n]=k | toks,...,toks[N-1])
= fwd(n,k) * bk(n,k) / Z```
To avoid problems with overflow, natural log values are used. On the log scale,
``` log p(tag[n]=k | toks,...,toks[N-1])
= log fwd(n,k) + log bk(n,k) - log Z```
Warning: No checks are made to ensure that values supplied to the constructor form a coherent probability estimate.

### Marginal Tag Sequence Probabilities

This allows us to compute the probability of a sequence of tags, for instance for a phrase, by:

``` p(tag[n]=k0, tag[n+1]=k1, ..., tag[n+m]=km | toks)
= fwd(n,k0) * bk(n+m,km) * Πi < m trans(n+i,tags[n+i],tags[n+i+1]) / Z```
On the log scale, this becomes:
``` log p(tag[n]=k0, tag[n+1]=k1, ..., tag[n+m]=km | toks)
= log fwd(n,k0) + log bk(n+m,km) * Σi < m log trans(n+i,tags[n+i],tags[n+i+1]) - log Z```
The log of this value returned by `logProbability(int,int[])`, where the first argument is `n` and the second argument `{k0,k1,...,km}`.

### Linear Probabilities

Linear probabilities (or potentials) are computed by exponentiating the log probabilities returned by the various methods.

### Transition Potentials

The transitions are defined by:

` trans(n,k1,k2) = φ(tag[n]=k2|tag[n-1]=k1)`
where `φ` is the transition potential from the tag k1 at position n-1 to tag k2 at position n. The log of this value is returned by `logTransition(int,int,int)`, where the first argument is `n` and the second two `k1` and `k2`.

### Forward Potentials

The forward values are defined by:

` log fwd(0,k) = init(k)`
where `init(k)` is the start potential, defined on an application-specific basis. The recursive step defines the forward values for subsequent positions 0 < n < N& and tags k
` fwd(n,k) = Σk' fwd(n-1,k') * trans(n-1,k,k')`
This is typically computed for all n and k in n*k time using the forward algorithm.

The log of this value is returned by `logForward(int,int)` where the first argument is `n` and the second `k`.

### Backward Potentials

The backward potentials are similar, but condition on rather than predict the label. This simplifies the basis of the recursion to

` bk(N-1,k) = 1.0`
where N is the length of the input (number of tokens). The recursive step for 0 <= n < N-1 is
` bk(n,k) = Σk' trans(n,k,k') * bk(n+1,k')`
The log of this value is returned by `logBackward(int,int)` where the first argument is `n` and the second `k`.

### Normalizer

The normalizer is the sum over all paths, and acts to normalize sequences to probabilities. It may be computed by:

` Z = Σk fwd(N-1,k)`
The log of this value is returned by `logZ()`.

### Probabilistic Lattices

In some settings, such as hidden Markov models (HMMs) the forward, backward, transition and normalizer may be interpreted as probabilities.

` trans(n,k1,k2) = p(tag[n]=k2|tag[n-1]=k1)`
` fwd(n,k) = p(label[n]=k, toks, ..., toks[n])`
` bk(n,k) = p(toks[n+1],...,toks[N-1] | label[n]=k)`
` Z = p(toks,...,toks[N-1])`
Since:
LingPipe3.9
Version:
3.9
Author:
Bob Carpenter
• ### Constructor Summary

Constructors
Constructor and Description
`TagLattice()`
Construct an empty tag lattice.
• ### Method Summary

All Methods
Modifier and Type Method and Description
`abstract double` ```logBackward(int token, int tag)```
Returns the log of the backward probability to the specified token and tag.
`abstract double` ```logForward(int token, int tag)```
Return the log of the forward probability of the specified tag at the specified position.
`abstract double` ```logProbability(int n, int tag)```
Convenience method returning the log of the conditional probability that the specified token has the specified tag, given the complete list of input tokens.
`abstract double` ```logProbability(int nFrom, int[] tags)```
Return the log conditional probability that the tokens starting with the specified token position have the specified tags given the complete sequence of input tokens.
`abstract double` ```logProbability(int nTo, int tagFrom, int tagTo)```
Convenience method returning the log of the conditional probability that the specified two tokens have the specified tag given the complete list of input tokens.
`abstract double` ```logTransition(int tokenFrom, int tagFrom, int tagTo)```
Returns the log of the transition probability from the specified input token position with the specified previous tag to the specified target tag.
`abstract double` `logZ()`
Return the log of the normalizing constant for the lattice.
`abstract int` `numTags()`
Return the number of tags in this tag lattice.
`abstract int` `numTokens()`
Returns the length of this tag lattice as measured by number of tokens.
`abstract String` `tag(int id)`
Return the tag with the specified symbol identifier.
`abstract List<String>` `tagList()`
Returns an unmodifiable view of the list of tags used in this lattice, indexed by identifier.
`abstract SymbolTable` `tagSymbolTable()`
Returns a symbol table which converts tags to identifiers and vice-versa.
`abstract E` `token(int n)`
Return the token at the specified position in the input.
`ConditionalClassification` `tokenClassification(int tokenIndex)`
Returns the classification of the token at the specified position in this tag lattice.
`abstract List<E>` `tokenList()`
Return an unmodifiable view of the underlying tokens for this tag lattice.
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Constructor Detail

• #### TagLattice

`public TagLattice()`
Construct an empty tag lattice.
• ### Method Detail

• #### tokenList

`public abstract List<E> tokenList()`
Return an unmodifiable view of the underlying tokens for this tag lattice.
Returns:
The tokens for this lattice.
• #### tagList

`public abstract List<String> tagList()`
Returns an unmodifiable view of the list of tags used in this lattice, indexed by identifier.
Returns:
The symbol table for tags.
• #### tag

`public abstract String tag(int id)`
Return the tag with the specified symbol identifier.
Parameters:
`id` - Identifer for tag.
Returns:
Tag with specified ID.
Throws:
`IndexOutOfBoundsException` - If the specified identifier is not in range for the list of tags.
• #### numTags

`public abstract int numTags()`
Return the number of tags in this tag lattice.
Returns:
Number of tags for this tag lattice.
• #### token

`public abstract E token(int n)`
Return the token at the specified position in the input.
Parameters:
`n` - Input position.
Returns:
Token at position.
Throws:
`IndexOutOfBoundsException` - If the specified index is not in range for the list of tokens.
• #### numTokens

`public abstract int numTokens()`
Returns the length of this tag lattice as measured by number of tokens.
Returns:
Number of tokens in this lattice.
• #### tagSymbolTable

`public abstract SymbolTable tagSymbolTable()`
Returns a symbol table which converts tags to identifiers and vice-versa.

A new symbol table is constructed for each call, so it should be saved and reused if possible. Changing the returned symbol table will not affect this lattice.

Returns:
Symbol table for tags in this lattice.
• #### logProbability

```public abstract double logProbability(int n,
int tag)```
Convenience method returning the log of the conditional probability that the specified token has the specified tag, given the complete list of input tokens.

This method returns results defined by

``` logProbability(n,tag)
== logProbability(n,new int[] { tag })```
Parameters:
`n` - Position of input token.
`tag` - Identifier of tag.
Returns:
The log probability the token has the tag.
Throws:
`ArrayIndexOutOfBoundsException` - If the token or tag identifiers are not in range.
• #### logProbability

```public abstract double logProbability(int nTo,
int tagFrom,
int tagTo)```
Convenience method returning the log of the conditional probability that the specified two tokens have the specified tag given the complete list of input tokens.

This method returns results defined by

``` logProbability(nTo,tagFrom,tagTo)
== logProbability(n-1,new int[] { tagFrom, tagTo })```
Parameters:
`nTo` - Position of second token.
`tagFrom` - First Tag from which transition is made.
`tagTo` - Second Tag to which transition is made.
Returns:
Log probability of the tags at the specified position.
• #### logProbability

```public abstract double logProbability(int nFrom,
int[] tags)```
Return the log conditional probability that the tokens starting with the specified token position have the specified tags given the complete sequence of input tokens.
Parameters:
`nFrom` - Starting position of sequence.
`tags` - Tag identifiers for sequence.
Returns:
Log probability that sequence starting at the specified position has the specified tags.
Throws:
`IllegalArgumentException` - If the token is out of range or the token plus the length of the tag sequence is out of range of tokens, or if any of the tags is not a known identifier.
• #### logForward

```public abstract double logForward(int token,
int tag)```
Return the log of the forward probability of the specified tag at the specified position. The forward probability is the sum of the joint probabilities of all sequences from the initial token to the specified token ending with the specified tag.
Parameters:
`token` - Token position.
`tag` - Tag identifier.
Returns:
Log forward probability specified token has specified tag.
Throws:
`ArrayIndexOutOfBoundsException` - If the token or tag index are out of bounds for this lattice's tokens or tags.
• #### logBackward

```public abstract double logBackward(int token,
int tag)```
Returns the log of the backward probability to the specified token and tag. The backward probability is the sum of the joint probabilities of all sequences starting from the specified token and specified tag and going to the end of the list of tokens.
Parameters:
`token` - Input token position.
`tag` - Tag identifier.
Throws:
`ArrayIndexOutOfBoundsException` - If the token or tag index are out of bounds for this lattice's tokens or tags.
• #### logTransition

```public abstract double logTransition(int tokenFrom,
int tagFrom,
int tagTo)```
Returns the log of the transition probability from the specified input token position with the specified previous tag to the specified target tag.
Parameters:
`tokenFrom` - Token position from which the transition is made.
`tagFrom` - Identifier for the previous tag from which the transition is made.
`tagTo` - Tag identifier for the target tag to which the the transition is made.
Returns:
Log probability of the transition.
Throws:
`ArrayIndexOutOfBoundsException` - If the token index or either of the tag indexes are out of bounds for this lattice's tokens or tags.
• #### logZ

`public abstract double logZ()`
Return the log of the normalizing constant for the lattice. Its value is the log of the marginal probability of the input tokens. By the additive law of probability, this is equivalent to the sum of the probabilities of all possible analyses for the input sequence
Returns:
The normalizing constant.
• #### tokenClassification

`public ConditionalClassification tokenClassification(int tokenIndex)`
Returns the classification of the token at the specified position in this tag lattice. Tag probabilities are conditional probabilities of a tag given the entire sequence of input tokens. The tags are represented in their string form.
Parameters:
`tokenIndex` - Position of token to classify.
Returns:
Classification of token in terms of tags.