B - the type of the underlying n-best chunker
public abstract class RescoringChunker<B extends NBestChunker> extends Object implements NBestChunker, ConfidenceChunker
A RescoringChunker provides first-best, n-best, and confidence chunking by rescoring n-best chunkings derived from a contained chunker.
Concrete subclasses must implement the abstract method
rescore(Chunking), which provides a score for a chunking. There
are no restrictions on how this score is computed; most typically,
it will come from a longer-distance or higher-order model than the
contained chunker, and so is expected to provide more accurate results.
The n-best chunker works by generating the top analyses from the
contained chunker. The number of such analyses considered is
determined in the constructor for this class. These are then
placed in a bounded priority queue with the bound determined by the
maximum specified in the call to nBest(char[],int,int,int).
The first-best chunker methods chunk(CharSequence) and
chunk(char[],int,int) operate by choosing the top-scoring
chunking from the rescoring of the contained chunker. The number
of chunkings from the contained chunker that are rescored is
determined in the constructor. This is more memory and time
efficient than running the n-best chunking.
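The two strategies described above can be sketched in plain Java. This is an illustration only, not the library's source: the class RescoreSketch, the Scored holder, and the use of a generic list of candidates in place of the contained chunker's n-best output are all assumptions made for the example. The n-best variant keeps a bounded min-heap of rescored candidates, while the first-best variant only tracks a running maximum, which is why it needs less memory and time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-in for the rescoring step; not LingPipe code. */
final class RescoreSketch {

    /** A candidate analysis paired with its rescored value. */
    static final class Scored<T> {
        final T item;
        final double score;
        Scored(T item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    /** Rescores the first numRescored candidates and keeps the best maxNBest. */
    static <T> List<Scored<T>> nBest(List<T> baseCandidates, int numRescored,
                                     int maxNBest, ToDoubleFunction<T> rescore) {
        Comparator<Scored<T>> byScore = Comparator.comparingDouble(s -> s.score);
        // Min-heap: the lowest-scoring retained candidate sits at the head.
        PriorityQueue<Scored<T>> queue = new PriorityQueue<>(byScore);
        int limit = Math.min(numRescored, baseCandidates.size());
        for (int i = 0; i < limit; i++) {
            T candidate = baseCandidates.get(i);
            queue.add(new Scored<>(candidate, rescore.applyAsDouble(candidate)));
            if (queue.size() > maxNBest) {
                queue.poll(); // enforce the bound by evicting the worst candidate
            }
        }
        List<Scored<T>> result = new ArrayList<>(queue);
        result.sort(byScore.reversed()); // best rescored candidate first
        return result;
    }

    /** First-best: a single pass that remembers only the top-scoring candidate. */
    static <T> T firstBest(List<T> baseCandidates, int numRescored,
                           ToDoubleFunction<T> rescore) {
        T best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int limit = Math.min(numRescored, baseCandidates.size());
        for (int i = 0; i < limit; i++) {
            T candidate = baseCandidates.get(i);
            double score = rescore.applyAsDouble(candidate);
            if (best == null || score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

In this sketch numRescored plays the role of the value set in the constructor, and maxNBest plays the role of the bound passed by the caller.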
The nBestChunks(char[],int,int,int) method is implemented by walking over the n-best analyses generated by nBest(char[],int,int,int), with the maximum n-best for full analyses set to the value of numChunkingsRescored(), which may be changed using setNumChunkingsRescored(int). For each analysis, the chunks are pulled out and their weight is incremented by the n-best analysis weight. Normalization is carried out by dividing by the total probability mass in the returned n-best list.
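The accumulation and normalization just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class ChunkConfidenceSketch and its Analysis holder are hypothetical, chunks are represented as plain strings, and each analysis weight is taken to be 2 raised to its log (base 2) score. Subtracting the maximum score before exponentiating is a numerical-stability choice made for this sketch; it cancels out in the division.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of chunk confidence estimation; not LingPipe code. */
final class ChunkConfidenceSketch {

    /** One rescored analysis: a log2 score plus the chunks it contains. */
    static final class Analysis {
        final double log2Score;
        final List<String> chunks; // e.g. "0-5:PERSON" (illustrative key format)
        Analysis(double log2Score, List<String> chunks) {
            this.log2Score = log2Score;
            this.chunks = chunks;
        }
    }

    /** Maps each chunk to its share of the probability mass in the n-best list. */
    static Map<String, Double> chunkConfidences(List<Analysis> nBest) {
        double max = Double.NEGATIVE_INFINITY;
        for (Analysis a : nBest) {
            max = Math.max(max, a.log2Score);
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        double total = 0.0;
        for (Analysis a : nBest) {
            // Analysis weight, rescaled by the best score to avoid underflow.
            double weight = Math.pow(2.0, a.log2Score - max);
            total += weight;
            for (String chunk : a.chunks) {
                weights.merge(chunk, weight, Double::sum);
            }
        }
        // Normalize by the total mass of the analyses in the list.
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return weights;
    }
}
```

A chunk that appears in every analysis of the list receives confidence 1.0 under this scheme, since its accumulated weight equals the total mass.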
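|Constructor and Description|
|RescoringChunker(B chunker, int numChunkingsRescored): Construct a rescoring chunker that contains the specified base chunker and considers the specified number of chunkings for rescoring.|
|Modifier and Type|Method and Description|
|B|baseChunker(): The base chunker that generates hypotheses to rescore.|
|Chunking|chunk(char[] cs, int start, int end): Returns the first-best chunking for the specified character slice.|
|Chunking|chunk(CharSequence cSeq): Returns the first-best chunking for the specified character sequence.|
|Iterator<ScoredObject<Chunking>>|nBest(char[] cs, int start, int end, int maxNBest): Returns the n-best chunkings of the specified character slice.|
|Iterator<Chunk>|nBestChunks(char[] cs, int start, int end, int maxNBest): Returns the n-best chunks for the specified character slice up to the specified maximum number of chunks.|
|int|numChunkingsRescored(): Return the number of chunkings to generate from the base chunker for rescoring.|
|abstract double|rescore(Chunking chunking): Returns the score for a chunking.|
|void|setNumChunkingsRescored(int numChunkingsRescored): Set the number of base chunkings to rescore.|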
public RescoringChunker(B chunker, int numChunkingsRescored)
chunker - Base n-best chunker.
numChunkingsRescored - Number of chunkings generated by the base chunker to rescore.
public abstract double rescore(Chunking chunking)
The rescoring should take the form of a log (base 2) joint
probability estimate for the specified chunking. For the
simple whole-analysis rescoring method
nBest(char[],int,int,int), this is not checked, and any
values may be used in practice. For the n-best chunk method
nBestChunks(char[],int,int,int), the scores are
treated as log probabilities, but renormalized in order to
compute conditional chunk probability estimates.
chunking - Chunking to rescore.
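As a small worked example of the expected score scale, the sketch below computes a log (base 2) joint probability as the sum of the log (base 2) probabilities of independent factors. The class Log2ScoreSketch and the idea of passing per-factor probabilities in an array are assumptions for illustration; an actual rescore(Chunking) implementation would derive its factors from whatever model it wraps.

```java
/** Hypothetical helper showing the log (base 2) score scale; not LingPipe code. */
final class Log2ScoreSketch {

    /** Base-2 logarithm via the change-of-base identity. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Log (base 2) of the product of the given probabilities, computed as a
     * sum of logs so that long products do not underflow to zero.
     */
    static double log2JointProbability(double[] factorProbabilities) {
        double sum = 0.0;
        for (double p : factorProbabilities) {
            if (!(p >= 0.0 && p <= 1.0)) {
                throw new IllegalArgumentException("not a probability: " + p);
            }
            sum += log2(p); // log2(0) is negative infinity, as expected
        }
        return sum;
    }
}
```

For instance, factors of 0.5 and 0.25 have joint probability 0.125, which corresponds to a score of -3 on this scale.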
public B baseChunker()
public int numChunkingsRescored()
public void setNumChunkingsRescored(int numChunkingsRescored)
numChunkingsRescored - Number of base chunkings to rescore.
public Chunking chunk(CharSequence cSeq)
public Chunking chunk(char[] cs, int start, int end)
public Iterator<ScoredObject<Chunking>> nBest(char[] cs, int start, int end, int maxNBest)
cs - Underlying character array.
start - Index of first character to analyze.
end - Index of one past the last character to analyze.
maxNBest - The maximum number of results to return.
See the class documentation above for implementation details.