B - the type of the underlying n-best chunker

public abstract class RescoringChunker<B extends NBestChunker> extends Object implements NBestChunker, ConfidenceChunker
A RescoringChunker provides first-best, n-best, and
confidence chunking by rescoring the n-best chunkings derived from a
contained chunker.
Concrete subclasses must implement the abstract method
rescore(Chunking), which provides a score for a chunking. There
are no restrictions on how this score is computed; most typically,
it will come from a longer-distance/higher-order model than the
contained chunker and so provide more accurate results.
The n-best chunker works by generating the top analyses from the
contained chunker. The number of such analyses considered is
determined in the constructor for this class. These are then
rescored and placed in a bounded priority queue, with the bound
determined by the maximum specified in the call to
nBest(char[],int,int,int).
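The bounded-queue step described above can be sketched in plain Java. This is an illustration only, not the library's code: the class BoundedRescoreSketch, the Scored holder, and the use of a generic candidate type in place of Chunking are all assumptions made so the example stands alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-in for the n-best rescoring step; not LingPipe code. */
final class BoundedRescoreSketch {

    /** A candidate analysis paired with its rescored value. */
    static final class Scored<T> {
        final T value;
        final double score;
        Scored(T value, double score) { this.value = value; this.score = score; }
    }

    /** Rescores up to numChunkingsRescored candidates and keeps the top maxNBest. */
    static <T> List<Scored<T>> nBest(Iterator<T> baseNBest,
                                     ToDoubleFunction<? super T> rescorer,
                                     int numChunkingsRescored,
                                     int maxNBest) {
        List<Scored<T>> result = new ArrayList<>();
        if (maxNBest <= 0) return result;
        // Min-heap on score: the head is always the weakest retained candidate.
        PriorityQueue<Scored<T>> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Scored<T> s) -> s.score));
        for (int i = 0; i < numChunkingsRescored && baseNBest.hasNext(); ++i) {
            T candidate = baseNBest.next();
            queue.add(new Scored<>(candidate, rescorer.applyAsDouble(candidate)));
            if (queue.size() > maxNBest) queue.poll(); // enforce the bound
        }
        result.addAll(queue);
        // Return the survivors from highest to lowest rescored value.
        result.sort(Comparator.comparingDouble((Scored<T> s) -> s.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("aaaa", "b", "ccc", "dd", "eeeee");
        // Toy rescorer (an assumption): longer strings score higher.
        for (Scored<String> s : nBest(base.iterator(), c -> c.length(), 4, 2)) {
            System.out.println(s.value + " " + s.score);
        }
    }
}
```

Because the queue never holds more than maxNBest entries, memory use in this sketch depends on the requested output size rather than on the number of chunkings rescored.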
The first-best chunker methods chunk(CharSequence) and
chunk(char[],int,int) operate by choosing the top-scoring
chunking from the rescoring of the contained chunker. The number
of chunkings from the contained chunker that are rescored is
determined in the constructor. This is more memory- and
time-efficient than running the n-best chunking.
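One way to see why first-best selection is cheaper is that it needs only a running maximum rather than a queue. The sketch below assumes a generic candidate type in place of Chunking; the class name FirstBestSketch and the choice to return null when there are no candidates are illustrative assumptions, not documented behavior.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical stand-in for first-best selection; not LingPipe code. */
final class FirstBestSketch {

    /** Returns the highest-rescored of the first numChunkingsRescored candidates. */
    static <T> T firstBest(Iterator<T> baseNBest,
                           ToDoubleFunction<? super T> rescorer,
                           int numChunkingsRescored) {
        T best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < numChunkingsRescored && baseNBest.hasNext(); ++i) {
            T candidate = baseNBest.next();
            double score = rescorer.applyAsDouble(candidate);
            // Only the current winner is retained, so no queue is required.
            if (best == null || score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best; // null when there were no candidates (an assumption of this sketch)
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("b", "ccc", "eeeee", "dd");
        // Toy rescorer (an assumption): longer strings score higher.
        System.out.println(firstBest(base.iterator(), c -> c.length(), 2));
    }
}
```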
The nBestChunks(char[],int,int,int) method is implemented
by walking over the n-best analyses generated by
nBest(char[],int,int,int), with the maximum n-best for full
analyses set to the value of numChunkingsRescored(), which may be
changed using setNumChunkingsRescored(int). For each
analysis, the chunks are pulled out and their weight is incremented
by the n-best analysis weight. Normalization is carried out by
dividing by the total probability mass in the returned n-best list.
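The accumulate-then-normalize procedure can be sketched as follows. Several details are assumptions made for illustration: the class ChunkConfidenceSketch, the Analysis holder, the string encoding of a chunk as "start-end:type", and taking the linear weight of an analysis to be 2 raised to its log (base 2) score. Sorting and truncating to a maximum number of chunks is omitted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for chunk confidence estimation; not LingPipe code. */
final class ChunkConfidenceSketch {

    /** One rescored analysis: its log (base 2) score and the chunks it contains. */
    static final class Analysis {
        final double log2Score;
        final List<String> chunkKeys; // e.g. "0-4:PER"; a made-up encoding
        Analysis(double log2Score, List<String> chunkKeys) {
            this.log2Score = log2Score;
            this.chunkKeys = chunkKeys;
        }
    }

    /** Maps each chunk to its share of the total mass of the n-best list. */
    static Map<String, Double> chunkConfidences(List<Analysis> analyses) {
        Map<String, Double> mass = new LinkedHashMap<>();
        double total = 0.0;
        for (Analysis a : analyses) {
            double weight = Math.pow(2.0, a.log2Score); // assumed linear weight
            total += weight;
            for (String key : a.chunkKeys) {
                mass.merge(key, weight, Double::sum); // increment the chunk's weight
            }
        }
        // Normalize by the total mass of the analyses that were walked over.
        for (Map.Entry<String, Double> e : mass.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return mass;
    }

    public static void main(String[] args) {
        List<Analysis> analyses = Arrays.asList(
            new Analysis(-1.0, Arrays.asList("0-4:PER", "10-16:LOC")),
            new Analysis(-2.0, Arrays.asList("0-4:PER")),
            new Analysis(-2.0, Arrays.asList("0-4:ORG", "10-16:LOC")));
        System.out.println(chunkConfidences(analyses));
    }
}
```

A chunk that appears in every analysis receives a value of 1 under this scheme, while a chunk found only in low-weight analyses receives a correspondingly small share.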
The base chunker is available through baseChunker().

Constructor and Description 

RescoringChunker(B chunker,
int numChunkingsRescored)
Construct a rescoring chunker that contains the specified base
chunker and considers the specified number of chunkings for
rescoring.

Modifier and Type  Method and Description 

B 
baseChunker()
The base chunker that generates hypotheses to rescore.

Chunking 
chunk(char[] cs,
int start,
int end)
Returns the first-best chunking for the specified character
slice.

Chunking 
chunk(CharSequence cSeq)
Returns the first-best chunking for the specified character
sequence.

Iterator<ScoredObject<Chunking>> 
nBest(char[] cs,
int start,
int end,
int maxNBest)
Returns the n-best chunkings of the specified character slice.

Iterator<Chunk> 
nBestChunks(char[] cs,
int start,
int end,
int maxNBest)
Returns the n-best chunks for the specified character slice up to
the specified maximum number of chunks.

int 
numChunkingsRescored()
Returns the number of chunkings to generate from the base
chunker for rescoring.

abstract double 
rescore(Chunking chunking)
Returns the score for a chunking.

void 
setNumChunkingsRescored(int numChunkingsRescored)
Set the number of base chunkings to rescore.

public RescoringChunker(B chunker, int numChunkingsRescored)
chunker - Base n-best chunker.
numChunkingsRescored - Number of chunkings generated by the base chunker to rescore.

public abstract double rescore(Chunking chunking)
The rescoring should take the form of a log (base 2) joint
probability estimate for the specified chunking. For the
simple whole-analysis rescoring method nBest(char[],int,int,int),
this is not checked, and any values may be used in practice. For
the n-best chunk method nBestChunks(char[],int,int,int), the
scores are treated as log probabilities, but renormalized in order
to compute conditional chunk probability estimates.
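To make the log (base 2) convention concrete, the sketch below converts a probability to a base 2 log and renormalizes a list of log (base 2) joint scores into conditional probabilities that sum to one. The class Log2ScoreSketch and its methods are hypothetical, and subtracting the maximum score before exponentiating is a common numerical precaution chosen for this example, not something the documentation states.

```java
import java.util.Arrays;

/** Hypothetical helper for log (base 2) scores; not LingPipe code. */
final class Log2ScoreSketch {

    /** Converts a probability in (0,1] to its log base 2. */
    static double log2(double p) {
        return Math.log(p) / Math.log(2.0);
    }

    /** Renormalizes log (base 2) joint scores into probabilities summing to 1. */
    static double[] renormalize(double[] log2Joint) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : log2Joint) max = Math.max(max, s);
        double[] cond = new double[log2Joint.length];
        double sum = 0.0;
        for (int i = 0; i < log2Joint.length; ++i) {
            // Shifting by the maximum avoids underflow for very negative scores.
            cond[i] = Math.pow(2.0, log2Joint[i] - max);
            sum += cond[i];
        }
        for (int i = 0; i < cond.length; ++i) cond[i] /= sum;
        return cond;
    }

    public static void main(String[] args) {
        System.out.println(log2(0.125));
        System.out.println(Arrays.toString(renormalize(new double[] {-3.0, -4.0, -4.0})));
    }
}
```

Only the differences between scores affect the renormalized values, which is why a rescorer whose outputs are off by a constant from true joint log probabilities would still yield the same conditional estimates in this sketch.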
chunking - Chunking to rescore.

public B baseChunker()
public int numChunkingsRescored()
public void setNumChunkingsRescored(int numChunkingsRescored)
numChunkingsRescored - Number of base chunkings to rescore.

public Chunking chunk(CharSequence cSeq)
public Chunking chunk(char[] cs, int start, int end)
public Iterator<ScoredObject<Chunking>> nBest(char[] cs, int start, int end, int maxNBest)
nBest in interface NBestChunker
cs - Underlying character array.
start - Index of first character to analyze.
end - Index of one past the last character to analyze.
maxNBest - The maximum number of results to return.

public Iterator<Chunk> nBestChunks(char[] cs, int start, int end, int maxNBest)
See the class documentation above for implementation details.
nBestChunks in interface ConfidenceChunker
cs - Underlying characters.
start - Index of first character in slice.
end - Index of one past last character in slice.
maxNBest - Maximum number of chunks to return.