A TokenShapeChunker uses a named-entity TokenShapeDecoder and tokenizer factory to implement entity detection through the chunk.Chunker interface. A named-entity chunker is constructed from a tokenizer factory and decoder. The tokenizer factory creates the tokens that are sent to the decoder. The chunks have types derived from the named-entity types found.
The tokens and whitespaces returned by the tokenizer are concatenated to form the underlying text slice of the chunks returned by the chunker. Thus a tokenizer that drops or rewrites tokens, such as the stop list tokenizer or Porter stemmer tokenizer, will create a character slice that does not match the input. A whitespace-normalizing tokenizer filter can be used, for example, to produce normalized text as the basis of the chunks.
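The mismatch described above can be illustrated with a small, self-contained sketch. It does not use the library's tokenizer classes; the class `TokenConcatSketch`, the method `rebuild`, and the rule that a dropped token takes its trailing whitespace with it are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not part of the library: shows why concatenating
// the tokens and whitespaces that survive a stop-list filter can yield text
// that differs from the original input.
class TokenConcatSketch {

    // Assumed toy stop list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "on"));

    // Splits the input into (token, trailing whitespace) pairs and concatenates
    // only the pairs whose token is not in the stop list.
    static String rebuild(String input, Set<String> stopWords) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = input.length();
        // Copy any leading whitespace unchanged.
        while (i < n && Character.isWhitespace(input.charAt(i))) {
            out.append(input.charAt(i));
            i++;
        }
        while (i < n) {
            int tokenStart = i;
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            String token = input.substring(tokenStart, i);
            int whitespaceStart = i;
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            String whitespace = input.substring(whitespaceStart, i);
            // Assumption: a dropped token also drops the whitespace after it.
            if (!stopWords.contains(token)) {
                out.append(token).append(whitespace);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "the cat sat on the mat";
        // With no stop words the concatenation reproduces the input exactly.
        System.out.println(rebuild(input, new HashSet<String>()));
        // With the stop list the rebuilt text is shorter than the input, so
        // chunk offsets computed over it no longer line up with the original.
        System.out.println(rebuild(input, STOP_WORDS)); // prints "cat sat mat"
    }
}
```

Offsets reported for chunks refer to the rebuilt text, so with a filtering tokenizer they should not be used to index into the original input.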
|Modifier and Type||Method and Description|
|Chunking||chunk(char[] cs, int start, int end): Returns the set of named-entity chunks derived from the underlying decoder over the tokenization of the specified character slice.|
|Chunking||chunk(CharSequence cSeq): Returns the set of named-entity chunks derived from the underlying decoder over the tokenization of the specified character sequence.|
|void||setLog2Beam(double beamWidth): Sets the log (base 2) beam width for the decoder.|
public Chunking chunk(CharSequence cSeq)
For more information on return results, see
public Chunking chunk(char[] cs, int start, int end)
public void setLog2Beam(double beamWidth)
beamWidth - Width of beam.
IllegalArgumentException - If the beam width is not positive.
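The documentation above states only that the beam width is expressed in log (base 2) units and must be positive; it does not say how the decoder applies it. The following is a minimal sketch of one common form of beam pruning under that assumption: a hypothesis survives if its log2 score is within the beam width of the best score. The class `Log2BeamSketch`, its default width, and the `survivors` method are hypothetical and are not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of log (base 2) beam pruning; not the decoder's
// actual code.
class Log2BeamSketch {

    // Assumed default width, chosen only for this example.
    private double log2Beam = 8.0;

    // Mirrors the documented contract: reject widths that are not positive.
    void setLog2Beam(double beamWidth) {
        if (!(beamWidth > 0.0)) { // also rejects NaN
            throw new IllegalArgumentException(
                "Beam width must be positive. Found beamWidth=" + beamWidth);
        }
        log2Beam = beamWidth;
    }

    // Returns the indices of hypotheses whose log2 score is no more than
    // log2Beam below the best score in the array.
    List<Integer> survivors(double[] log2Scores) {
        double best = Double.NEGATIVE_INFINITY;
        for (double score : log2Scores) {
            best = Math.max(best, score);
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < log2Scores.length; i++) {
            if (log2Scores[i] >= best - log2Beam) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Log2BeamSketch sketch = new Log2BeamSketch();
        sketch.setLog2Beam(2.0);
        // A width of 2.0 keeps hypotheses at least 1/4 as likely as the best.
        double[] scores = {-1.0, -2.5, -3.0, -4.0};
        System.out.println(sketch.survivors(scores)); // prints [0, 1, 2]
    }
}
```

In a scheme like this, a wider beam keeps more hypotheses and trades speed for search accuracy, while a narrower beam prunes more aggressively.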