public class ChunkerFeatureExtractor extends Object implements FeatureExtractor<CharSequence>, Serializable
ChunkerFeatureExtractor implements a feature extractor for character sequences based on a specified chunker. Feature names are derived from the chunk types, optionally concatenated to the phrase making up the chunk. Feature values are the counts of their occurrences.
For instance, if a chunker were to return a chunk of type PER spanning the phrase John and a chunk of type LOC spanning the phrase New York, then the features will be PER:1, LOC:1 if the phrases are not included and PER_John:1, LOC_New York:1 if they are. If a phrase had shown up three times, the value for its feature would be 3 (assuming types are included).
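The naming and counting rule described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not call the real chunker API, and the class `ChunkFeatureSketch`, the holder `TypedPhrase`, and the method `countFeatures` are hypothetical names introduced here. The underscore separator is taken from the `PER_John` example in the description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkFeatureSketch {

    // Hypothetical stand-in for a chunk: its type plus the phrase it spans.
    static final class TypedPhrase {
        final String type;
        final String phrase;

        TypedPhrase(String type, String phrase) {
            this.type = type;
            this.phrase = phrase;
        }
    }

    // Builds feature names from chunk types, optionally appending "_" and the
    // phrase, and counts how often each feature name occurs.
    static Map<String, Integer> countFeatures(List<TypedPhrase> chunks,
                                              boolean includePhrase) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (TypedPhrase chunk : chunks) {
            String name = includePhrase
                    ? chunk.type + "_" + chunk.phrase
                    : chunk.type;
            counts.merge(name, 1, Integer::sum); // increment, starting at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        List<TypedPhrase> chunks = List.of(
                new TypedPhrase("PER", "John"),
                new TypedPhrase("LOC", "New York"));
        System.out.println(countFeatures(chunks, false)); // {PER=1, LOC=1}
        System.out.println(countFeatures(chunks, true));  // {PER_John=1, LOC_New York=1}
    }
}
```

In this sketch, a phrase that is chunked three times with the same type contributes 3 to its feature's count, matching the behavior described above.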
|Constructor and Description|
|ChunkerFeatureExtractor(Chunker chunker, boolean includePhrase): Construct a new chunker feature extractor based on the specified chunker, including the phrases extracted if the specified flag is true.|

|Modifier and Type|Method and Description|
|Map<String,? extends Number>|features(CharSequence in): Return the feature vector for the specified input.|
public ChunkerFeatureExtractor(Chunker chunker, boolean includePhrase)
chunker - Base chunker for the extractor.
includePhrase - Set to true to append the phrase derived from the chunk to the feature name.
public Map<String,? extends Number> features(CharSequence in)