E - the type of object being classified
public class PerceptronClassifier<E> extends Object implements ScoredClassifier<E>, Serializable
PerceptronClassifier implements a binary classifier based on an averaged kernel-based perceptron. These classifiers are large-margin (discriminative) linear classifiers in a feature space expanded by a plug-and-play kernel implementing KernelFunction.
A perceptron classifier may be applied to any type of object. An appropriately typed FeatureExtractor is used to map these objects to feature vectors for use in the perceptron.
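For illustration, a feature extractor over strings might map each token to its count. This is a hypothetical sketch, not LingPipe's FeatureExtractor interface; the class and method names are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a feature extractor: maps an input string to a
// bag-of-words feature vector represented as a token -> count map.
class TokenCountFeatures {
    static Map<String, Integer> features(String input) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : input.toLowerCase().split("\\s+")) {
            counts.merge(token, 1, Integer::sum); // increment the token's count
        }
        return counts;
    }
}
```

A real extractor for the perceptron would produce a sparse numeric vector, but the token-to-count mapping is the essential step.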
Unlike the language-model-based classifiers, for which training, classification and compilation may be interleaved, averaged perceptron-based classifiers require batch training. This requires the entire training corpus to be available in one shot. In particular, training will iterate over a fixed corpus multiple times before producing a completely trained classifier.
The constructor will do the training using the supplied instance of Corpus. The constructor will store the entire corpus in memory in the form of SparseFloatVector instances and boolean polarities for the accept/reject decision. The corpus will only be held locally in the constructor; it is available for garbage collection, as are all intermediate results, as soon as the constructor is done training.
The basic (non-kernel) perceptron is equivalent to using the DotProductKernel. A good choice for most classification tasks is the polynomial kernel, PolynomialKernel. Higher polynomial kernel degrees usually perform dramatically better than dot products. Degree 3 is a good general starting point, as performance sometimes degrades at higher kernel degrees. In some cases, the Gaussian radial basis kernel implemented in GaussianRadialBasisKernel works well.
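As a sketch, the three kernels mentioned above can be written as follows. The polynomial form (1 + v1·v2)^degree and the radial basis form exp(-||v1-v2||²/(2σ²)) are the common textbook conventions and are assumptions here; LingPipe's implementations may differ in constants or parameterization:

```java
// Sketch of the kernel functions discussed above, over dense vectors.
class Kernels {
    // Basic perceptron: plain dot product.
    static double dot(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; ++i)
            sum += v1[i] * v2[i];
        return sum;
    }

    // Polynomial kernel of the given degree: (1 + v1 . v2)^degree (assumed form).
    static double polynomial(double[] v1, double[] v2, int degree) {
        return Math.pow(1.0 + dot(v1, v2), degree);
    }

    // Gaussian radial basis kernel: exp(-||v1 - v2||^2 / (2 sigma^2)) (assumed form).
    static double gaussianRbf(double[] v1, double[] v2, double sigma) {
        double sqDist = 0.0;
        for (int i = 0; i < v1.length; ++i) {
            double d = v1[i] - v2[i];
            sqDist += d * d;
        }
        return Math.exp(-sqDist / (2.0 * sigma * sigma));
    }
}
```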
If the kernel function is neither serializable nor compilable, then the resulting perceptron classifier will not be serializable.
More training iterations are usually better for accuracy. As more basis vectors are added to the perceptron, more memory is needed for the model and more time is needed for classification. Typically, the amount of memory required at run time will stabilize after a few training iterations.
The memory usage of a perceptron classifier may be very high. The perceptron must store every feature vector in the input on which the classifier being trained made a mistake in some iteration. During training, every feature vector in the input corpus must be stored. These are stored in memory as instances of SparseFloatVector.
If the data are linearly separable in the kernel space, the training process will converge to the point where no additional basis feature vectors are needed, and the result will converge to the single final perceptron (which still requires all the intermediate basis vectors to be stored, given the demands of a non-linear kernel calculation).
After a perceptron classifier is constructed, it may be serialized to an object output stream. If the underlying feature extractor is compilable, it is compiled; if it is not compilable, it is serialized. To be serializable, a perceptron classifier requires both its feature extractor and kernel function to be either serializable or compilable. The object read back in after serialization will be an instance of PerceptronClassifier.
An average of all of the perceptrons created during training is used for the final prediction. The factored formulation of the algorithm allows the perceptrons to be expressed as linearly weighted training samples.
Although theoretical bounds are almost equivalent, in practice averaged perceptrons slightly underperform support vector machine (SVM) learners over the same polynomial kernels. The advantage of perceptrons is that they are much more efficient in time and slightly more efficient in space in practice.
The model used for runtime predictions by the averaged perceptron is quite straightforward, consisting of a set of weighted feature vectors (represented as parallel arrays of basis vectors and weights) and a kernel degree:

    Vector[] basisVectors;
    int[] weights;
    int degree;

The basis vectors are all vectors derived from single training examples by the specified feature extractor. The weights may be positive or negative and represent the cumulative voted weight of the specified basis vector.
The kernel function computes a distance kernel(v1,v2) between vectors v1 and v2 in an enhanced feature space defined by the particular kernel employed.
A new input to classify is first converted to a feature vector by the feature extractor. Classification is then based on the sign of the following score:

    score(Vector v) = Σi weights[i] * kernel(basisVectors[i], v)

An example is accepted if the score of its feature vector is greater than zero and rejected otherwise.
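A minimal sketch of this scoring rule, assuming dense vectors and a plain dot-product kernel standing in for an arbitrary kernel function (the real implementation uses sparse vectors and a pluggable KernelFunction):

```java
// Sketch of the runtime model: parallel arrays of basis vectors and
// weights, scored as score(v) = sum_i weights[i] * kernel(basisVectors[i], v).
class PerceptronScorer {
    private final double[][] basisVectors;
    private final double[] weights;

    PerceptronScorer(double[][] basisVectors, double[] weights) {
        this.basisVectors = basisVectors;
        this.weights = weights;
    }

    // Dot-product kernel stands in for an arbitrary kernel here.
    private static double kernel(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; ++i)
            sum += v1[i] * v2[i];
        return sum;
    }

    double score(double[] v) {
        double score = 0.0;
        for (int i = 0; i < basisVectors.length; ++i)
            score += weights[i] * kernel(basisVectors[i], v);
        return score;
    }

    // Accept exactly when the score is strictly positive.
    boolean accept(double[] v) {
        return score(v) > 0.0;
    }
}
```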
During training, intermediate perceptrons are built up as follows (in pseudocode):

    Vector[] basisVectors;
    int[] incrementalWeights;
    boolean[] polarities;
    int degree;
    int index = -1;
    for (# iterations)
        for (vector, polarity) in training corpus
            yHat = scoreIntermediate(vector);
            if (yHat > 0 && polarity) || (yHat < 0 && !polarity)
                ++incrementalWeights[index];
            else
                ++index;
                basisVectors[index] = vector;
                polarities[index] = polarity;
                incrementalWeights[index] = 1;

where the intermediate score is defined by:

    scoreIntermediate(vector) = Σi <= index polarities[i] * kernel(basisVectors[i], vector)

The final weight for a vector is the cumulative weight computed as follows:

    cumulativeWeight(j) = Σk >= j incrementalWeights[k]

The actual implementations of these methods involve considerably more indirection and index chasing to avoid copies and duplication in the final vectors.
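The training loop and cumulative weighting above can be fleshed out as follows. This is a sketch under a plain dot-product kernel and dense vectors, using growable lists instead of the pseudocode's arrays; it is not LingPipe's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of averaged kernel-perceptron training with a dot-product kernel.
class AveragedPerceptronTrainer {
    final List<double[]> basisVectors = new ArrayList<>();
    final List<Boolean> polarities = new ArrayList<>();
    final List<Integer> incrementalWeights = new ArrayList<>();

    static double kernel(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; ++i)
            sum += v1[i] * v2[i];
        return sum;
    }

    // scoreIntermediate(v) = sum_{i <= index} polarities[i] * kernel(basisVectors[i], v)
    double scoreIntermediate(double[] v) {
        double score = 0.0;
        for (int i = 0; i < basisVectors.size(); ++i)
            score += (polarities.get(i) ? 1.0 : -1.0) * kernel(basisVectors.get(i), v);
        return score;
    }

    void train(double[][] vectors, boolean[] labels, int numIterations) {
        for (int it = 0; it < numIterations; ++it) {
            for (int j = 0; j < vectors.length; ++j) {
                double yHat = scoreIntermediate(vectors[j]);
                boolean correct =
                    (yHat > 0.0 && labels[j]) || (yHat < 0.0 && !labels[j]);
                if (correct) {
                    // Current perceptron survives: bump its incremental weight.
                    int last = incrementalWeights.size() - 1;
                    incrementalWeights.set(last, incrementalWeights.get(last) + 1);
                } else {
                    // Mistake: store the example as a new basis vector.
                    basisVectors.add(vectors[j]);
                    polarities.add(labels[j]);
                    incrementalWeights.add(1);
                }
            }
        }
    }

    // cumulativeWeight(j) = sum_{k >= j} incrementalWeights[k]
    int cumulativeWeight(int j) {
        int sum = 0;
        for (int k = j; k < incrementalWeights.size(); ++k)
            sum += incrementalWeights.get(k);
        return sum;
    }

    // Averaged prediction: each basis vector votes with its cumulative weight.
    double scoreAveraged(double[] v) {
        double score = 0.0;
        for (int i = 0; i < basisVectors.size(); ++i)
            score += (polarities.get(i) ? 1.0 : -1.0)
                     * cumulativeWeight(i)
                     * kernel(basisVectors.get(i), v);
        return score;
    }
}
```

On linearly separable data this adds basis vectors only on mistakes, then piles the remaining weight onto the final surviving perceptron, mirroring the convergence behavior described above.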
The averaged kernel perceptron implemented here was introduced in the following paper, which also provides error bounds for learning and evaluations with polynomial kernels of various degrees:
Freund, Yoav and Robert E. Schapire (1999) Large margin classification using the perceptron algorithm. Machine Learning 37(3):277-296.

The basic perceptron model was introduced in:
Block, H.D. (1962) The perceptron: a model for brain functioning. Reviews of Modern Physics 34:123-135.
The kernel-based perceptron was introduced in:
Aizerman, M.A., E.M. Braverman, and L.I. Rozonoer (1964) Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25:821-837.

The basis of the voting scheme is a deterministically averaged version of the randomized approach for adapting online learners to a batch setup described in the following paper:
Helmbold, D.P. and M.K. Warmuth. (1995) On weak learning. Journal of Computer and System Sciences 50:551-573.
|Constructor and Description|
Construct a perceptron classifier from the specified corpus with designated accept category, feature extractor, kernel function, number of training iterations, and output accept and reject categories.
|Modifier and Type|Method and Description|
|ScoredClassification|classify(E in) - Return the scored classification for the specified input.|
|FeatureExtractor<? super E>|featureExtractor() - Returns the feature extractor for this perceptron.|
|KernelFunction|kernelFunction() - Returns the kernel function for this perceptron.|
|String|toString() - Returns a string-based representation of this perceptron.|
public PerceptronClassifier(Corpus<ObjectHandler<Classified<E>>> corpus, FeatureExtractor<? super E> featureExtractor, KernelFunction kernelFunction, String corpusAcceptCategory, int numIterations, String outputAcceptCategory, String outputRejectCategory) throws IOException
corpus - Corpus to use for training.
featureExtractor - Feature extractor for objects.
kernelFunction - Kernel function for expanding the vector basis.
corpusAcceptCategory - Category in training data to treat as positive.
numIterations - Number of iterations to carry out during training.
outputAcceptCategory - Category with which to label accepted instances.
outputRejectCategory - Category with which to label rejected instances.
public KernelFunction kernelFunction()
public FeatureExtractor<? super E> featureExtractor()
public String toString()
public ScoredClassification classify(E in)