public class PerceptronClassifier<E> extends Object implements ScoredClassifier<E>, Serializable

Type Parameters:
E - the type of object being classified
PerceptronClassifier implements a binary classifier based on an
averaged kernel-based perceptron. These classifiers are large-margin
(discriminative) linear classifiers in a feature space expanded by a
plug-and-play kernel implementing KernelFunction.
A perceptron classifier may be applied to any type of object.
An appropriately typed FeatureExtractor
is used to map
these objects to feature vectors for use in the perceptron
classifier.
Unlike the language-model-based classifiers, for which training, classification, and compilation may be interleaved, averaged-perceptron-based classifiers require batch training. This requires the entire training corpus to be available in one shot. In particular, training iterates over a fixed corpus multiple times before producing a completely trained classifier.
The constructor will do the training using the supplied instance of
Corpus
. The constructor will store the entire corpus in
memory in the form of SparseFloatVector instances and boolean
polarities for the accept/reject decision. The corpus will only be
held locally in the constructor; it is available for garbage
collection, as are all intermediate results, as soon as the
constructor is done training.
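Concretely, training reduces to a single constructor call. The following is a minimal sketch, not runnable as-is: it assumes the LingPipe classes named in this document are on the classpath, that `corpus` and `featureExtractor` have already been built for `String` inputs, and that the category names, iteration count, and degree-3 kernel are illustrative choices only:

```java
// Hedged sketch; "..." marks objects assumed to be constructed elsewhere.
Corpus<ObjectHandler<Classified<String>>> corpus = ...;
FeatureExtractor<? super String> featureExtractor = ...;
KernelFunction kernel = new PolynomialKernel(3); // degree-3 polynomial kernel

PerceptronClassifier<String> classifier
    = new PerceptronClassifier<String>(corpus,
                                       featureExtractor,
                                       kernel,
                                       "accept",  // corpus accept category
                                       10,        // training iterations
                                       "accept",  // output accept category
                                       "reject"); // output reject category
```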
The basic (non-kernel) perceptron is equivalent to using the
kernel function DotProductKernel
. A good
choice for most classification tasks is the polynomial kernel,
implemented in PolynomialKernel
.
Higher polynomial kernel degrees usually perform dramatically
better than plain dot products. Degree 3 is a good general starting
point, as performance sometimes degrades at higher degrees. In
some cases, the Gaussian radial basis kernel implemented in
GaussianRadialBasisKernel
works well.
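A kernel function is just a similarity measure over feature vectors. As a rough illustration (stand-in code, not LingPipe's implementations), a dot product and the standard inhomogeneous polynomial kernel of a given degree can be sketched as follows; the `(v1 . v2 + 1)^degree` form is the conventional definition and is assumed here, so consult PolynomialKernel's own documentation for the exact formula it uses:

```java
// Illustrative stand-ins for kernel functions over dense feature vectors.
public class KernelSketch {

    // Plain dot product: corresponds in effect to DotProductKernel.
    static double dotProduct(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; ++i)
            sum += v1[i] * v2[i];
        return sum;
    }

    // Assumed polynomial kernel form: (v1 . v2 + 1)^degree.
    static double polynomialKernel(double[] v1, double[] v2, int degree) {
        return Math.pow(dotProduct(v1, v2) + 1.0, degree);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {3.0, 0.5};
        System.out.println(dotProduct(a, b));          // 4.0
        System.out.println(polynomialKernel(a, b, 3)); // (4+1)^3 = 125.0
    }
}
```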
If the kernel function is neither serializable nor compilable, then the resulting perceptron classifier will not be serializable.
More training iterations are usually better for accuracy. As more basis vectors are added to the perceptron, more memory is needed for the model and more time is needed for classification. Typically, the amount of memory required at run time will stabilize after a few training iterations.
The memory usage of a perceptron classifier may be very high.
The perceptron must store every feature vector in the input on
which the classifier being trained made a mistake in some
iteration. During training, every feature vector in the input
corpus must be stored. These are stored in memory as instances of
SparseFloatVector
.
If the data are linearly separable in the kernel space, the training process will converge: eventually no additional basis vectors are added, and the result converges to the single final perceptron (which still requires all of the intermediate vectors to be stored, given the demands of a nonlinear kernel calculation).
After a perceptron classifier is constructed, it may be
serialized to an object output stream. If the underlying
feature extractor is compilable, it's compiled, but if
it's not compilable, it's serialized. To be serializable,
a perceptron classifier requires both its feature extractor
and kernel function to be Serializable
.
The object read back in after serialization will
be an instance of PerceptronClassifier
.
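The serialization round trip follows the standard Java object-stream pattern. The following minimal sketch uses a plain Serializable string as a stand-in payload, since a real trained classifier is not available here; with a perceptron classifier, the object written would be the classifier itself and the object read back would be an instance of PerceptronClassifier:

```java
import java.io.*;

public class SerializationSketch {

    // Serialize the object to bytes and read it back, as one would with a
    // perceptron classifier written to an ObjectOutputStream.
    static Object roundTrip(Serializable obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(bytesOut)) {
            objOut.writeObject(obj); // requires obj and its members to be Serializable
        }
        ObjectInputStream objIn =
            new ObjectInputStream(new ByteArrayInputStream(bytesOut.toByteArray()));
        return objIn.readObject();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in payload; a FileOutputStream-backed stream works the same way.
        Object back = roundTrip("stand-in-model");
        System.out.println(back); // stand-in-model
    }
}
```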
Perceptrons are a kind of large-margin linear classifier. The polynomial kernel trick is used to embed the basic feature vector in a higher-degree vector space in which the data are more separable.
An average of all of the perceptrons created during training is used for the final prediction. The factored formulation of the algorithm allows the perceptrons to be expressed as linearly weighted training samples.
Although theoretical bounds are almost equivalent, in practice averaged perceptrons slightly underperform support vector machine (SVM) learners over the same polynomial kernels. The advantage of perceptrons is that they are much more efficient in time and slightly more efficient in space in practice.
The model used for runtime predictions by the averaged perceptron is quite straightforward, consisting of a set of weighted feature vectors (represented as parallel arrays of basis vectors and weights) and a kernel degree:

    Vector[] basisVectors;
    int[] weights;
    int degree;

The basis vectors are all vectors derived from single training examples by the specified feature extractor. The weights may be positive or negative and represent the cumulative voted weight of the specified basis vector.
The kernel function computes a distance
kernel(v1,v2)
between vectors v1
and
v2
in an enhanced feature space defined by the
particular kernel employed.
A new input to classify is first converted to a feature vector by the feature extractor. Classification is then based on the sign of the following score:

    score(Vector v) = Σ_{i} weights[i] * kernel(basisVectors[i],v)

An example is accepted if the score of its feature vector is greater than zero and rejected otherwise.
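In plain Java, the runtime scoring rule can be sketched as follows, with a dot-product kernel standing in for whatever KernelFunction the classifier was trained with; the arrays and values are illustrative, not LingPipe's internals:

```java
public class ScoreSketch {

    // Parallel arrays as in the model description: one weight per basis vector.
    static double[][] basisVectors = { {1.0, 0.0}, {0.0, 1.0} };
    static int[] weights = { 3, -2 };

    // Dot-product kernel stand-in for kernel(v1,v2).
    static double kernel(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; ++i)
            sum += v1[i] * v2[i];
        return sum;
    }

    // score(v) = sum over i of weights[i] * kernel(basisVectors[i], v)
    static double score(double[] v) {
        double sum = 0.0;
        for (int i = 0; i < basisVectors.length; ++i)
            sum += weights[i] * kernel(basisVectors[i], v);
        return sum;
    }

    public static void main(String[] args) {
        double[] v = {2.0, 1.0};
        System.out.println(score(v));     // 3*2 + (-2)*1 = 4.0
        System.out.println(score(v) > 0); // true: accepted
    }
}
```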
Training is driven by the following data structures and pseudocode, which add a new basis vector on each mistake and bump the incremental weight of the latest basis vector on each correct prediction:

    Vector[] basisVectors;
    int[] incrementalWeights;
    boolean[] polarities;
    int degree;

    int index = -1;
    for (# iterations)
        for (vector,polarity) in training corpus
            yHat = scoreIntermediate(vector);
            if (yHat > 0 && polarity) || (yHat < 0 && !polarity)
                ++incrementalWeights[index];
            else
                ++index;
                basisVectors[index] = vector;
                polarities[index] = polarity;
                incrementalWeights[index] = 1;

where the intermediate score treats each stored polarity as +1 (accept) or -1 (reject):

    scoreIntermediate(vector) = Σ_{i <= index} polarities[i] * kernel(basisVectors[i],vector)

The final weight for a vector is the cumulative weight computed as follows:

    cumulativeWeight(j) = Σ_{k >= j} incrementalWeights[k]

The actual implementations of these methods involve considerably more indirection and index chasing to avoid copies and duplication in the final vectors.
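The training loop and cumulative-weight computation can be made concrete with a small self-contained Java sketch. It uses a dot-product kernel, growable lists in place of raw arrays, and a tiny toy corpus; everything here is illustrative and much simpler than LingPipe's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class TrainSketch {

    // Model state built up during training, mirroring the pseudocode above.
    static List<double[]> basisVectors = new ArrayList<>();
    static List<Integer> incrementalWeights = new ArrayList<>();
    static List<Boolean> polarities = new ArrayList<>();

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; ++i) s += a[i] * b[i];
        return s;
    }

    // Intermediate score: polarity-signed kernel sum over basis vectors so far.
    static double scoreIntermediate(double[] v) {
        double sum = 0.0;
        for (int i = 0; i < basisVectors.size(); ++i)
            sum += (polarities.get(i) ? 1 : -1) * dot(basisVectors.get(i), v);
        return sum;
    }

    // Grow-on-mistake training loop: bump the latest weight when correct,
    // otherwise add the example as a new basis vector with weight 1.
    static void train(double[][] vectors, boolean[] labels, int numIterations) {
        basisVectors.clear(); incrementalWeights.clear(); polarities.clear();
        for (int it = 0; it < numIterations; ++it) {
            for (int j = 0; j < vectors.length; ++j) {
                double yHat = scoreIntermediate(vectors[j]);
                boolean correct =
                    (yHat > 0 && labels[j]) || (yHat < 0 && !labels[j]);
                if (correct) {
                    int last = incrementalWeights.size() - 1;
                    incrementalWeights.set(last, incrementalWeights.get(last) + 1);
                } else {
                    basisVectors.add(vectors[j]);
                    polarities.add(labels[j]);
                    incrementalWeights.add(1);
                }
            }
        }
    }

    // cumulativeWeight(j) = sum over k >= j of incrementalWeights[k]
    static int cumulativeWeight(int j) {
        int sum = 0;
        for (int k = j; k < incrementalWeights.size(); ++k)
            sum += incrementalWeights.get(k);
        return sum;
    }

    // Averaged-perceptron prediction: cumulative voted weight times kernel.
    static double finalScore(double[] v) {
        double sum = 0.0;
        for (int i = 0; i < basisVectors.size(); ++i)
            sum += (polarities.get(i) ? 1 : -1)
                   * cumulativeWeight(i) * dot(basisVectors.get(i), v);
        return sum;
    }

    public static void main(String[] args) {
        // Linearly separable toy corpus on the x-axis.
        double[][] xs = { {1, 0}, {2, 0}, {-1, 0}, {-2, 0} };
        boolean[] ys = { true, true, false, false };
        train(xs, ys, 2);
        System.out.println(cumulativeWeight(0));        // 8
        System.out.println(finalScore(new double[]{3, 0}));  // 24.0: accepted
        System.out.println(finalScore(new double[]{-1, 0})); // -8.0: rejected
    }
}
```

Note that only one basis vector is stored for this separable toy data, matching the convergence behavior described above.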
The averaged kernel perceptron implemented here was introduced in the following paper, which also provides error bounds for learning and evaluations with polynomial kernels of various degrees:
Freund, Yoav and Robert E. Schapire (1999) Large margin classification using the perceptron algorithm. Machine Learning 37(3):277-296.

The basic perceptron model was introduced in:

Block, H.D. (1962) The perceptron: a model for brain functioning. Reviews of Modern Physics 34:123-135.

The kernel-based perceptron was introduced in:

Aizerman, M.A., E.M. Braverman, and L.I. Rozonoer (1964) Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25:821-837.

The basis of the voting scheme is a deterministically averaged version of the randomized approach of adapting online learners to a batch setup described in the following paper:

Helmbold, D.P. and M.K. Warmuth (1995) On weak learning. Journal of Computer and System Sciences 50:551-573.
Constructor and Description 

PerceptronClassifier(Corpus<ObjectHandler<Classified<E>>> corpus,
FeatureExtractor<? super E> featureExtractor,
KernelFunction kernelFunction,
String corpusAcceptCategory,
int numIterations,
String outputAcceptCategory,
String outputRejectCategory)
Construct a perceptron classifier from the specified feature extractor,
corpus with designated accept category, polynomial kernel degree and
number of training iterations, and output accept and reject categories.

Modifier and Type  Method and Description 

ScoredClassification 
classify(E in)
Return the scored classification for the specified input.

FeatureExtractor<? super E> 
featureExtractor()
Returns the feature extractor for this perceptron.

KernelFunction 
kernelFunction()
Returns the kernel function for this perceptron.

String 
toString()
Returns a string-based representation of this perceptron.

public PerceptronClassifier(Corpus<ObjectHandler<Classified<E>>> corpus, FeatureExtractor<? super E> featureExtractor, KernelFunction kernelFunction, String corpusAcceptCategory, int numIterations, String outputAcceptCategory, String outputRejectCategory) throws IOException
Parameters:
corpus - Corpus to use for training.
featureExtractor - Feature extractor for objects.
corpusAcceptCategory - Category in training data to treat as positive.
kernelFunction - Kernel function for expanding vector basis.
numIterations - Number of iterations to carry out during training.
outputAcceptCategory - Category with which to label accepted instances.
outputRejectCategory - Category with which to label rejected instances.
Throws:
IOException
public KernelFunction kernelFunction()
public FeatureExtractor<? super E> featureExtractor()
public String toString()
public ScoredClassification classify(E in)
Specified by:
classify in interface BaseClassifier<E>
classify in interface RankedClassifier<E>
classify in interface ScoredClassifier<E>
Parameters:
in - The element to be classified.
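A hedged usage sketch, assuming a trained classifier built with the constructor documented above; the input string and the `bestCategory()`/`score(int)` accessors on the returned ScoredClassification are from LingPipe's classification API, and the variable names are illustrative:

```java
// Sketch only: assumes `classifier` is a trained PerceptronClassifier<String>.
ScoredClassification classification = classifier.classify("some input text");
String category = classification.bestCategory(); // output accept or reject category
double bestScore = classification.score(0);      // score of the best category
```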