public class LogisticRegression extends Object implements Compilable, Serializable
A LogisticRegression instance is a multiclass vector
classifier model generating conditional probability estimates of
categories. This class also provides static factory methods for
estimating multinomial regression models using stochastic gradient
descent (SGD) to find maximum likelihood or maximum a posteriori
(MAP) estimates with Laplace, Gaussian, or Cauchy priors on
coefficients.
The classification package contains a class LogisticRegressionClassifier
which adapts this
class's models and estimators to act as generic classifiers given
an instance of FeatureExtractor
.
Also Known As (AKA)
Multinomial logistic regression is also known as polytomous, polychotomous, or multiclass logistic regression, or just multilogit regression. Binary logistic regression is an instance of a generalized linear model (GLM) with the logit link function. The logit function is the inverse of the logistic function, and the logistic function is sometimes called the sigmoid function or the S-curve.
Logistic regression estimation obeys the maximum entropy principle, and thus logistic regression is sometimes called "maximum entropy modeling", and the resulting classifier the "maximum entropy classifier". The generalization of binomial logistic regression to multinomial logistic regression is sometimes called a softmax or exponential model.
Maximum a posteriori (MAP) estimation with Gaussian priors is often referred to as "ridge regression"; with Laplace priors, MAP estimation is known as the "lasso". MAP estimation with Gaussian, Laplace, or Cauchy priors is known as parameter shrinkage. Gaussian and Laplace priors yield forms of regularized regression, with the Gaussian version being regularized with the L_{2} norm (Euclidean distance, called the Frobenius norm for matrices of parameters) and the Laplace version being regularized with the L_{1} norm (taxicab distance or Manhattan metric); other Minkowski metrics may be used for shrinkage.
Binary logistic regression is equivalent to a one-layer, single-output neural network with a logistic activation function trained under log loss. This is sometimes called classification with a single neuron.
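The inverse relationship between the logistic (sigmoid) function and the logit function can be checked numerically. The following is a minimal sketch in plain Java; the class and method names are hypothetical and are not part of LingPipe.

```java
/** Illustrative only; LogisticLogitSketch is a hypothetical name, not a LingPipe class. */
class LogisticLogitSketch {

    /** Logistic (sigmoid) function: maps a real-valued score into (0,1). */
    static double logistic(double t) {
        return 1.0 / (1.0 + Math.exp(-t));
    }

    /** Logit (log odds) function: the inverse of the logistic function. */
    static double logit(double p) {
        return Math.log(p / (1.0 - p));
    }

    public static void main(String[] args) {
        double score = 1.7;
        double p = logistic(score);
        System.out.println("logistic(" + score + ") = " + p);
        // Applying the logit recovers the original score up to rounding.
        System.out.println("logit(logistic(" + score + ")) = " + logit(p));
    }
}
```

In the binary case, the probability of the first category is the logistic function applied to the dot product of the coefficient vector and the input.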
The method numInputDimensions() returns the number of dimensions (features)
in the model. Because the model is well-behaved under sparse
vectors, the dimensionality may be returned as Integer.MAX_VALUE,
a common default choice for sparse vectors.
A logistic regression model also fixes the number of output
categories. The method numOutcomes()
returns the number
of categories. These outcome categories will be represented as
integers from 0
to numOutcomes()-1
inclusive.
A model is parameterized by a real-valued vector for every
category other than the last, each of which must be of the same
dimensionality as the model's input feature dimensionality. The
constructor LogisticRegression(Vector[])
takes an array of
Vector
objects, which may be dense or sparse, but must all
be of the same dimensionality.
The likelihood of a given output category c < numOutcomes() given an input vector x of dimensionality numInputDimensions() is given by:

\[
p(c \mid x, \beta) =
\begin{cases}
\exp(\beta_{c} \cdot x) / Z(x) & \text{if } c < \text{numOutcomes()} - 1 \\
1 / Z(x) & \text{if } c = \text{numOutcomes()} - 1
\end{cases}
\]

where β_{c} · x is the vector dot (or inner) product:

\[
\beta_{c} \cdot x = \sum_{i < \text{numInputDimensions()}} \beta_{c,i} \, x_{i}
\]

and where the normalizing denominator, called the partition function, is:

\[
Z(x) = 1 + \sum_{k < \text{numOutcomes()} - 1} \exp(\beta_{k} \cdot x)
\]
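The likelihood computation above can be sketched in a few lines of plain Java. This is an illustration only, assuming dense double arrays in place of LingPipe Vector objects; the class SoftmaxSketch and its methods are hypothetical names, and the sketch makes no attempt to guard against overflow in the exponentials.

```java
/** Illustrative only; SoftmaxSketch is a hypothetical name, not a LingPipe class. */
class SoftmaxSketch {

    /** Dot product of a coefficient vector and an input vector of equal length. */
    static double dot(double[] beta, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i)
            sum += beta[i] * x[i];
        return sum;
    }

    /**
     * Returns p(c | x, beta) for every category c, where betas holds one
     * coefficient vector for each category except the last.
     */
    static double[] probabilities(double[][] betas, double[] x) {
        int numOutcomes = betas.length + 1;
        double[] p = new double[numOutcomes];
        double z = 1.0; // the last category contributes exp(0) = 1 to Z(x)
        for (int k = 0; k < betas.length; ++k) {
            p[k] = Math.exp(dot(betas[k], x));
            z += p[k];
        }
        p[numOutcomes - 1] = 1.0;
        for (int c = 0; c < numOutcomes; ++c)
            p[c] /= z; // normalize by the partition function
        return p;
    }

    public static void main(String[] args) {
        double[][] betas = { { 0.5, -1.0 }, { 0.0, 2.0 } }; // 3 outcomes, 2 dimensions
        double[] x = { 1.0, 0.25 };
        for (double p : probabilities(betas, x))
            System.out.println(p);
    }
}
```

With all coefficients set to zero, every category receives the same probability, which is why a zero vector is a natural starting point for estimation.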
Estimation is based on a set D of training cases (x,c) and a prior, which
must be an instance of RegressionPrior. The error function
is just the negative log likelihood and log prior:

\[
\mathrm{Err}(D, \beta) = -\Big( \log_{2} p(\beta \mid \sigma^{2}) + \sum_{(x,c') \in D} \log_{2} p(c' \mid x, \beta) \Big)
\]

where p(β | σ^{2}) is the likelihood of the parameters β in the prior, and p(c | x, β) is
the probability of category c given input x and parameters β.
The maximum a posteriori estimate is such that the gradient (the vector of partial derivatives of the error with respect to the parameters) is zero. If the data is not linearly separable, a maximum likelihood solution must exist. If the data is not linearly separable and none of the data dimensions is collinear, the solution will be unique. If there is an informative Cauchy, Gaussian or Laplace prior, there will be a unique MAP solution even in the face of linear separability or collinear dimensions. Proofs that a solution exists require showing that the matrix of second partial derivatives of the error with respect to pairs of parameters (the Hessian) is positive semidefinite; if it is positive definite, the error is strictly convex and the MAP solution is unique.
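The gradient for parameter vector β_{c} for outcome c < numOutcomes()-1 is:

\[
\mathrm{grad}(\mathrm{Err}(D, \beta_{c}))
= \frac{\partial \, \mathrm{Err}(D, \beta)}{\partial \beta_{c}}
= \frac{\partial \big( -\log p(\beta \mid \sigma^{2}) \big)}{\partial \beta_{c}}
+ \frac{\partial \big( -\sum_{(x,c') \in D} \log p(c' \mid x, \beta) \big)}{\partial \beta_{c}}
\]

where the gradients of the priors are described in the class documentation for
RegressionPrior, and the gradient of the likelihood function is:

\[
\frac{\partial \big( -\sum_{(x,c') \in D} \log p(c' \mid x, \beta) \big)}{\partial \beta_{c}}
= -\sum_{(x,c') \in D} \frac{\partial \log p(c' \mid x, \beta)}{\partial \beta_{c}}
= \sum_{(x,c') \in D} x \, \big( p(c \mid x, \beta) - I(c = c') \big)
\]

where the indicator function I(c=c') is equal to 1 if c=c' and equal to 0 otherwise.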
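The likelihood part of the gradient for a single training case can be sketched as follows. This is a minimal illustration in plain Java, assuming natural logarithms, dense arrays, and no prior; GradientSketch and its method names are hypothetical and not part of LingPipe.

```java
/** Illustrative only; GradientSketch is a hypothetical name, not a LingPipe class. */
class GradientSketch {

    /** Category probabilities with the last category as the reference. */
    static double[] probabilities(double[][] betas, double[] x) {
        double[] p = new double[betas.length + 1];
        double z = 1.0;
        for (int k = 0; k < betas.length; ++k) {
            double dot = 0.0;
            for (int i = 0; i < x.length; ++i)
                dot += betas[k][i] * x[i];
            p[k] = Math.exp(dot);
            z += p[k];
        }
        p[betas.length] = 1.0;
        for (int c = 0; c < p.length; ++c)
            p[c] /= z;
        return p;
    }

    /** Negative natural-log likelihood of reference category cRef for input x. */
    static double error(double[][] betas, double[] x, int cRef) {
        return -Math.log(probabilities(betas, x)[cRef]);
    }

    /** grad[c][i] = x[i] * (p(c | x, beta) - I(c == cRef)) for c < numOutcomes-1. */
    static double[][] gradient(double[][] betas, double[] x, int cRef) {
        double[] p = probabilities(betas, x);
        double[][] grad = new double[betas.length][x.length];
        for (int c = 0; c < betas.length; ++c) {
            double residual = p[c] - (c == cRef ? 1.0 : 0.0);
            for (int i = 0; i < x.length; ++i)
                grad[c][i] = x[i] * residual;
        }
        return grad;
    }

    public static void main(String[] args) {
        double[][] betas = { { 0.3, -0.2 }, { 0.1, 0.4 } };
        double[] x = { 1.0, 2.0 };
        double[][] grad = gradient(betas, x, 1);
        for (double[] row : grad)
            System.out.println(row[0] + " " + row[1]);
    }
}
```

Because each entry is the input value scaled by a residual, dimensions where the input is zero contribute nothing, which is what makes sparse updates cheap.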
By convention, dimension 0 of every input vector is assumed to hold the constant value 1,
which makes the parameters β_{c,0} intercepts. The priors
allow the intercept to be given an uninformative prior even if the
other dimensions have informative priors.
Variance normalization can be achieved by setting the variance prior parameter independently for each dimension.
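Non-linear effects can be modeled with derived features. For example, if the outcome
depends quadratically on dimension i, then in addition to the raw value
x_{i}, another feature j may be
introduced with value x_{i}^{2}.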
Similarly, interaction terms are often added for features
x_{i}
and x_{j}
,
with a new feature x_{k}
being defined
with value x_{i}
x_{j}
.
The resulting model is linear in the derived features, but
will no longer be linear in the original features.
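The derived features described above can be built before training. The sketch below is a hypothetical helper in plain Java (not a LingPipe API) that appends one squared term and one interaction term to a dense input array.

```java
/** Illustrative only; DerivedFeaturesSketch is a hypothetical name, not a LingPipe class. */
class DerivedFeaturesSketch {

    /**
     * Returns a copy of x extended with two derived dimensions:
     * the square x[i]^2 and the interaction x[i] * x[j].
     */
    static double[] expand(double[] x, int i, int j) {
        double[] out = new double[x.length + 2];
        System.arraycopy(x, 0, out, 0, x.length);
        out[x.length] = x[i] * x[i];     // quadratic term
        out[x.length + 1] = x[i] * x[j]; // interaction term
        return out;
    }

    public static void main(String[] args) {
        double[] x = { 1.0, 2.0, 3.0 }; // dimension 0 is the intercept input
        for (double v : expand(x, 1, 2))
            System.out.println(v);
    }
}
```

The model's dimensionality grows by the number of derived features, so every input vector must be expanded the same way at training and classification time.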
Stochastic Gradient Descent
This class estimates logistic regression models using stochastic
gradient descent (SGD). The SGD method runs through the data one
or more times, considering one training case at a time, adjusting
the parameters along some multiple of the contribution to the gradient
of the error for that case.
With informative priors, the error function
is strictly convex, and there will be a unique solution. In cases
of linear dependence between dimensions or in separable data,
maximum likelihood estimation may diverge.
The basic algorithm is:
β = 0;
for (epoch = 0; epoch < maxEpochs; ++epoch)
    for training case (x,c') in D
        for category c < numOutcomes-1
            β_{c} -= learningRate(epoch) * grad(Err(x,c,c',β,σ^{2}))
    if (epoch > minEpochs && converged)
        return β
where the learning rate and convergence conditions are discussed
in the Learning Parameters section below. The gradient of the error is described
above, and the gradient contribution of the prior and its
parameters σ^{2} are described in the class
documentation for RegressionPrior. Note that the gradient of the
prior must be divided by the number of training cases to
get its incremental contribution for each training case.
The actual algorithm uses a lazy form of updating the contribution
of the gradient of the prior. The result is an algorithm that
handles sparse input data touching only the nonzero dimensions of
inputs during parameter updates.
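The basic algorithm can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the class's implementation: it uses dense arrays, performs maximum likelihood estimation only (no prior and no lazy prior updates), visits training cases in order, and runs a fixed number of epochs without a convergence test. The names SgdSketch, estimate, and probabilities are hypothetical.

```java
/** Illustrative only; SgdSketch is a hypothetical name, not a LingPipe class. */
class SgdSketch {

    /** Category probabilities with the last category as the reference. */
    static double[] probabilities(double[][] betas, double[] x) {
        double[] p = new double[betas.length + 1];
        double z = 1.0;
        for (int k = 0; k < betas.length; ++k) {
            double dot = 0.0;
            for (int i = 0; i < x.length; ++i)
                dot += betas[k][i] * x[i];
            p[k] = Math.exp(dot);
            z += p[k];
        }
        p[betas.length] = 1.0;
        for (int c = 0; c < p.length; ++c)
            p[c] /= z;
        return p;
    }

    /** Runs maxEpochs passes of per-case gradient updates starting from zero. */
    static double[][] estimate(double[][] xs, int[] cs, int numOutcomes, int maxEpochs,
                               java.util.function.IntToDoubleFunction learningRate) {
        int numDims = xs[0].length;
        double[][] betas = new double[numOutcomes - 1][numDims]; // beta = 0
        for (int epoch = 0; epoch < maxEpochs; ++epoch) {
            double rate = learningRate.applyAsDouble(epoch);
            for (int j = 0; j < xs.length; ++j) {
                // probabilities are computed once, before any update for this case
                double[] p = probabilities(betas, xs[j]);
                for (int c = 0; c < numOutcomes - 1; ++c) {
                    double residual = p[c] - (c == cs[j] ? 1.0 : 0.0);
                    for (int i = 0; i < numDims; ++i)
                        betas[c][i] -= rate * xs[j][i] * residual;
                }
            }
        }
        return betas;
    }

    public static void main(String[] args) {
        double[][] xs = { { 1, -2 }, { 1, -1 }, { 1, -0.5 }, { 1, 0.5 }, { 1, 1 }, { 1, 2 } };
        int[] cs = { 1, 1, 0, 1, 0, 0 };
        double[][] betas = estimate(xs, cs, 2, 200, epoch -> 0.1);
        System.out.println("intercept=" + betas[0][0] + " slope=" + betas[0][1]);
    }
}
```

The example data is deliberately not linearly separable, so the maximum likelihood estimate is finite and the updates settle near it rather than diverging.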
Training Data
The regression estimation method requires paired input vectors and
reference categories. Reference categories may be specified as
integers (from 0 to numOutcomes()-1
) or weight vectors with
numOutcomes()
dimensions. All weights must be greater than 0.
When the values of a weight vector sum to 1.0, training is consistent with
an interpretation of the vector as representing a probabilistic assignment
of categories. For example, if W1+W2=1.0, then training on N instances with
weights W1 and W2 for categories C1 and C2 will produce similar results to
training on N*W1 instances with integer outcome C1, and N*W2 instances with
outcome C2.
When the weight values are positive integers, they act as oversampling
multiples. So training on an instance with positive integer weight N for category C
will produce similar results to training on N unweighted instances with integer
outcome C.
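One way to see why a weight vector behaves like a probabilistic assignment is to look at the gradient it would induce. The sketch below assumes, for illustration only, that each weight multiplies the log likelihood of its category; the source does not state how the class applies weights internally, and WeightedTargetSketch is a hypothetical name.

```java
/** Illustrative only; WeightedTargetSketch is a hypothetical name, not a LingPipe class. */
class WeightedTargetSketch {

    /**
     * Gradient of the weighted negative log likelihood for one training case,
     * given the model's current probabilities p for that case:
     * grad[c][i] = x[i] * (p[c] * sum(weights) - weights[c]) for c < numOutcomes-1.
     */
    static double[][] gradient(double[] p, double[] x, double[] weights) {
        double total = 0.0;
        for (double w : weights)
            total += w;
        double[][] grad = new double[p.length - 1][x.length];
        for (int c = 0; c < p.length - 1; ++c) {
            double residual = p[c] * total - weights[c];
            for (int i = 0; i < x.length; ++i)
                grad[c][i] = x[i] * residual;
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] p = { 0.4, 0.6 };          // current model probabilities
        double[] x = { 1.0, 2.0 };
        double[] weights = { 0.25, 0.75 };  // probabilistic category assignment
        double[][] grad = gradient(p, x, weights);
        System.out.println(grad[0][0] + " " + grad[0][1]);
    }
}
```

Under this assumption the weighted gradient is the weighted average of the gradients for the two integer outcomes, which matches the equivalence described above.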
Learning Parameters
In addition to the model parameters (including priors) and training
data (input vectors and reference categories), the regression
estimation method also requires four parameters that control
search. The simplest search parameters are the minimum and maximum
epoch parameters, which control the number of epochs used for
optimization.
The argument minImprovement determines how much
improvement in training data and model log likelihood under the
current model is necessary to go on to the next epoch. This is
measured relatively, with the algorithm stopping when the current
epoch's error err is relatively close to the previous
epoch's error, errLast:
abs(err - errLast)/(abs(err) + abs(errLast)) < minImprovement
Setting this to a low value will lead to slow, but accurate
coefficient estimates.
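The stopping test above translates directly into code. The following is a small sketch in plain Java with hypothetical names; it covers only the relative-difference test, not the minimum and maximum epoch limits.

```java
/** Illustrative only; ConvergenceSketch is a hypothetical name, not a LingPipe class. */
class ConvergenceSketch {

    /** Relative difference between the current and previous epoch's error. */
    static double relativeDifference(double err, double errLast) {
        return Math.abs(err - errLast) / (Math.abs(err) + Math.abs(errLast));
    }

    /**
     * True if the relative difference falls below minImprovement.
     * If both errors are zero the ratio is NaN and this returns false.
     */
    static boolean converged(double err, double errLast, double minImprovement) {
        return relativeDifference(err, errLast) < minImprovement;
    }

    public static void main(String[] args) {
        double errLast = 101.0;
        double err = 99.0;
        System.out.println(relativeDifference(err, errLast));
        System.out.println(converged(err, errLast, 0.02));
    }
}
```

Dividing by the sum of magnitudes keeps the measure between 0 and 1 regardless of the scale of the error.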
Finally, the search parameters include an instance of
AnnealingSchedule, which implements the learningRate(epoch)
method. See that class for concrete implementations, including
a standard inverse epoch annealing and exponential decay annealing.
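To give a feel for the two schedule shapes named above, here is a sketch of commonly used formulas in plain Java. The exact formulas and parameter names are assumptions for illustration and may differ from those used by AnnealingSchedule; AnnealingSketch is a hypothetical name.

```java
/** Illustrative only; AnnealingSketch is a hypothetical name, not a LingPipe class. */
class AnnealingSketch {

    /** Inverse annealing (assumed form): initialRate / (1 + epoch / annealingRate). */
    static double inverse(double initialRate, double annealingRate, int epoch) {
        return initialRate / (1.0 + epoch / annealingRate);
    }

    /** Exponential decay (assumed form): initialRate * base^epoch, with 0 < base <= 1. */
    static double exponential(double initialRate, double base, int epoch) {
        return initialRate * Math.pow(base, epoch);
    }

    public static void main(String[] args) {
        for (int epoch = 0; epoch < 5; ++epoch) {
            System.out.println(epoch
                + " inverse=" + inverse(0.1, 10.0, epoch)
                + " exponential=" + exponential(0.1, 0.5, epoch));
        }
    }
}
```

Inverse annealing decays slowly and never reaches zero quickly, while exponential decay shrinks the step size much faster, which can stop learning early if the base is too small.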
Blocked Updates
The implementation of stochastic gradient descent used in this
class for fitting a logistic regression model calculates the
likelihoods for an entire block of examples at once without
changing the model parameters. The parameters are then updated for
the entire block at once.
The last block may be smaller than the others, but it will
be treated the same way. First its classifications are computed,
then the gradient updates are made, then the prior updates.
Larger block sizes tend to lead to more robust fitting, but
may be slower to converge in terms of number of epochs.
In fitting models with priors, large block sizes will cause
each epoch to run faster because the dense operation of adjusting
for priors is performed less frequently.
If the block size is set to the corpus size, stochastic gradient descent
reduces to batch gradient descent, although step sizes will
still be calculated with the learning rate, not by a line search
along the gradient direction.
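A single epoch of blocked updates can be sketched as follows. This is an illustration in plain Java under simplifying assumptions (dense arrays, no prior updates, a fixed learning rate for the epoch); BlockedUpdateSketch and its methods are hypothetical names, not the class's implementation.

```java
/** Illustrative only; BlockedUpdateSketch is a hypothetical name, not a LingPipe class. */
class BlockedUpdateSketch {

    /** Category probabilities with the last category as the reference. */
    static double[] probabilities(double[][] betas, double[] x) {
        double[] p = new double[betas.length + 1];
        double z = 1.0;
        for (int k = 0; k < betas.length; ++k) {
            double dot = 0.0;
            for (int i = 0; i < x.length; ++i)
                dot += betas[k][i] * x[i];
            p[k] = Math.exp(dot);
            z += p[k];
        }
        p[betas.length] = 1.0;
        for (int c = 0; c < p.length; ++c)
            p[c] /= z;
        return p;
    }

    /** One pass over the data, updating betas in place once per block. */
    static void blockedEpoch(double[][] betas, double[][] xs, int[] cs,
                             int blockSize, double rate) {
        int numDims = xs[0].length;
        for (int start = 0; start < xs.length; start += blockSize) {
            int end = Math.min(start + blockSize, xs.length); // last block may be smaller
            double[][] grad = new double[betas.length][numDims];
            // accumulate the gradient with the parameters held fixed
            for (int j = start; j < end; ++j) {
                double[] p = probabilities(betas, xs[j]);
                for (int c = 0; c < betas.length; ++c) {
                    double residual = p[c] - (c == cs[j] ? 1.0 : 0.0);
                    for (int i = 0; i < numDims; ++i)
                        grad[c][i] += xs[j][i] * residual;
                }
            }
            // apply the update for the whole block at once
            for (int c = 0; c < betas.length; ++c)
                for (int i = 0; i < numDims; ++i)
                    betas[c][i] -= rate * grad[c][i];
        }
    }

    public static void main(String[] args) {
        double[][] xs = { { 1.0 }, { 1.0 } };
        int[] cs = { 0, 1 };
        double[][] betas = new double[1][1];
        blockedEpoch(betas, xs, cs, 2, 0.1);
        System.out.println(betas[0][0]);
    }
}
```

With a block size of 1 this is per-case stochastic gradient descent; with a block size equal to the corpus size each epoch takes one step along the full gradient.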
Serialization and Compilation
For convenience, this class implements both the Serializable
and Compilable
interfaces. Serializing or compiling
a logistic regression model has the same effect. The model
read back in from its serialized state will be an instance of
this class, LogisticRegression
.
References
Logistic regression is discussed in most machine learning and
statistics textbooks. These three machine learning textbooks all
introduce some form of stochastic gradient descent and logistic
regression (often not together, and often under different names as
listed in the AKA section above):
 MacKay, David. 2003. Information Theory, Inference, and Learning Algorithms.
Cambridge University Press.
 Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2001.
Elements of Statistical Learning.
Springer.
 Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning.
Springer.
An introduction to traditional statistical modeling with logistic
regression may be found in:
 Gelman, Andrew and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
For a discussion of text classification using regression that evaluates
with respect to support vector machines (SVMs) and considers
informative Laplace and Gaussian priors varying by dimension (which
this class supports), see:
 Genkin, Alexander, David D. Lewis, and David Madigan. 2004.
Large-Scale Bayesian Logistic Regression for Text Categorization.
Rutgers University Technical Report.
 Since:
 LingPipe3.5
 Version:
 4.0.1
 Author:
 Bob Carpenter, Mike Ross
 See Also:
 Serialized Form


Constructor Summary
Constructors
Constructor and Description
LogisticRegression(Vector weightVector)
Construct a binomial logistic regression model with the
specified parameter vector.
LogisticRegression(Vector[] weightVectors)
Construct a multinomial logistic regression model with
the specified weight vectors.

Method Summary
Modifier and Type
Method and Description
double[]
classify(Vector x)
Returns an array of conditional probabilities indexed by
outcomes for the specified input vector.
void
classify(Vector x,
double[] ysHat)
Fills the specified array with the conditional probabilities
indexed by outcomes for the specified input vector.
void
compileTo(ObjectOutput out)
Compiles this model to the specified object output.
static LogisticRegression
estimate(Vector[] xs,
int[] cs,
RegressionPrior prior,
AnnealingSchedule annealingSchedule,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs)
Estimate a logistic regression model from the specified input
data using the specified prior, annealing schedule,
minimum improvement per epoch,
minimum and maximum number of estimation epochs, and
reporter.
static LogisticRegression
estimate(Vector[] xs,
int[] cs,
RegressionPrior prior,
int blockSize,
LogisticRegression hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegression> handler,
Reporter reporter)
Estimate a logistic regression model from the specified input
data using the specified prior, block size, hot-start model,
annealing schedule, minimum improvement per epoch, rolling
average size, minimum and maximum number of estimation epochs,
intermediate-result handler, and reporter.
static LogisticRegression
estimate(Vector[] xs,
Vector[] cs,
RegressionPrior prior,
AnnealingSchedule annealingSchedule,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs)
Estimate a logistic regression model from the specified input
data using the specified prior, annealing schedule,
minimum improvement per epoch,
minimum and maximum number of estimation epochs, and
reporter.
static LogisticRegression
estimate(Vector[] xs,
Vector[] cs,
RegressionPrior prior,
int blockSize,
LogisticRegression hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegression> handler,
Reporter reporter)
Estimate a logistic regression model from the specified input
data using the specified prior, block size, hot-start model,
annealing schedule, minimum improvement per epoch, rolling
average size, minimum and maximum number of estimation epochs,
intermediate-result handler, and reporter.
static double
log2Likelihood(Vector[] inputs,
int[] cats,
LogisticRegression regression)
Returns the log (base 2) likelihood of the specified inputs
with the specified categories using the specified regression
model.
static double
log2Likelihood(Vector[] inputs,
Vector[] cats,
LogisticRegression regression)
int
numInputDimensions()
Returns the dimensionality of inputs for this logistic
regression model.
int
numOutcomes()
Returns the number of outcomes for this logistic regression
model.
Vector[]
weightVectors()
Returns an array of views of the weight vectors used for this
regression model.


Constructor Detail

LogisticRegression
public LogisticRegression(Vector[] weightVectors)
Construct a multinomial logistic regression model with
the specified weight vectors. With k-1
weight vectors, the result is a multinomial classifier
with k
outcomes.
The weight vectors are stored rather than copied, so
changes to them will affect this class.
See the class definition above for more information on
logistic regression.
 Parameters:
weightVectors
 Weight vectors defining this regression
model.
 Throws:
IllegalArgumentException
 If the array of weight vectors
does not have at least one element or if there are two weight
vectors with different numbers of dimensions.

LogisticRegression
public LogisticRegression(Vector weightVector)
Construct a binomial logistic regression model with the
specified parameter vector. See the class definition above
for more information on logistic regression.
The weight vector is stored rather than copied, so
changes to it will affect this class.
 Parameters:
weightVector
 The weights of features defining this
model.

Method Detail

numInputDimensions
public int numInputDimensions()
Returns the dimensionality of inputs for this logistic
regression model.
 Returns:
 The number of dimensions for this model.

numOutcomes
public int numOutcomes()
Returns the number of outcomes for this logistic regression
model.
 Returns:
 The number of outcomes for this model.

weightVectors
public Vector[] weightVectors()
Returns an array of views of the weight vectors used for this
regression model. The returned weight vectors are immutable
views of the underlying vectors used by this model, so will
change if the vectors making up this model change.
 Returns:
 An array of views of the weight vectors for this model.

classify
public double[] classify(Vector x)
Returns an array of conditional probabilities indexed by
outcomes for the specified input vector. The resulting array
has a value for index i
that is equal to the
probability of the outcome i
for the specified
input. The sum of the returned values will be 1.0 (modulo
arithmetic precision).
See the class definition above for more information on
how the conditional probabilities are computed.
 Parameters:
x
 The input vector.
 Returns:
 The array of conditional probabilities of
outcomes.
 Throws:
IllegalArgumentException
 If the specified vector is not
the same dimensionality as this logistic regression instance.

classify
public void classify(Vector x,
double[] ysHat)
Fills the specified array with the conditional probabilities
indexed by outcomes for the specified input vector.
The resulting array has a value for index i
that is equal to the probability of the outcome i
for the specified input. The sum of the returned values will
be 1.0 (modulo arithmetic precision).
See the class definition above for more information on
how the conditional probabilities are computed.
 Parameters:
x
 The input vector.
ysHat
 Array into which conditional probabilities are written.
 Throws:
IllegalArgumentException
 If the specified vector is not
the same dimensionality as this logistic regression instance.

compileTo
public void compileTo(ObjectOutput out)
throws IOException
Compiles this model to the specified object output. The
compiled model, when read back in, will remain an instance of
this class, LogisticRegression
.
Compilation does the same thing as serialization.
 Specified by:
compileTo
in interface Compilable
 Parameters:
out
 Object output to which this model is compiled.
 Throws:
IOException
 If there is an underlying I/O error during
serialization.

estimate
public static LogisticRegression estimate(Vector[] xs,
int[] cs,
RegressionPrior prior,
AnnealingSchedule annealingSchedule,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs)
Estimate a logistic regression model from the specified input
data using the specified prior, annealing schedule,
minimum improvement per epoch,
minimum and maximum number of estimation epochs, and
reporter. The block size defaults to the number of examples
divided by 50 (or 1 if the division results in 0).
See the class documentation above for more information on
logistic regression and the stochastic gradient descent algorithm
used to implement this method.
Reporting: Reports at the debug level provide
epoch-by-epoch feedback. Reports at the info level indicate
inputs and major milestones in the algorithm. Reports at the
fatal level are for thrown exceptions.
 Parameters:
xs
 Input vectors indexed by training case.
cs
 Output categories indexed by training case.
prior
 The prior to be used for regression.
annealingSchedule
 Class to compute learning rate for each epoch.
minImprovement
 The minimum relative improvement in
log likelihood for the corpus to continue to another epoch.
minEpochs
 Minimum number of epochs.
maxEpochs
 Maximum number of epochs.
reporter
 Reporter to which progress reports are written, or
null
if no progress reports are needed.
 Throws:
IllegalArgumentException
 If the set of input vectors
does not contain at least one instance, if the number of output
categories isn't the same as the number of input vectors, if two input
vectors have different dimensions, or if the prior has a
different number of dimensions than the instances.

estimate
public static LogisticRegression estimate(Vector[] xs,
Vector[] cs,
RegressionPrior prior,
AnnealingSchedule annealingSchedule,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs)
Estimate a logistic regression model from the specified input
data using the specified prior, annealing schedule,
minimum improvement per epoch,
minimum and maximum number of estimation epochs, and
reporter. The block size defaults to the number of examples
divided by 50 (or 1 if the division results in 0).
See the class documentation above for more information on
logistic regression and the stochastic gradient descent algorithm
used to implement this method.
Reporting: Reports at the debug level provide
epoch-by-epoch feedback. Reports at the info level indicate
inputs and major milestones in the algorithm. Reports at the
fatal level are for thrown exceptions.
 Parameters:
xs
 Input vectors indexed by training case.
cs
 Output vectors representing probabilistic category assignments
indexed by training case.
prior
 The prior to be used for regression.
annealingSchedule
 Class to compute learning rate for each epoch.
minImprovement
 The minimum relative improvement in
log likelihood for the corpus to continue to another epoch.
minEpochs
 Minimum number of epochs.
maxEpochs
 Maximum number of epochs.
reporter
 Reporter to which progress reports are written, or
null
if no progress reports are needed.
 Throws:
IllegalArgumentException
 If the set of input vectors
does not contain at least one instance, if the number of output
categories isn't the same as the number of input vectors, if two input
vectors have different dimensions, or if the prior has a
different number of dimensions than the instances.

estimate
public static LogisticRegression estimate(Vector[] xs,
int[] cs,
RegressionPrior prior,
int blockSize,
LogisticRegression hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegression> handler,
Reporter reporter)
Estimate a logistic regression model from the specified input
data using the specified prior, block size, hot-start model,
annealing schedule, minimum improvement per epoch, rolling
average size, minimum and maximum number of estimation epochs,
intermediate-result handler, and reporter.
See the class documentation above for more information on
logistic regression and the stochastic gradient descent algorithm
used to implement this method.
Reporting: Reports at the debug level provide
epoch-by-epoch feedback. Reports at the info level indicate
inputs and major milestones in the algorithm. Reports at the
fatal level are for thrown exceptions.
 Parameters:
xs
 Input vectors indexed by training case.
cs
 Output categories indexed by training case.
prior
 The prior to be used for regression.
blockSize
 Number of examples whose gradient is
computed before updating coefficients.
hotStart
 Logistic regression from which to retrieve
initial weights or null to use zero vectors.
annealingSchedule
 Class to compute learning rate for each epoch.
minImprovement
 The minimum relative improvement in
log likelihood for the corpus to continue to another epoch.
minEpochs
 Minimum number of epochs.
maxEpochs
 Maximum number of epochs.
handler
 Handler for intermediate regression results.
reporter
 Reporter to which progress reports are written, or
null
if no progress reports are needed.
 Throws:
IllegalArgumentException
 If the set of input vectors
does not contain at least one instance, if the number of output
categories isn't the same as the number of input vectors, if two input
vectors have different dimensions, or if the prior has a
different number of dimensions than the instances.

estimate
public static LogisticRegression estimate(Vector[] xs,
Vector[] cs,
RegressionPrior prior,
int blockSize,
LogisticRegression hotStart,
AnnealingSchedule annealingSchedule,
double minImprovement,
int rollingAverageSize,
int minEpochs,
int maxEpochs,
ObjectHandler<LogisticRegression> handler,
Reporter reporter)
Estimate a logistic regression model from the specified input
data using the specified prior, block size, hot-start model,
annealing schedule, minimum improvement per epoch, rolling
average size, minimum and maximum number of estimation epochs,
intermediate-result handler, and reporter.
See the class documentation above for more information on
logistic regression and the stochastic gradient descent algorithm
used to implement this method.
Reporting: Reports at the debug level provide
epoch-by-epoch feedback. Reports at the info level indicate
inputs and major milestones in the algorithm. Reports at the
fatal level are for thrown exceptions.
 Parameters:
xs
 Input vectors indexed by training case.
cs
 Output vectors representing probabilistic category assignments
indexed by training case.
prior
 The prior to be used for regression.
blockSize
 Number of examples whose gradient is
computed before updating coefficients.
hotStart
 Logistic regression from which to retrieve
initial weights or null to use zero vectors.
annealingSchedule
 Class to compute learning rate for each epoch.
minImprovement
 The minimum relative improvement in
log likelihood for the corpus to continue to another epoch.
minEpochs
 Minimum number of epochs.
maxEpochs
 Maximum number of epochs.
handler
 Handler for intermediate regression results.
reporter
 Reporter to which progress reports are written, or
null
if no progress reports are needed.
 Throws:
IllegalArgumentException
 If the set of input vectors
does not contain at least one instance, if the number of output
categories isn't the same as the number of input vectors, if two input
vectors have different dimensions, or if the prior has a
different number of dimensions than the instances.

log2Likelihood
public static double log2Likelihood(Vector[] inputs,
int[] cats,
LogisticRegression regression)
Returns the log (base 2) likelihood of the specified inputs
with the specified categories using the specified regression
model.
 Parameters:
inputs
 Input vectors.
cats
 Categories for input vectors.
regression
 Model to use for computing likelihood.
 Throws:
IllegalArgumentException
 If the inputs and categories
are not the same length.
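The quantity returned by this method can be sketched independently of LingPipe as the sum of base-2 log probabilities of the reference categories. The code below is an illustration in plain Java that takes dense arrays and a coefficient matrix in place of Vector and LogisticRegression arguments; all names in it are hypothetical.

```java
/** Illustrative only; Log2LikelihoodSketch is a hypothetical name, not a LingPipe class. */
class Log2LikelihoodSketch {

    /** Category probabilities with the last category as the reference. */
    static double[] probabilities(double[][] betas, double[] x) {
        double[] p = new double[betas.length + 1];
        double z = 1.0;
        for (int k = 0; k < betas.length; ++k) {
            double dot = 0.0;
            for (int i = 0; i < x.length; ++i)
                dot += betas[k][i] * x[i];
            p[k] = Math.exp(dot);
            z += p[k];
        }
        p[betas.length] = 1.0;
        for (int c = 0; c < p.length; ++c)
            p[c] /= z;
        return p;
    }

    /** Sum over training cases of log2 p(cats[j] | inputs[j], betas). */
    static double log2Likelihood(double[][] inputs, int[] cats, double[][] betas) {
        if (inputs.length != cats.length)
            throw new IllegalArgumentException("inputs and categories must have the same length");
        double sum = 0.0;
        for (int j = 0; j < inputs.length; ++j) {
            double p = probabilities(betas, inputs[j])[cats[j]];
            sum += Math.log(p) / Math.log(2.0); // convert natural log to base 2
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] betas = new double[2][2]; // all zeros: 3 equally likely outcomes
        double[][] inputs = { { 1.0, 0.5 }, { 1.0, -0.5 } };
        int[] cats = { 0, 2 };
        System.out.println(log2Likelihood(inputs, cats, betas));
    }
}
```

The value is always zero or negative, and values closer to zero indicate a better fit to the reference categories.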

log2Likelihood
public static double log2Likelihood(Vector[] inputs,
Vector[] cats,
LogisticRegression regression)