com.aliasi.stats

## Class LogisticRegression

• All Implemented Interfaces:
Compilable, Serializable

```java
public class LogisticRegression
extends Object
implements Compilable, Serializable
```
A `LogisticRegression` instance is a multi-class vector classifier model generating conditional probability estimates of categories. This class also provides static factory methods for estimating multinomial regression models by stochastic gradient descent (SGD), finding maximum likelihood or maximum a posteriori (MAP) estimates with Laplace, Gaussian, or Cauchy priors on the coefficients.

The classification package contains a class `LogisticRegressionClassifier` which adapts this class's models and estimators to act as generic classifiers given an instance of `FeatureExtractor`.

#### Also Known As (AKA)

Multinomial logistic regression is also known as polytomous, polychotomous, or multi-class logistic regression, or just multilogit regression.

Binary logistic regression is an instance of a generalized linear model (GLM) with the logit link function. The logit function is the inverse of the logistic function, and the logistic function is sometimes called the sigmoid function or the s-curve.

Logistic regression estimation obeys the maximum entropy principle, and thus logistic regression is sometimes called "maximum entropy modeling", and the resulting classifier the "maximum entropy classifier". The generalization of binomial logistic regression to multinomial logistic regression is sometimes called a softmax or exponential model.

Maximum a posteriori (MAP) estimation with Gaussian priors is often referred to as "ridge regression"; with Laplace priors, MAP estimation is known as the "lasso". MAP estimation with Gaussian, Laplace, or Cauchy priors is known as parameter shrinkage. Gaussian and Laplace priors are forms of regularized regression, with the Gaussian version being regularized with the L2 norm (Euclidean distance, called the Frobenius norm for matrices of parameters) and the Laplace version being regularized with the L1 norm (taxicab distance or Manhattan metric); other Minkowski metrics may be used for shrinkage.

Binary logistic regression is equivalent to a one-layer, single-output neural network with a logistic activation function trained under log loss. This is sometimes called classification with a single neuron.

#### Model Parameters

A logistic regression model is a discriminative classifier for vectors of fixed dimensionality. The dimensions are often referred to as "features". The method `numInputDimensions()` returns the number of dimensions (features) in the model. Because the model is well-behaved under sparse vectors, the dimensionality may be returned as `Integer.MAX_VALUE`, a common default choice for sparse vectors.

A logistic regression model also fixes the number of output categories. The method `numOutcomes()` returns the number of categories. These outcome categories will be represented as integers from `0` to `numOutcomes()-1` inclusive.

A model is parameterized by a real-valued vector for every category other than the last, each of which must be of the same dimensionality as the model's input feature dimensionality. The constructor `LogisticRegression(Vector[])` takes an array of `Vector` objects, which may be dense or sparse, but must all be of the same dimensionality.

#### Likelihood

The likelihood of a given output category `c < numOutcomes()` given an input vector `x` of dimensionality `numInputDimensions()` is:

```
p(c | x, β) = exp(βc · x) / Z(x)    if c < numOutcomes() - 1

            = 1 / Z(x)              if c = numOutcomes() - 1
```

where `βc · x` is the vector dot (or inner) product:

```
βc · x = Σ{i < numInputDimensions()} βc,i * xi
```

and where the normalizing denominator, called the partition function, is:

```
Z(x) = 1 + Σ{k < numOutcomes() - 1} exp(βk · x)
```
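As a concrete illustration of the formulas above, the conditional probabilities can be computed directly from the weight vectors. This is a minimal sketch, not LingPipe's implementation; the weight values are made up for a model with three outcomes and two input dimensions:

```java
public class LikelihoodSketch {

    // Conditional probabilities p(c | x, beta) for a model with
    // (numOutcomes - 1) weight vectors; the last category's weight
    // vector is implicitly zero.
    static double[] classify(double[][] betas, double[] x) {
        int numOutcomes = betas.length + 1;
        double[] probs = new double[numOutcomes];
        double z = 1.0;  // Z(x) = 1 + sum over k of exp(beta_k . x)
        for (int k = 0; k < betas.length; ++k) {
            double dot = 0.0;
            for (int i = 0; i < x.length; ++i)
                dot += betas[k][i] * x[i];
            probs[k] = Math.exp(dot);
            z += probs[k];
        }
        probs[numOutcomes - 1] = 1.0;  // exp(0 . x) = 1 for the last category
        for (int c = 0; c < numOutcomes; ++c)
            probs[c] /= z;             // normalize by the partition function
        return probs;
    }

    public static void main(String[] args) {
        double[][] betas = { {0.5, -1.0}, {-0.25, 0.75} };  // hypothetical weights
        double[] probs = classify(betas, new double[] {1.0, 2.0});
        double sum = 0.0;
        for (double p : probs) sum += p;
        System.out.printf("p = [%.4f, %.4f, %.4f], sum = %.4f%n",
                          probs[0], probs[1], probs[2], sum);
    }
}
```

Because the last category has no weight vector, its unnormalized score is always 1, which is why `Z(x)` starts at 1.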

#### Error and Gradient

This class computes maximum a posteriori parameter values given a sequence of training pairs `(x, c)` and a prior, which must be an instance of `RegressionPrior`. The error function is just the negative log likelihood plus the negative log prior:

```
Err(D, β) = - ( log2 p(β | σ²) + Σ{(x,c') in D} log2 p(c' | x, β) )
```

where `p(β | σ²)` is the prior probability of the parameters `β`, and `p(c' | x, β)` is the probability of category `c'` given input `x` and parameters `β`.

The maximum a posteriori estimate is such that the gradient (the vector of partial derivatives with respect to the parameters) is zero. If the data is not linearly separable, a maximum likelihood solution must exist. If the data is not linearly separable and no data dimensions are colinear, the solution will be unique. If there is an informative Cauchy, Gaussian, or Laplace prior, there will be a unique MAP solution even in the face of linear separability or colinear dimensions. Proofs of solution existence require showing that the matrix of second partial derivatives of the error with respect to pairs of parameters (the Hessian) is positive semi-definite; if it is positive definite, the error is strictly convex and the MAP solution is unique.

The gradient for parameter vector `βc` for outcome `c < numOutcomes() - 1` is:

```
grad(Err(D, βc))
  = ∂Err(D, β) / ∂βc
  = ∂( - log p(β | σ²) ) / ∂βc
    + ∂( - Σ{(x,c') in D} log p(c' | x, β) ) / ∂βc
```

where the gradients of the priors are described in the class documentation for `RegressionPrior`, and the gradient of the likelihood function is:

```
∂( - Σ{(x,c') in D} log p(c' | x, β) ) / ∂βc
  = - Σ{(x,c') in D} ∂ log p(c' | x, β) / ∂βc
  = Σ{(x,c') in D} x * (p(c | x, β) - I(c = c'))
```
where the indicator function `I(c=c')` is equal to 1 if `c=c'` and equal to 0 otherwise.
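The per-case contribution to the likelihood gradient is only a few lines of code. This is an illustrative sketch, not LingPipe's implementation; the probability value passed in is assumed to have been computed by the model:

```java
public class GradientSketch {

    // Gradient contribution of one training case to the error for
    // parameter vector beta_c: x * (p(c | x, beta) - I(c == c')).
    // pC is the model's probability of category c for this input;
    // observed is true when c equals the reference category c'.
    static double[] caseGradient(double[] x, double pC, boolean observed) {
        double scale = pC - (observed ? 1.0 : 0.0);  // indicator term
        double[] grad = new double[x.length];
        for (int i = 0; i < x.length; ++i)
            grad[i] = x[i] * scale;
        return grad;
    }

    public static void main(String[] args) {
        // If the model assigns only 0.2 to the observed category, the
        // gradient is negative along x, so the SGD update (which subtracts
        // the gradient) pushes beta_c toward x.
        double[] g = caseGradient(new double[] {1.0, 2.0}, 0.2, true);
        System.out.println(g[0] + " " + g[1]);  // approximately -0.8 and -1.6
    }
}
```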

#### Intercept Term

It is conventional to assume that inputs have their first dimension reserved for the constant `1`, which makes the parameters `βc,0` intercepts. The priors allow the intercept to be given an uninformative prior even if the other dimensions have informative priors.

#### Feature Normalization

It is also common to convert inputs to z-scores in logistic regression, where the z-score is computed from the mean and standard deviation of each dimension. The problem with centering (subtracting the mean from each value) is that it destroys sparsity. We recommend not centering, and instead using an intercept term with an uninformative prior.

Variance normalization can be achieved by setting the variance prior parameter independently for each dimension.
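A sketch of variance-only normalization, assuming per-dimension standard deviations have been computed elsewhere; because there is no centering, zero entries stay zero and sparsity is preserved (the `scale` helper is illustrative, not a LingPipe method):

```java
public class ScaleSketch {

    // Scale each dimension by 1/stddev without centering, so zero
    // entries stay zero and sparse inputs stay sparse.
    static double[] scale(double[] x, double[] stdDevs) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; ++i)
            y[i] = stdDevs[i] > 0.0 ? x[i] / stdDevs[i] : x[i];
        return y;
    }

    public static void main(String[] args) {
        double[] y = scale(new double[] {0.0, 4.0}, new double[] {2.0, 2.0});
        System.out.println(y[0] + " " + y[1]);  // 0.0 stays 0.0; 4.0 becomes 2.0
    }
}
```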

#### Non-Linear and Interaction Features

It is common in logistic regression to include derived features that represent non-linear combinations of other input features. Typically, this is done through multiplication. For instance, if the output is a quadratic function of an input dimension `i`, then in addition to the raw value `xi`, another feature `j` may be introduced with value `xi²`.

Similarly, interaction terms are often added for features `xi` and `xj`, with a new feature `xk` being defined with value `xi * xj`.

The resulting model is linear in the derived features, but will no longer be linear in the original features.
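Such feature derivation can be sketched as follows; the `expand` helper and the choice of indices are purely illustrative:

```java
public class FeatureExpansion {

    // Append a quadratic term x_i^2 and an interaction term x_i * x_j
    // to a raw feature vector.
    static double[] expand(double[] x, int i, int j) {
        double[] y = new double[x.length + 2];
        System.arraycopy(x, 0, y, 0, x.length);
        y[x.length] = x[i] * x[i];      // quadratic feature
        y[x.length + 1] = x[i] * x[j];  // interaction feature
        return y;
    }

    public static void main(String[] args) {
        double[] y = expand(new double[] {2.0, 3.0}, 0, 1);
        System.out.println(java.util.Arrays.toString(y));  // [2.0, 3.0, 4.0, 6.0]
    }
}
```

A model trained on the expanded vectors is linear in the four derived features, but quadratic in the two original ones.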

#### Stochastic Gradient Descent

This class estimates logistic regression models using stochastic gradient descent (SGD). The SGD method runs through the data one or more times, considering one training case at a time and adjusting the parameters along some multiple of that case's contribution to the gradient of the error. With informative priors, the error function is strictly convex, and there will be a unique solution. In cases of linear dependence between dimensions or of separable data, maximum likelihood estimation may diverge.

The basic algorithm is:

```
β = 0
for (epoch = 0; epoch < maxEpochs; ++epoch)
    for training case (x, c') in D
        for category c < numOutcomes - 1
            βc -= learningRate(epoch) * grad(Err(x, c, c', β, σ²))
    if (epoch > minEpochs && converged)
        return β
```

where the learning rate and convergence conditions are discussed in the next section. The gradient of the error is described above, and the gradient contribution of the prior and its parameters σ are described in the class documentation for `RegressionPrior`. Note that the gradient of the prior must be divided by the number of training cases to get its incremental per-case contribution. The actual algorithm uses a lazy form of updating the contribution of the gradient of the prior. The result is an algorithm that handles sparse input data, touching only the non-zero dimensions of inputs during parameter updates.

#### Training Data

The regression estimation methods require paired input vectors and reference categories. Reference categories may be specified as integers (from `0` to `numOutcomes()-1`) or as weight vectors with `numOutcomes()` dimensions. All weights must be greater than 0. When the values of a weight vector sum to 1.0, training is consistent with interpreting the vector as a probabilistic assignment of categories.

For example, if `W1 + W2 = 1.0`, then training on `N` instances with weights `W1` and `W2` for categories `C1` and `C2` will produce similar results to training on `N*W1` instances with integer outcome `C1` and `N*W2` instances with integer outcome `C2`. When the weight values are positive integers, they act as over-sampling multiples, so training on an instance with positive integer weight `N` for category `C` will produce similar results to training on `N` unweighted instances with integer outcome `C`.

#### Learning Parameters

In addition to the model parameters (including priors) and training data (input vectors and reference categories), the regression estimation methods require four parameters that control search. The simplest are the minimum and maximum epoch parameters, which bound the number of epochs used for optimization.

The argument `minImprovement` determines how much improvement in the log likelihood of the training data under the current model is necessary to go on to the next epoch. This is measured relatively, with the algorithm stopping when the current epoch's error `err` is relatively close to the previous epoch's error `errLast`:

```
abs(err - errLast) / (abs(err) + abs(errLast)) < minImprovement
```

Setting this to a low value will lead to slow but accurate coefficient estimates.

Finally, the search parameters include an instance of `AnnealingSchedule`, which implements the `learningRate(epoch)` method. See that method for concrete implementations, including standard inverse-epoch annealing and exponential-decay annealing.

#### Blocked Updates

The implementation of stochastic gradient descent used in this class calculates the likelihoods for an entire block of examples at once, without changing the model parameters; the parameters are then updated for the entire block at once. The last block may be smaller than the others, but it is treated the same way: first its classifications are computed, then the gradient updates are made, then the prior updates.

Larger block sizes tend to lead to more robust fitting, but may be slower to converge in terms of the number of epochs. In fitting models with priors, large block sizes make each epoch run faster, because the dense operation of adjusting for priors is performed less frequently. If the block size is set to the corpus size, stochastic gradient descent reduces to standard (batch) gradient descent, although step sizes will still be calculated with the learning rate, not by a line search along the gradient direction.

#### Serialization and Compilation

For convenience, this class implements both the `Serializable` and `Compilable` interfaces. Serializing or compiling a logistic regression model has the same effect: the model read back in from its serialized state will be an instance of this class, `LogisticRegression`.

#### References

Logistic regression is discussed in most machine learning and statistics textbooks. These three machine learning textbooks all introduce some form of stochastic gradient descent and logistic regression (often not together, and often under the different names listed in the AKA section above):

- MacKay, David. 2003. *Information Theory, Inference, and Learning Algorithms*. Cambridge University Press.
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2001. *Elements of Statistical Learning*. Springer.
- Bishop, Christopher M. 2006. *Pattern Recognition and Machine Learning*. Springer.

An introduction to traditional statistical modeling with logistic regression may be found in:

- Gelman, Andrew and Jennifer Hill. 2007. *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge University Press.

For a discussion of text classification using logistic regression that evaluates against support vector machines (SVMs) and considers informative Laplace and Gaussian priors varying by dimension (which this class supports), see:

- Genkin, Alexander, David D. Lewis, and David Madigan. 2004. *Large-Scale Bayesian Logistic Regression for Text Categorization*. Rutgers University Technical Report.
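The SGD loop and the relative-improvement stopping rule described above can be sketched for the binomial case. This illustration omits priors, blocking, and annealing (it uses a constant learning rate instead of an `AnnealingSchedule`), so it is a simplified sketch, not LingPipe's implementation:

```java
public class SgdSketch {

    // Minimal SGD for binomial logistic regression: one weight vector,
    // with the last category implicit. Stops when
    // abs(err - errLast) / (abs(err) + abs(errLast)) < minImprovement.
    static double[] estimate(double[][] xs, int[] cs, double rate,
                             double minImprovement, int minEpochs, int maxEpochs) {
        int dims = xs[0].length;
        double[] beta = new double[dims];  // beta = 0
        double errLast = Double.POSITIVE_INFINITY;
        for (int epoch = 0; epoch < maxEpochs; ++epoch) {
            double err = 0.0;              // negative log likelihood
            for (int n = 0; n < xs.length; ++n) {
                double dot = 0.0;
                for (int i = 0; i < dims; ++i)
                    dot += beta[i] * xs[n][i];
                double p0 = Math.exp(dot) / (1.0 + Math.exp(dot)); // p(c=0 | x)
                err -= Math.log(cs[n] == 0 ? p0 : 1.0 - p0);
                // per-case gradient factor: p(0 | x, beta) - I(c' == 0)
                double scale = p0 - (cs[n] == 0 ? 1.0 : 0.0);
                for (int i = 0; i < dims; ++i)
                    beta[i] -= rate * scale * xs[n][i];
            }
            double relDiff = Math.abs(err - errLast)
                           / (Math.abs(err) + Math.abs(errLast));
            if (epoch > minEpochs && relDiff < minImprovement)
                break;                     // converged
            errLast = err;
        }
        return beta;
    }

    public static void main(String[] args) {
        // Made-up data: first dimension is a constant intercept term.
        // This data is separable, so ML would diverge; maxEpochs caps the run.
        double[][] xs = { {1, -2}, {1, -1}, {1, 1}, {1, 2} };
        int[] cs = { 0, 0, 1, 1 };
        double[] beta = estimate(xs, cs, 0.1, 1e-6, 10, 1000);
        System.out.println(beta[0] + " " + beta[1]);
    }
}
```

Category 0 occurs with negative values of the second dimension, so the fitted `beta[1]` comes out negative, making `p(0 | x)` high exactly where the second dimension is negative.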
• Since:
LingPipe 3.5
• Version:
4.0.1
• Author:
Bob Carpenter, Mike Ross
• See Also:
Serialized Form
#### Constructor Summary

- `LogisticRegression(Vector weightVector)`
  Construct a binomial logistic regression model with the specified parameter vector.
- `LogisticRegression(Vector[] weightVectors)`
  Construct a multinomial logistic regression model with the specified weight vectors.

#### Method Summary

- `double[] classify(Vector x)`
  Returns an array of conditional probabilities indexed by outcomes for the specified input vector.
- `void classify(Vector x, double[] ysHat)`
  Fills the specified array with the conditional probabilities indexed by outcomes for the specified input vector.
- `void compileTo(ObjectOutput out)`
  Compiles this model to the specified object output.
- `static LogisticRegression estimate(Vector[] xs, int[] cs, RegressionPrior prior, AnnealingSchedule annealingSchedule, Reporter reporter, double minImprovement, int minEpochs, int maxEpochs)`
  Estimate a logistic regression model from the specified input data using the specified prior and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, and a reporter.
- `static LogisticRegression estimate(Vector[] xs, int[] cs, RegressionPrior prior, int blockSize, LogisticRegression hotStart, AnnealingSchedule annealingSchedule, double minImprovement, int rollingAverageSize, int minEpochs, int maxEpochs, ObjectHandler<LogisticRegression> handler, Reporter reporter)`
  Estimate a logistic regression model from the specified input data using the specified prior, block size, hot start, and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, a handler, and a reporter.
- `static LogisticRegression estimate(Vector[] xs, Vector[] cs, RegressionPrior prior, AnnealingSchedule annealingSchedule, Reporter reporter, double minImprovement, int minEpochs, int maxEpochs)`
  Estimate a logistic regression model from the specified input data using the specified prior and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, and a reporter.
- `static LogisticRegression estimate(Vector[] xs, Vector[] cs, RegressionPrior prior, int blockSize, LogisticRegression hotStart, AnnealingSchedule annealingSchedule, double minImprovement, int rollingAverageSize, int minEpochs, int maxEpochs, ObjectHandler<LogisticRegression> handler, Reporter reporter)`
  Estimate a logistic regression model from the specified input data using the specified prior, block size, hot start, and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, a handler, and a reporter.
- `static double log2Likelihood(Vector[] inputs, int[] cats, LogisticRegression regression)`
  Returns the log (base 2) likelihood of the specified inputs with the specified categories using the specified regression model.
- `static double log2Likelihood(Vector[] inputs, Vector[] cats, LogisticRegression regression)`
- `int numInputDimensions()`
  Returns the dimensionality of inputs for this logistic regression model.
- `int numOutcomes()`
  Returns the number of outcomes for this logistic regression model.
- `Vector[] weightVectors()`
  Returns an array of views of the weight vectors used for this regression model.

Methods inherited from class `java.lang.Object`: `clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

#### Constructor Detail

**LogisticRegression**

```java
public LogisticRegression(Vector[] weightVectors)
```

Construct a multinomial logistic regression model with the specified weight vectors. With k-1 weight vectors, the result is a multinomial classifier with k outcomes. The weight vectors are stored rather than copied, so changes to them will affect this class. See the class definition above for more information on logistic regression.

Parameters:
- `weightVectors` - Weight vectors defining this regression model.

Throws:
- `IllegalArgumentException` - If the array of weight vectors does not have at least one element, or if two weight vectors have different numbers of dimensions.

**LogisticRegression**

```java
public LogisticRegression(Vector weightVector)
```

Construct a binomial logistic regression model with the specified parameter vector. The weight vector is stored rather than copied, so changes to it will affect this class. See the class definition above for more information on logistic regression.

Parameters:
- `weightVector` - The weights of features defining this model.

#### Method Detail

**numInputDimensions**

```java
public int numInputDimensions()
```

Returns the dimensionality of inputs for this logistic regression model.

Returns: The number of dimensions for this model.

**numOutcomes**

```java
public int numOutcomes()
```

Returns the number of outcomes for this logistic regression model.

Returns: The number of outcomes for this model.

**weightVectors**

```java
public Vector[] weightVectors()
```

Returns an array of views of the weight vectors used for this regression model. The returned weight vectors are immutable views of the underlying vectors used by this model, so they will change if the vectors making up this model change.

Returns: An array of views of the weight vectors for this model.

**classify**

```java
public double[] classify(Vector x)
```

Returns an array of conditional probabilities indexed by outcomes for the specified input vector. The resulting array has a value at index `i` equal to the probability of outcome `i` for the specified input. The sum of the returned values will be 1.0 (modulo arithmetic precision). See the class definition above for more information on how the conditional probabilities are computed.

Parameters:
- `x` - The input vector.

Returns: The array of conditional probabilities of outcomes.

Throws:
- `IllegalArgumentException` - If the specified vector is not the same dimensionality as this logistic regression instance.

**classify**

```java
public void classify(Vector x, double[] ysHat)
```

Fills the specified array with the conditional probabilities indexed by outcomes for the specified input vector. The resulting array has a value at index `i` equal to the probability of outcome `i` for the specified input. The sum of the values will be 1.0 (modulo arithmetic precision). See the class definition above for more information on how the conditional probabilities are computed.

Parameters:
- `x` - The input vector.
- `ysHat` - Array into which conditional probabilities are written.

Throws:
- `IllegalArgumentException` - If the specified vector is not the same dimensionality as this logistic regression instance.

**compileTo**

```java
public void compileTo(ObjectOutput out)
               throws IOException
```

Compiles this model to the specified object output. The compiled model, when read back in, will remain an instance of this class, `LogisticRegression`. Compilation does the same thing as serialization.

Specified by: `compileTo` in interface `Compilable`

Parameters:
- `out` - Object output to which this model is compiled.

Throws:
- `IOException` - If there is an underlying I/O error during serialization.

**estimate**

```java
public static LogisticRegression estimate(Vector[] xs,
                                          int[] cs,
                                          RegressionPrior prior,
                                          AnnealingSchedule annealingSchedule,
                                          Reporter reporter,
                                          double minImprovement,
                                          int minEpochs,
                                          int maxEpochs)
```

Estimate a logistic regression model from the specified input data using the specified prior and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, and a reporter. The block size defaults to the number of examples divided by 50 (or 1 if the division results in 0). See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

Parameters:
- `xs` - Input vectors indexed by training case.
- `cs` - Output categories indexed by training case.
- `prior` - The prior to be used for regression.
- `annealingSchedule` - Class to compute the learning rate for each epoch.
- `minImprovement` - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
- `minEpochs` - Minimum number of epochs.
- `maxEpochs` - Maximum number of epochs.
- `reporter` - Reporter to which progress reports are written, or `null` if no progress reports are needed.

Throws:
- `IllegalArgumentException` - If the set of input vectors does not contain at least one instance, if the number of output categories isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

**estimate**

```java
public static LogisticRegression estimate(Vector[] xs,
                                          Vector[] cs,
                                          RegressionPrior prior,
                                          AnnealingSchedule annealingSchedule,
                                          Reporter reporter,
                                          double minImprovement,
                                          int minEpochs,
                                          int maxEpochs)
```

Estimate a logistic regression model from the specified input data using the specified prior and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, and a reporter. The block size defaults to the number of examples divided by 50 (or 1 if the division results in 0). See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

Parameters:
- `xs` - Input vectors indexed by training case.
- `cs` - Output vectors representing probabilistic category assignments indexed by training case.
- `prior` - The prior to be used for regression.
- `annealingSchedule` - Class to compute the learning rate for each epoch.
- `minImprovement` - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
- `minEpochs` - Minimum number of epochs.
- `maxEpochs` - Maximum number of epochs.
- `reporter` - Reporter to which progress reports are written, or `null` if no progress reports are needed.

Throws:
- `IllegalArgumentException` - If the set of input vectors does not contain at least one instance, if the number of output categories isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

**estimate**

```java
public static LogisticRegression estimate(Vector[] xs,
                                          int[] cs,
                                          RegressionPrior prior,
                                          int blockSize,
                                          LogisticRegression hotStart,
                                          AnnealingSchedule annealingSchedule,
                                          double minImprovement,
                                          int rollingAverageSize,
                                          int minEpochs,
                                          int maxEpochs,
                                          ObjectHandler<LogisticRegression> handler,
                                          Reporter reporter)
```

Estimate a logistic regression model from the specified input data using the specified prior, block size, hot start, and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, a handler, and a reporter. See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

Parameters:
- `xs` - Input vectors indexed by training case.
- `cs` - Output categories indexed by training case.
- `prior` - The prior to be used for regression.
- `blockSize` - Number of examples whose gradient is computed before updating coefficients.
- `hotStart` - Logistic regression from which to retrieve initial weights, or `null` to use zero vectors.
- `annealingSchedule` - Class to compute the learning rate for each epoch.
- `minImprovement` - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
- `minEpochs` - Minimum number of epochs.
- `maxEpochs` - Maximum number of epochs.
- `handler` - Handler for intermediate regression results.
- `reporter` - Reporter to which progress reports are written, or `null` if no progress reports are needed.

Throws:
- `IllegalArgumentException` - If the set of input vectors does not contain at least one instance, if the number of output categories isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

**estimate**

```java
public static LogisticRegression estimate(Vector[] xs,
                                          Vector[] cs,
                                          RegressionPrior prior,
                                          int blockSize,
                                          LogisticRegression hotStart,
                                          AnnealingSchedule annealingSchedule,
                                          double minImprovement,
                                          int rollingAverageSize,
                                          int minEpochs,
                                          int maxEpochs,
                                          ObjectHandler<LogisticRegression> handler,
                                          Reporter reporter)
```

Estimate a logistic regression model from the specified input data using the specified prior, block size, hot start, and annealing schedule, the minimum improvement per epoch, the minimum and maximum number of estimation epochs, a handler, and a reporter. See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

Parameters:
- `xs` - Input vectors indexed by training case.
- `cs` - Output vectors representing probabilistic category assignments indexed by training case.
- `prior` - The prior to be used for regression.
- `blockSize` - Number of examples whose gradient is computed before updating coefficients.
- `hotStart` - Logistic regression from which to retrieve initial weights, or `null` to use zero vectors.
- `annealingSchedule` - Class to compute the learning rate for each epoch.
- `minImprovement` - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
- `minEpochs` - Minimum number of epochs.
- `maxEpochs` - Maximum number of epochs.
- `handler` - Handler for intermediate regression results.
- `reporter` - Reporter to which progress reports are written, or `null` if no progress reports are needed.

Throws:
- `IllegalArgumentException` - If the set of input vectors does not contain at least one instance, if the number of output categories isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

**log2Likelihood**

```java
public static double log2Likelihood(Vector[] inputs,
                                    int[] cats,
                                    LogisticRegression regression)
```

Returns the log (base 2) likelihood of the specified inputs with the specified categories using the specified regression model.

Parameters:
- `inputs` - Input vectors.
- `cats` - Categories for input vectors.
- `regression` - Model to use for computing likelihood.

Throws:
- `IllegalArgumentException` - If the inputs and categories are not the same length.

**log2Likelihood**

```java
public static double log2Likelihood(Vector[] inputs,
                                    Vector[] cats,
                                    LogisticRegression regression)
```