public class MultinomialDistribution extends Object
A MultinomialDistribution results from drawing a fixed number of samples from a multivariate distribution. Thus the probability distribution computed by log2Probability(int[]) is defined over an array of counts for the dimensions of the underlying multivariate distribution. This class also contains a static method, log2MultinomialCoefficient(int[]), to compute multinomial coefficients.
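To make the idea of a multinomial sample concrete, here is a minimal sketch that draws a fixed number of outcomes from a categorical distribution and tallies them into a count array, the kind of array passed to log2Probability(int[]). It assumes the outcome probabilities are given as a plain double[] rather than a LingPipe MultivariateDistribution; the class MultinomialSampler and its method sampleCounts are hypothetical, not part of LingPipe.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper (not part of LingPipe): draws a multinomial sample of counts. */
final class MultinomialSampler {

    /**
     * Draws numSamples outcomes independently from the categorical
     * distribution given by probs and returns how often each outcome occurred.
     */
    static int[] sampleCounts(double[] probs, int numSamples, Random random) {
        int[] counts = new int[probs.length];
        for (int n = 0; n < numSamples; ++n) {
            double u = random.nextDouble(); // uniform in [0, 1)
            double cumulative = 0.0;
            int outcome = probs.length - 1; // fallback if rounding leaves the sum just under 1
            for (int i = 0; i < probs.length; ++i) {
                cumulative += probs[i];
                if (u < cumulative) {
                    outcome = i;
                    break;
                }
            }
            ++counts[outcome];
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.3, 0.2};
        int[] counts = sampleCounts(probs, 100, new Random(42L));
        System.out.println(Arrays.toString(counts)); // three counts summing to 100
    }
}
```

Every call returns counts that sum to the fixed number of samples, which is exactly the constraint that distinguishes a multinomial outcome from independent per-dimension counts.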
The method chiSquared(int[]) returns the chi-squared statistic for a sample of outcome counts represented by an array of integers. The number of degrees of freedom is one less than the number of dimensions.
As of LingPipe 3.2.0, the dependency on Jakarta Commons Math was removed. As a result, we removed the two methods that computed p-values. Here is the implementation in case you need the functionality:

```java
import org.apache.commons.math.MathException;
import org.apache.commons.math.distribution.ChiSquaredDistribution;
import org.apache.commons.math.distribution.ChiSquaredDistributionImpl;

/**
 * Returns the p-value for the chi-squared statistic on the specified
 * sample counts. ...
 */
// Belongs inside MultinomialDistribution: it calls numDimensions() and chiSquared(int[]).
double pValue(int[] sampleCounts) throws MathException {
    // Degrees of freedom: one less than the number of dimensions.
    ChiSquaredDistribution chiSq
        = new ChiSquaredDistributionImpl(numDimensions() - 1);
    double c = chiSquared(sampleCounts);
    // cumulativeProbability(c) is P(X <= c); the p-value is the upper tail P(X >= c).
    return 1.0 - chiSq.cumulativeProbability(c);
}
```
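If you would rather avoid the Commons Math dependency altogether, the upper-tail probability can be computed directly, since a chi-squared variable with k degrees of freedom has survival function Q(k/2, x/2), where Q is the regularized upper incomplete gamma function. The sketch below assumes that approach, using the series and continued-fraction evaluation popularized by Numerical Recipes plus a Lanczos approximation of log Gamma; the class ChiSquaredPValue and its methods are hypothetical and were never part of LingPipe.

```java
/**
 * Hypothetical replacement for the removed p-value methods (not LingPipe code):
 * upper-tail probability of a chi-squared statistic via the regularized
 * incomplete gamma function.
 */
final class ChiSquaredPValue {

    private static final double EPS = 1e-14;
    private static final double FPMIN = 1e-300;
    private static final int MAX_ITER = 1000;

    /** P(X >= chiSquared) for X chi-squared with the given degrees of freedom. */
    static double pValue(double chiSquared, int degreesOfFreedom) {
        if (degreesOfFreedom < 1)
            throw new IllegalArgumentException("degrees of freedom must be >= 1");
        if (chiSquared <= 0.0) return 1.0;
        return upperRegularizedGamma(degreesOfFreedom / 2.0, chiSquared / 2.0);
    }

    /** Q(a, x) = Gamma(a, x) / Gamma(a), for a > 0 and x > 0. */
    static double upperRegularizedGamma(double a, double x) {
        double logPrefix = -x + a * Math.log(x) - logGamma(a);
        if (x < a + 1.0) {
            // Series for the lower function P(a, x); then Q = 1 - P.
            double ap = a;
            double term = 1.0 / a;
            double sum = term;
            for (int n = 0; n < MAX_ITER && Math.abs(term) > Math.abs(sum) * EPS; ++n) {
                ap += 1.0;
                term *= x / ap;
                sum += term;
            }
            return 1.0 - sum * Math.exp(logPrefix);
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        double b = x + 1.0 - a;
        double c = 1.0 / FPMIN;
        double d = 1.0 / b;
        double h = d;
        for (int i = 1; i <= MAX_ITER; ++i) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < FPMIN) d = FPMIN;
            c = b + an / c;
            if (Math.abs(c) < FPMIN) c = FPMIN;
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < EPS) break;
        }
        return Math.exp(logPrefix) * h;
    }

    /** Natural log of Gamma(x) via the Lanczos approximation (g = 7, 9 terms), for x >= 0.5. */
    static double logGamma(double x) {
        double[] coef = {
            0.99999999999980993, 676.5203681218851, -1259.1392167224028,
            771.32342877765313, -176.61502916214059, 12.507343278686905,
            -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
        };
        double z = x - 1.0;
        double sum = coef[0];
        for (int i = 1; i < coef.length; ++i) sum += coef[i] / (z + i);
        double t = z + 7.5;
        return 0.5 * Math.log(2.0 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(sum);
    }
}
```

For a MultinomialDistribution, you would pass chiSquared(sampleCounts) as the statistic and numDimensions() - 1 as the degrees of freedom.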
Constructor Summary

MultinomialDistribution(MultivariateDistribution distribution)
    Constructs a multinomial distribution based on the specified multivariate distribution.

Method Summary

MultivariateDistribution basisDistribution()
    Returns the multivariate distribution that forms the basis of this multinomial distribution.

double chiSquared(int[] sampleCounts)
    Returns the chi-squared statistic for rejecting the null hypothesis that the specified samples were generated by this distribution.

static double log2MultinomialCoefficient(int[] sampleCounts)
    Returns the log (base 2) multinomial coefficient for the specified counts.

double log2Probability(int[] sampleCounts)
    Returns the log (base 2) probability of the distribution of outcomes specified in the argument.

int numDimensions()
    Returns the number of dimensions in this multinomial.

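public MultinomialDistribution(MultivariateDistribution distribution)
Constructs a multinomial distribution based on the specified multivariate distribution.
Parameters:
distribution - Underlying multivariate distribution defining the constructed multinomial.

public double log2Probability(int[] sampleCounts)
Returns the log (base 2) probability of the distribution of outcomes specified in the argument.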
The definition of the probability value for multinomials is:
$$P(\text{sampleCounts}) = \text{multinomialCoefficient}(\text{sampleCounts}) \prod_{i} P(i)^{\text{sampleCounts}[i]}$$
where the multinomial coefficient is as defined in the method documentation for log2MultinomialCoefficient(int[]). Taking logarithms yields:
$$\log_2 P(\text{sampleCounts}) = \log_2 \text{multinomialCoefficient}(\text{sampleCounts}) + \sum_{i} \text{sampleCounts}[i] \cdot \log_2 P(i)$$
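The log-space formula translates almost directly into code. This is a minimal sketch, not LingPipe's implementation: it assumes the outcome probabilities P(i) arrive as a plain double[] instead of a MultivariateDistribution, and it computes each log2 factorial by summing logarithms. The class MultinomialLog2Prob is hypothetical.

```java
/**
 * Hypothetical standalone sketch (not LingPipe's code) of the multinomial
 * log2 probability, with outcome probabilities passed as an array.
 */
final class MultinomialLog2Prob {

    /** log2(n!) computed as a sum of natural logs, converted to base 2. */
    static double log2Factorial(int n) {
        double sumLn = 0.0;
        for (int k = 2; k <= n; ++k) sumLn += Math.log(k);
        return sumLn / Math.log(2.0);
    }

    static double log2Probability(double[] probs, int[] sampleCounts) {
        if (probs.length != sampleCounts.length)
            throw new IllegalArgumentException("number of counts must equal number of dimensions");
        int totalCount = 0;
        double log2Prob = 0.0;
        for (int i = 0; i < sampleCounts.length; ++i) {
            totalCount += sampleCounts[i];
            // Denominator of the multinomial coefficient: sampleCounts[i]!
            log2Prob -= log2Factorial(sampleCounts[i]);
            // P(i)^0 = 1, so skip zero counts even when P(i) = 0.
            if (sampleCounts[i] == 0) continue;
            // Math.log(0.0) is -Infinity, so a zero-probability outcome
            // with a nonzero count drives the result to NEGATIVE_INFINITY.
            log2Prob += sampleCounts[i] * Math.log(probs[i]) / Math.log(2.0);
        }
        // Numerator of the multinomial coefficient: totalCount!
        return log2Prob + log2Factorial(totalCount);
    }
}
```

For example, with probabilities {0.5, 0.5} and counts {1, 1}, the coefficient is 2 and the product of probabilities is 0.25, so the result is log2 0.5 = -1.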
Note that if the multivariate probability is zero for an outcome with a nonzero count, the result will be Double.NEGATIVE_INFINITY.
Parameters:
sampleCounts - Array of counts for outcomes.
Throws:
IllegalArgumentException - If the number of outcome counts is not the same as the number of dimensions of this multinomial.

public double chiSquared(int[] sampleCounts)
Returns the chi-squared statistic for rejecting the null hypothesis that the specified samples were generated by this distribution.
The chi-squared value is defined as the sum of squared differences between sample counts and expected counts, each normalized by the expected count:
$$\chi^2(\text{sampleCounts}) = \sum_{i} \frac{(\text{sampleCounts}[i] - \text{expectedCount}(i))^2}{\text{expectedCount}(i)}$$
where the expected counts are computed from the underlying multivariate distribution and the total sample count:
$$\text{expectedCount}(i) = \text{probability}(i) \cdot \text{totalCount}$$
where totalCount is the sum of all of the sample counts.
Note that the chi-squared test is a large-sample test. For accurate results, each expected count should be at least five; in symbols, expectedCount(i) >= 5 for all i.
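A minimal sketch of the statistic and of the expected-count rule of thumb, again assuming a plain double[] of outcome probabilities in place of the underlying distribution. ChiSquaredSketch and its methods are hypothetical names, not LingPipe's API.

```java
/** Hypothetical standalone sketch (not LingPipe's code) of the chi-squared statistic. */
final class ChiSquaredSketch {

    static double chiSquared(double[] probs, int[] sampleCounts) {
        if (probs.length != sampleCounts.length)
            throw new IllegalArgumentException("number of counts must equal number of dimensions");
        long totalCount = 0L;
        for (int count : sampleCounts) totalCount += count;
        double chiSq = 0.0;
        for (int i = 0; i < sampleCounts.length; ++i) {
            double expectedCount = probs[i] * totalCount;
            double diff = sampleCounts[i] - expectedCount;
            // A zero expected count makes this term infinite (or NaN if the count is also 0).
            chiSq += diff * diff / expectedCount;
        }
        return chiSq;
    }

    /** Large-sample rule of thumb: every expected count is at least 5. */
    static boolean expectedCountsLargeEnough(double[] probs, int[] sampleCounts) {
        long totalCount = 0L;
        for (int count : sampleCounts) totalCount += count;
        for (double p : probs)
            if (p * totalCount < 5.0) return false;
        return true;
    }
}
```

With probabilities {0.5, 0.5} and counts {60, 40}, each expected count is 50 and each term contributes 10^2 / 50 = 2, for a statistic of 4.0 on one degree of freedom.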
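Parameters:
sampleCounts - Array of sample counts.
Throws:
IllegalArgumentException - If the number of outcome counts is not the same as the number of dimensions of this multinomial.

public int numDimensions()
Returns the number of dimensions in this multinomial.

public MultivariateDistribution basisDistribution()
Returns the multivariate distribution that forms the basis of this multinomial distribution.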
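public static double log2MultinomialCoefficient(int[] sampleCounts)
Returns the log (base 2) multinomial coefficient for the specified counts. The multinomial coefficient is defined by:
$$\text{multinomialCoefficient}(\text{sampleCounts}) = \frac{\text{totalCount}!}{\prod_{i} \text{sampleCounts}[i]!}$$
where totalCount is the sum of the sample counts. Taking logarithms produces:
$$\log_2 \text{multinomialCoefficient}(\text{sampleCounts}) = \log_2 \text{totalCount}! - \sum_{i} \log_2 \text{sampleCounts}[i]!$$
The multinomial coefficient is often written using a notation similar to that used for the factorial, as (sampleCounts[0],...,sampleCounts[n-1])!.
Parameters:
sampleCounts - Array of outcome counts.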
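A direct sketch of the log-space formula, assuming the counts are small enough that summing log k for each factorial (linear time in the count) is acceptable. MultinomialCoefficientSketch is a hypothetical name, and LingPipe may compute the factorial terms differently.

```java
/** Hypothetical standalone sketch (not LingPipe's code) of the log2 multinomial coefficient. */
final class MultinomialCoefficientSketch {

    /** log2(n!) as a sum of natural logs for k = 2..n, converted to base 2; linear in n. */
    static double log2Factorial(int n) {
        double sumLn = 0.0;
        for (int k = 2; k <= n; ++k) sumLn += Math.log(k);
        return sumLn / Math.log(2.0);
    }

    static double log2MultinomialCoefficient(int[] sampleCounts) {
        int totalCount = 0;
        double log2Denominator = 0.0;
        for (int count : sampleCounts) {
            if (count < 0) throw new IllegalArgumentException("negative count: " + count);
            totalCount += count;
            log2Denominator += log2Factorial(count); // sum of log2 sampleCounts[i]!
        }
        return log2Factorial(totalCount) - log2Denominator; // log2 totalCount! minus the sum
    }

    public static void main(String[] args) {
        // 4! / (2! * 1! * 1!) = 12
        double log2Coef = log2MultinomialCoefficient(new int[] {2, 1, 1});
        System.out.println(Math.pow(2.0, log2Coef)); // approximately 12
    }
}
```

Working in log space keeps the result finite for large counts, where totalCount! itself would overflow a double.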