com.aliasi.stats

## Class ZipfDistribution

• All Implemented Interfaces:
DiscreteDistribution

public class ZipfDistribution
extends AbstractDiscreteDistribution
The ZipfDistribution class provides a finite distribution parameterized by a positive integer number of outcomes, with each outcome's probability inversely proportional to its rank (ordered by probability). Many natural language phenomena, such as unigram word probabilities and named-entity probabilities, roughly follow a Zipf distribution.

The Zipf probability distribution $\mathrm{Zipf}_n$ with $n$ outcomes is defined by assigning a probability to the rank $r$ outcome, for $1 \le r \le n$, by:

$$\mathrm{Zipf}_n(r) = \frac{1/r}{Z_n}$$

where $Z_n$ is the normalizing factor for a Zipf distribution with $n$ outcomes:

$$Z_n = \sum_{j=1}^{n} \frac{1}{j}$$
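As a worked illustration of the definition, the following minimal sketch computes the normalizing factor and the rank probabilities directly from the formulas. The class `ZipfSketch` and its methods are hypothetical names used only for this example; they are not part of LingPipe and do not claim to reflect how the library computes these values. For three outcomes the normalizing factor is 1 + 1/2 + 1/3 = 11/6, so the probabilities are 6/11, 3/11 and 2/11.

```java
// Illustrative sketch only: ZipfSketch is a hypothetical class, not part of LingPipe.
final class ZipfSketch {

    // Normalizing factor: the sum of 1/j for 1 <= j <= n.
    static double normalizer(int n) {
        if (n < 1)
            throw new IllegalArgumentException("n must be positive, found n=" + n);
        double z = 0.0;
        for (int j = 1; j <= n; ++j)
            z += 1.0 / j;
        return z;
    }

    // Probability of the rank r outcome: (1/r) divided by the normalizing factor.
    static double probability(int n, int r) {
        if (r < 1 || r > n)
            throw new IllegalArgumentException("rank must be in 1..n, found r=" + r);
        return (1.0 / r) / normalizer(n);
    }

    public static void main(String[] args) {
        int n = 3;
        double total = 0.0;
        for (int r = 1; r <= n; ++r) {
            double p = probability(n, r);
            total += p;
            System.out.println("rank " + r + ": " + p);
        }
        // The probabilities of all ranks sum to 1 (up to rounding).
        System.out.println("total: " + total);
    }
}
```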

The Zipf distribution class provides a method for returning the entropy of the Zipf distribution. It also provides a static method for returning a Zipf distribution's probabilities in rank order. The latter method is useful for comparing an observed distribution to the one expected under a Zipf distribution.
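The entropy and the comparison against observed data can be sketched without the library as follows. Everything in this block is an assumption made for illustration: the class `ZipfComparisonSketch`, its method names, the choice of bits (log base 2) for the entropy, and the made-up observed counts. It is not the library's implementation.

```java
// Illustrative sketch only: ZipfComparisonSketch is a hypothetical class, not part of LingPipe.
final class ZipfComparisonSketch {

    // Zipf probabilities in rank order; index i holds the probability of rank i + 1.
    static double[] zipfProbabilities(int n) {
        if (n < 1)
            throw new IllegalArgumentException("n must be positive, found n=" + n);
        double z = 0.0;
        for (int j = 1; j <= n; ++j)
            z += 1.0 / j;
        double[] probs = new double[n];
        for (int r = 1; r <= n; ++r)
            probs[r - 1] = (1.0 / r) / z;
        return probs;
    }

    // Entropy in bits (base 2 is an assumption of this sketch).
    static double entropyBits(double[] probs) {
        double h = 0.0;
        for (double p : probs)
            if (p > 0.0)
                h -= p * (Math.log(p) / Math.log(2.0));
        return h;
    }

    // Counts expected under a Zipf distribution for the given number of observations.
    static double[] expectedCounts(int n, long totalObservations) {
        double[] expected = zipfProbabilities(n);
        for (int i = 0; i < expected.length; ++i)
            expected[i] *= totalObservations;
        return expected;
    }

    public static void main(String[] args) {
        // Made-up observed counts, already sorted from most to least frequent.
        long[] observed = { 60, 25, 15 };
        long total = 0;
        for (long c : observed)
            total += c;
        double[] expected = expectedCounts(observed.length, total);
        for (int i = 0; i < observed.length; ++i)
            System.out.println("rank " + (i + 1)
                               + " observed=" + observed[i]
                               + " expected=" + expected[i]);
        System.out.println("entropy (bits): "
                           + entropyBits(zipfProbabilities(observed.length)));
    }
}
```

Sorting the observed counts in decreasing order before the comparison matters, because the Zipf probabilities are indexed by rank rather than by outcome identity.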

• Eric W. Weisstein. Zipf's Law. From MathWorld--A Wolfram Web Resource.
• Eric W. Weisstein. Statistical Rank. From MathWorld--A Wolfram Web Resource.
Since:
LingPipe 2.0
Version:
2.0
Author:
Bob Carpenter
• ### Constructor Detail

• #### ZipfDistribution

public ZipfDistribution(int numOutcomes)
Constructs a Zipf distribution with the specified number of outcomes.
Parameters:
numOutcomes - Number of outcomes for the distribution.
Throws:
IllegalArgumentException - If the number of outcomes specified is not positive.
• ### Method Detail

• #### numOutcomes

public int numOutcomes()
Returns the number of outcomes with non-zero probability for this Zipf distribution.
Returns:
The number of outcomes with non-zero probability for this distribution.
• #### probability

public double probability(long rank)
Returns the probability of the outcome at the specified rank. This method returns 0.0 for non-positive ranks and for ranks greater than the number of outcomes in this distribution.
Specified by:
probability in interface DiscreteDistribution
Specified by:
probability in class AbstractDiscreteDistribution
Parameters:
rank - Rank of outcome.
Returns:
The probability of the outcome at the specified rank.
• #### zipfDistribution

public static double[] zipfDistribution(int numOutcomes)
Returns the array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes. See the class documentation above for a definition of these probabilities. Note that the index of an outcome is one less than its rank; for example, the rank 1 outcome's probability is at index 0 and the rank 5 outcome's probability is at index 4.
Parameters:
numOutcomes - Number of outcomes.
Returns:
The array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes.
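The relationship between ranks and array indices, together with the out-of-range behavior documented for probability, can be illustrated with a small stand-alone sketch. The class `ZipfRankLookupSketch` and the helper `probabilityAtRank` are hypothetical and exist only for this example; the array literal stands in for the probabilities of a three-outcome Zipf distribution (6/11, 3/11, 2/11).

```java
// Illustrative sketch only: ZipfRankLookupSketch is a hypothetical class, not part of LingPipe.
final class ZipfRankLookupSketch {

    // Looks up a probability by 1-based rank in an array indexed by rank - 1.
    // Non-positive ranks and ranks beyond the array length yield 0.0.
    static double probabilityAtRank(double[] probsByIndex, long rank) {
        if (rank < 1 || rank > probsByIndex.length)
            return 0.0;
        return probsByIndex[(int) (rank - 1)];
    }

    public static void main(String[] args) {
        // Stand-in for the rank-ordered probabilities of a three-outcome Zipf distribution.
        double[] probs = { 6.0 / 11.0, 3.0 / 11.0, 2.0 / 11.0 };
        for (long rank = 0; rank <= 4; ++rank)
            System.out.println("rank " + rank + ": " + probabilityAtRank(probs, rank));
    }
}
```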