public class JointClassification extends ConditionalClassification
JointClassification is a conditional classification derived from a
joint probability assignment to each category and the object being
classified. The conditional probabilities are computed from the joint
probabilities, but an additional score may be provided for ordering.
These scores must be ordered in the same way as the joint
probabilities. For example, the language model classifiers implement
the score as an entropy rate to allow between-document comparisons.
In addition to the score and conditional probability methods, this
class adds a method to retrieve the joint log (base 2) probability by
rank, jointLog2Probability(int).
The conditional probability estimate of the category given the input is derived from the joint probability of category and input:

$$P(\mathrm{category} \mid \mathrm{input}) = \frac{P(\mathrm{category}, \mathrm{input})}{P(\mathrm{input})}$$

where the joint probability $P(\mathrm{category}, \mathrm{input})$ is determined by the joint probability estimate, and the input probability $P(\mathrm{input})$ is estimated by marginalization:

$$P(\mathrm{input}) = \sum_{\mathrm{category}} P(\mathrm{category}, \mathrm{input})$$
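As an illustration of the two formulas above, the following is a minimal sketch that turns log (base 2) joint probabilities into conditional probabilities by dividing each joint probability by their sum. The class and method names are hypothetical, and shifting by the maximum log value before exponentiating is a standard numerical technique assumed here, not necessarily how this class computes the result.

```java
import java.util.Arrays;

final class Log2Marginalization {

    // Hypothetical helper: converts log2 P(category, input) values into
    // P(category | input) = P(category, input) / P(input), where P(input)
    // is the sum of the joint probabilities over all categories.
    static double[] conditionalProbs(double[] log2JointProbs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : log2JointProbs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            // Empty input, or every joint probability is zero: P(input) = 0.
            throw new IllegalArgumentException("P(input) is zero");
        }
        double[] result = new double[log2JointProbs.length];
        double sum = 0.0; // holds P(input) * 2^(-max)
        for (int i = 0; i < log2JointProbs.length; ++i) {
            // Shifting by max makes the largest term 2^0 = 1, which avoids
            // underflow when every joint probability is tiny.
            result[i] = Math.pow(2.0, log2JointProbs[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; ++i) {
            result[i] /= sum; // the common 2^(-max) factor cancels here
        }
        return result;
    }

    public static void main(String[] args) {
        // Joint probabilities 1/4, 1/8, 1/8, i.e. log2 values -2, -3, -3.
        double[] log2Joint = { -2.0, -3.0, -3.0 };
        System.out.println(Arrays.toString(conditionalProbs(log2Joint)));
        // prints [0.5, 0.25, 0.25]
    }
}
```

Here $P(\mathrm{input}) = 1/4 + 1/8 + 1/8 = 1/2$, so each conditional probability is twice the corresponding joint probability.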
Warning: Computing the conditional probabilities by marginalization
gives the same result as applying Statistics.normalize(double[]) to
the joint probabilities, so the warning for that method carries over
here: if the largest joint probability is more than $2^{52}$ times
larger than the next largest, the largest will round off to one and
all others will round off to zero due to underflow.
Constructor and Description

JointClassification(String[] categories, double[] log2JointProbs)
Construct a joint classification with the specified parallel
arrays of categories and log (base 2) joint probabilities of
category and input object.

JointClassification(String[] categories, double[] scores, double[] log2JointProbs)
Construct a joint classification with the specified parallel
arrays of categories, scores, and log (base 2) joint probabilities
of category and input object.
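The constructor details state that the log joint probabilities must be zero or negative (Double.NEGATIVE_INFINITY is legal) and in descending order. A minimal sketch of such checks follows; the helper name is hypothetical, and the length check and the treatment of ties as acceptable are assumptions, flagged in the comments.

```java
final class JointArgsCheck {

    // Hypothetical helper mirroring the documented constructor preconditions:
    // every log2 joint probability must be zero or negative (NEGATIVE_INFINITY
    // is allowed), and the values must appear in descending order.
    static void checkLog2JointProbs(String[] categories, double[] log2JointProbs) {
        // Assumption: the arrays are parallel, so their lengths must agree.
        if (categories.length != log2JointProbs.length) {
            throw new IllegalArgumentException(
                "categories and log2JointProbs must have the same length");
        }
        for (int i = 0; i < log2JointProbs.length; ++i) {
            double x = log2JointProbs[i];
            // "!(x <= 0.0)" rejects both positive values and NaN.
            if (!(x <= 0.0)) {
                throw new IllegalArgumentException(
                    "log2JointProbs[" + i + "] must be zero or negative, found " + x);
            }
            // Assumption: ties are allowed, so only a strict increase is an error.
            if (i > 0 && x > log2JointProbs[i - 1]) {
                throw new IllegalArgumentException(
                    "log2JointProbs must be in descending order at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        // Accepted: descending, nonpositive, and ending in NEGATIVE_INFINITY.
        checkLog2JointProbs(new String[] { "a", "b", "c" },
                            new double[] { -1.0, -4.5, Double.NEGATIVE_INFINITY });
        try {
            // Rejected: -1.0 is larger than the preceding -3.0.
            checkLog2JointProbs(new String[] { "a", "b" }, new double[] { -3.0, -1.0 });
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```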

Modifier and Type    Method and Description

static JointClassification    create(String[] categories, double[] logProbabilities)
Return a joint classification given the categories and log
probabilities.

double    jointLog2Probability(int rank)
Returns the log (base 2) joint probability of the category at
the specified rank.

double    score(int rank)
Returns the cross-entropy rate of the category and text at the
specified rank.

String    toString()
Returns a string-based representation of this joint probability
ranked classification.

Inherited methods:
conditionalProbability, conditionalProbability, createLogProbs, createProbs
create, create
category, size
bestCategory
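public JointClassification(String[] categories, double[] log2JointProbs)
Construct a joint classification with the specified parallel arrays
of categories and log (base 2) joint probabilities of category and
input object. A log joint probability may be Double.NEGATIVE_INFINITY
(a joint probability of zero), which is a legal input to this
constructor.
Parameters:
categories - Array of categories.
log2JointProbs - Log (base 2) joint probabilities of categories, in
descending numerical order.
Throws:
IllegalArgumentException - If any of the log joint probabilities is
neither zero nor negative, or if they are not in descending order.

public JointClassification(String[] categories, double[] scores, double[] log2JointProbs)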
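Construct a joint classification with the specified parallel arrays
of categories, scores, and log (base 2) joint probabilities of
category and input object. A log joint probability may be
Double.NEGATIVE_INFINITY (a joint probability of zero), which is a
legal input to this constructor.
Parameters:
categories - Array of categories.
scores - Scores of categories, in descending numerical order.
log2JointProbs - Log (base 2) joint probabilities of categories, in
descending numerical order.
Throws:
IllegalArgumentException - If any of the log joint probabilities is
neither zero nor negative, or if they are not in descending order.

public double jointLog2Probability(int rank)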
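Returns the log (base 2) joint probability of the category at the
specified rank. See also score(int).
Parameters:
rank - Rank of result.

public double score(int rank)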
The cross-entropy rate of the category and text is defined differently from the cross-entropy rate of the text alone. For the combination, we divide the log (base 2) probability of the text plus the log (base 2) probability of the category by the length of the text plus 1. This nonstandard definition ensures that the cross-entropy ordering matches the joint probability ordering.
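The definition above can be sketched as follows. The helper name and the sample log probabilities are hypothetical; the sketch only shows that, for one fixed text, dividing each category's log (base 2) joint probability by the same positive denominator leaves the categories in joint-probability order.

```java
final class JointScoreSketch {

    // Hypothetical helper following the definition described above:
    // (log2 P(text | category) + log2 P(category)) / (textLength + 1).
    // The numerator is the log (base 2) joint probability of category and text.
    static double crossEntropyRate(double log2ProbTextGivenCategory,
                                   double log2ProbCategory,
                                   int textLength) {
        return (log2ProbTextGivenCategory + log2ProbCategory) / (textLength + 1);
    }

    public static void main(String[] args) {
        int textLength = 9; // the same text is scored against both categories
        double scoreA = crossEntropyRate(-30.0, -1.0, textLength); // joint log2 = -31
        double scoreB = crossEntropyRate(-28.0, -4.0, textLength); // joint log2 = -32
        // A has the higher joint probability, and it also gets the higher score.
        System.out.println(scoreA > scoreB); // prints true
    }
}
```

Because the denominator depends only on the text, the ranking of categories is unchanged, while scores for texts of different lengths become roughly comparable on a per-symbol scale.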
Overrides:
score in class ScoredClassification
Parameters:
rank - Rank of result category.

public String toString()
Returns a string-based representation of this joint probability
ranked classification.
Overrides:
toString in class ConditionalClassification
public static JointClassification create(String[] categories, double[] logProbabilities)
Return a joint classification given the categories and log probabilities. The log probabilities must be finite and nonpositive. The joint probabilities should sum to no more than 1.0, but this is not checked; the result is simply normalized.
Parameters:
categories - Array of categories.
logProbabilities - Parallel array of log probabilities.
Throws:
IllegalArgumentException - If any of the log probabilities is
infinite, not a number, or positive, or if the arrays are not of the
same length.
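Unlike the constructors, create rejects infinite log probabilities. A minimal sketch of the documented argument checks follows; the class and method names are hypothetical, and the sketch does not attempt the normalization step.

```java
final class CreateArgsCheck {

    // Hypothetical helper mirroring the documented checks for create:
    // the arrays must have the same length, and every log probability
    // must be finite (not infinite, not NaN) and not positive.
    static void checkLogProbabilities(String[] categories, double[] logProbabilities) {
        if (categories.length != logProbabilities.length) {
            throw new IllegalArgumentException(
                "arrays must be the same length: " + categories.length
                + " != " + logProbabilities.length);
        }
        for (int i = 0; i < logProbabilities.length; ++i) {
            double x = logProbabilities[i];
            // Double.isFinite is false for both infinities and NaN.
            if (!Double.isFinite(x) || x > 0.0) {
                throw new IllegalArgumentException(
                    "logProbabilities[" + i + "] must be finite and nonpositive, found " + x);
            }
        }
    }

    public static void main(String[] args) {
        checkLogProbabilities(new String[] { "x", "y" }, new double[] { -0.5, -2.0 });
        try {
            // NEGATIVE_INFINITY is legal for the constructors but not for create.
            checkLogProbabilities(new String[] { "x", "y" },
                                  new double[] { -0.5, Double.NEGATIVE_INFINITY });
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```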