public class JointClassification extends ConditionalClassification
A JointClassification is a conditional classification
derived from a joint probability assignment to each category and
the object being classified. The conditional probabilities are
computed from the joint probabilities, but an additional score may
be provided for ordering. These scores must be ordered in the same
way as the joint probabilities. For example, the language model
classifiers implement the score as an entropy rate to allow
between-document comparisons.
In addition to the score and conditional probability methods,
this class adds a method to retrieve the joint log (base 2)
probability by rank, jointLog2Probability(int).
The conditional probability estimate of the category given the input is derived from the joint probability of category and input:
P(category|input) = P(category,input) / P(input)
where the joint probability P(category,input) is determined by the
joint probability estimate, and the input probability P(input) is
estimated by marginalization:
P(input) = Σ_category P(category,input)
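The marginalization above can be sketched in a few lines of Java. This is only an illustration, not the library's implementation: the class name MarginalizationSketch and the method conditionalProbs are hypothetical, and the sketch assumes at least one log joint probability is finite. It subtracts the largest log value before exponentiating, which leaves the ratios P(category,input) / P(input) unchanged.

```java
final class MarginalizationSketch {

    /**
     * Hypothetical helper: converts log (base 2) joint probabilities into
     * conditional probabilities by dividing each joint probability by their sum.
     * Assumes at least one entry is finite.
     */
    static double[] conditionalProbs(double[] log2JointProbs) {
        // Shift by the largest log value so the largest term becomes 2^0 = 1.
        double max = Double.NEGATIVE_INFINITY;
        for (double lp : log2JointProbs) {
            max = Math.max(max, lp);
        }
        double[] result = new double[log2JointProbs.length];
        double sum = 0.0; // proportional to P(input)
        for (int i = 0; i < result.length; i++) {
            result[i] = Math.pow(2.0, log2JointProbs[i] - max);
            sum += result[i];
        }
        // P(category|input) = P(category,input) / P(input)
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        // Joint probabilities 1/4, 1/8, 1/8 sum to P(input) = 1/2.
        double[] conditional = conditionalProbs(new double[] {-2.0, -3.0, -3.0});
        for (double p : conditional) {
            System.out.println(p);
        }
    }
}
```

With joint probabilities 1/4, 1/8 and 1/8, the conditional estimates come out as approximately 0.5, 0.25 and 0.25.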
Warning: The result of marginalization is the same as
that of Statistics.normalize(double[])
applied to the joint probabilities. The same warning carries over
here: if the largest joint probability is more than
2^52 times larger than the next
largest, the largest will round off to one and all others will
round off to zero due to underflow.
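The round-off to one can be seen with plain double arithmetic. The sketch below does not call Statistics.normalize(double[]); RoundOffSketch and topConditional are hypothetical names, and the two-category setup is an assumption made only to keep the arithmetic visible.

```java
final class RoundOffSketch {

    /**
     * Hypothetical helper: conditional probability of the top category when
     * there is one rival category and the top joint probability is
     * 2^log2Ratio times the rival's joint probability.
     */
    static double topConditional(double log2Ratio) {
        // Rival's joint probability relative to the top one.
        double rival = Math.pow(2.0, -log2Ratio);
        // Normalize: top / (top + rival), with top scaled to 1.
        return 1.0 / (1.0 + rival);
    }

    public static void main(String[] args) {
        // Ratio of 2^10: the result stays strictly below one.
        System.out.println(topConditional(10.0));
        // Ratio of 2^60: 1.0 + 2^-60 is not representable, so this prints 1.0.
        System.out.println(topConditional(60.0));
    }
}
```

A double carries a 52-bit fraction, so adding a term that much smaller than 1.0 leaves the sum unchanged, and the top conditional probability comes out as exactly 1.0.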
Constructor and Description |
---|
JointClassification(String[] categories, double[] log2JointProbs) - Construct a joint classification with the specified parallel arrays of categories and log (base 2) joint probabilities of category and input object. |
JointClassification(String[] categories, double[] scores, double[] log2JointProbs) - Construct a joint classification with the specified parallel arrays of categories, scores, and log (base 2) joint probabilities of category and input object. |
Modifier and Type | Method and Description |
---|---|
static JointClassification | create(String[] categories, double[] logProbabilities) - Return a joint classification given the categories and log probabilities. |
double | jointLog2Probability(int rank) - Returns the log (base 2) probability of the category at the specified rank. |
double | score(int rank) - Returns the cross-entropy rate of the category and text at the specified rank. |
String | toString() - Returns a string-based representation of this joint probability ranked classification. |
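Methods inherited from class ConditionalClassification: conditionalProbability, conditionalProbability, createLogProbs, createProbs
Methods inherited from class ScoredClassification: create, create
Methods inherited from class RankedClassification: category, size
Methods inherited from class Classification: bestCategory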
public JointClassification(String[] categories, double[] log2JointProbs)
Construct a joint classification with the specified parallel arrays of categories and log (base 2) joint probabilities of category and input object. A log joint probability may be Double.NEGATIVE_INFINITY, which is a legal input to this constructor.
Parameters:
categories - Array of categories.
log2JointProbs - Log (base 2) joint probabilities of categories, in descending numerical order.
Throws:
IllegalArgumentException - If any of the log joint probabilities is not zero or negative, or if they are not in descending order.
public JointClassification(String[] categories, double[] scores, double[] log2JointProbs)
Construct a joint classification with the specified parallel arrays of categories, scores, and log (base 2) joint probabilities of category and input object. A log joint probability may be Double.NEGATIVE_INFINITY, which is a legal input to this constructor.
Parameters:
categories - Array of categories.
scores - Scores of categories, in descending numerical order.
log2JointProbs - Log (base 2) joint probabilities of categories, in descending numerical order.
Throws:
IllegalArgumentException - If any of the log joint probabilities is not zero or negative, or if they are not in descending order.
public double jointLog2Probability(int rank)
Returns the log (base 2) probability of the category at the specified rank. See also score(int).
Parameters:
rank - Rank of result.
public double score(int rank)
Returns the cross-entropy rate of the category and text at the specified rank. The cross-entropy rate of the category and text is defined differently from the cross-entropy of the text alone. For the combination, we divide the sum of the log (base 2) probability of the text and the log (base 2) probability of the category by the length of the text plus 1. This non-standard definition ensures that the cross-entropy ordering remains the same as the joint probability ordering.
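The definition can be written out as a one-line computation. This is a sketch under stated assumptions rather than the classifier's own code: CrossEntropyRateSketch and its parameter names are hypothetical, and the text length is taken as a plain count supplied by the caller.

```java
final class CrossEntropyRateSketch {

    /**
     * Hypothetical helper: (log2 P(text) + log2 P(category)) / (length + 1),
     * following the definition given for score(int).
     */
    static double score(double log2TextProb, double log2CategoryProb, int textLength) {
        return (log2TextProb + log2CategoryProb) / (textLength + 1.0);
    }

    public static void main(String[] args) {
        // A text of length 6 with log2 probability -20, category log2 probability -1:
        // (-20 + -1) / (6 + 1) = -3.0
        System.out.println(score(-20.0, -1.0, 6));
    }
}
```

For a fixed text, every category is divided by the same positive denominator, which is why the ordering by score matches the ordering by joint log probability.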
Overrides:
score in class ScoredClassification
Parameters:
rank - Rank of result category.
public String toString()
Returns a string-based representation of this joint probability ranked classification.
Overrides:
toString in class ConditionalClassification
public static JointClassification create(String[] categories, double[] logProbabilities)
Return a joint classification given the categories and log probabilities. The log probabilities must be finite and non-positive. The joint probabilities should not sum to more than 1.0, but this is not checked; the result is simply normalized.
Parameters:
categories - Array of categories.
logProbabilities - Parallel array of log probabilities.
Throws:
IllegalArgumentException - If any of the log probabilities is infinite, not a number, or positive, or if the arrays are not of the same length.
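The argument conditions listed for create can be mirrored in a small standalone check. This is a sketch of the documented conditions only, not the library's code: CreateArgsSketch and validate are hypothetical names, and the exception messages are made up for illustration.

```java
final class CreateArgsSketch {

    /**
     * Hypothetical helper: rejects inputs that the documentation for
     * create(String[], double[]) describes as illegal.
     */
    static void validate(String[] categories, double[] logProbabilities) {
        // The two arrays are parallel, so their lengths must match.
        if (categories.length != logProbabilities.length) {
            throw new IllegalArgumentException("Arrays must be the same length.");
        }
        for (double lp : logProbabilities) {
            // Log probabilities must be finite, not NaN, and non-positive.
            if (Double.isNaN(lp) || Double.isInfinite(lp) || lp > 0.0) {
                throw new IllegalArgumentException("Illegal log probability: " + lp);
            }
        }
    }

    public static void main(String[] args) {
        // Accepted: finite, non-positive values in a parallel array.
        validate(new String[] {"spam", "ham"}, new double[] {-1.0, -2.5});
        try {
            // Rejected: a positive log probability.
            validate(new String[] {"spam"}, new double[] {0.5});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that Double.NEGATIVE_INFINITY is rejected here, in contrast to the constructors, which accept it.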