|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectcom.aliasi.classify.ScoredPrecisionRecallEvaluation
public class ScoredPrecisionRecallEvaluation
A ScoredPrecisionRecallEvaluation provides an evaluation
of possible precision-recall operating points and other summary statistics
The single method addCase(boolean,double) is used to populate
the evaluation, with the first argument representing whether the response
was correct and the second the score that was assigned. Note that
only positive reference cases are considered as part of this evaluation.
By way of example, consider the following table of cases, all of which involve positive responses. The cases are in rank order, with their scores and whether they were correct listed.
Note that there are four positive reference cases (blue backgrounds) and six negative reference cases (clear backgrounds) in this diagram. By setting an acceptance threshold at the various scores, the precision, recall, and rejection recall values listed in the fourth through sixth columns are derived. For instance, after the rank 0 response, which is wrong, recall is
Rank Score Correct Rec Prec Rej Rec 0 -1.21 incorrect 0.00 NaN0.83 1 -1.27 correct 0.25 0.50 0.83 2 -1.39 incorrect 0.25 0.33 0.67 3 -1.47 correct 0.50 0.50 0.67 4 -1.60 correct 0.75 0.60 0.67 5 -1.65 incorrect 0.75 0.50 0.50 6 -1.79 incorrect 0.75 0.43 0.33 7 -1.80 incorrect 0.75 0.38 0.17 8 -2.01 correct 1.00 0.44 0.17 9 -3.70 incorrect 1.00 0.40 0.00
0/4 = 0.0, because we have retrieved none of
the four positive reference cases; similarly, rejection recall is
5/6 = 0.83 because 5/6 of the negative
reference cases have been rejected. After the rank 4 response,
recall is 3/4 and rejection recall is 2/6.
The pairs of precision/recall values form the basis for the
precision-recall curve returned by prCurve(boolean), with
the argument indicating whether to perform precision interpolation.
For the above graph:
prCurve(false) = { {0.25, 0.50},
{0.50, 0.50},
{0.75, 0.60},
{1.00, 0.44} }
The pairs of recall/rejection recall values form the basis for the
receiver operating characteristic (ROC) curve returned by rocCurve(boolean) with the boolean parameter again indicating
whether to perform precision interpolation. For the above graph,
the result is:
rocCurve(false) = { { 0.25, 0.83 },
{ 0.50, 0.67 },
{ 0.75, 0.67 },
{ 1.00, 0.17 } }
Note that for both curves, only the rows corresponding to the
correct responses are considered, which are highlighted in blue.
Precision interpolation removes any operating point for which
there is a dominant operating point in both dimensions.
For the precision-recall curve, the points (0.25,0.50)
and (0.50,0.50) are dominated in both dimensions by
(0.75,0.60) and so are dropped; the resulting curve is:
prCurve(true) = { {0.75, 0.60},
{1.00, 0.44} }
This is meant to be read
meant to be read as having a constant precision of 0.6 for all
recall values between 0 and 0.75 inclusive; thus the interpolation
increases values. For the ROC curve, only the three points
highlighted in yellow are left:
rocCurve(true) = { { 0.25, 0.83 },
{ 0.75, 0.67 },
{ 1.00, 0.17 } }
Note that the precision interpolated curves always provide strictly
decreasing precisions.
The area under the raw precision-recall and ROC curves, with or without interpolation, is computed by the following methods:
areaUnderPrCurve(false) = (0.25 - 0.00) * 0.50
+ (0.50 - 0.25) * 0.50
+ (0.75 - 0.50) * 0.60
+ (1.00 - 0.75) * 0.44
= 0.51
areaUnderPrCurve(true) = (0.75 - 0.00) * 0.60
+ (1.00 - 0.75) * 0.44
= 0.56
The ROC areas are computed similarly to yield:
areaUnderRocCurve(false) = 0.58
areaUnderRocCurve(true) = 0.58
Note that the precision-interpolated values are always higher.
For precision-recall curves, three additional summary statistics are available. The first provides an average over precision values over the operating points on the uninterpolated precision-recall curve.
averagePrecision() = (0.50 + 0.50 + 0.60 + 0.44)/4.00
= 0.51
The second merely returns the maximum Fβ
measure for an actual operating point:
maximumFMeasure() = maximumFMeasure(1) = 0.67
Note that this statistic provides a post-hoc optimal setting for
F-measure. Further note that it is based on actual operating
points, not interpolations between operating points. The final
statistic is the so-called precision-recall breakeven point (BEP).
This is computed in the standard way by using the interpolated
precision-recall curve. Because the two points of interest are
(0.0,0.6) and (0.75,0.6), the best point at which they are equal
is (0.6,0.6), and thus:
prBreakevenPoint() = 0.6
Note that this value will always be less than or equal to the
maximum F1-measure.
Given the precision-recall curve, it's possible to compute the
precision after any given number of results. The method
precisionAt(int) will return the precision after
the specified number of results. For instance, in the above
graph:
precisionAt(5)() = 0.6
precisionAt(10)() = 0.4
Typically results are reported for 5, 10 and 100 when available
(counting from 1, not 0). The other information-retrieval style
result we return is the reciprocal rank (RR), which is defined to
be 1/rank, again counting from 1, not 0. For
instance, for the graph above, the first correct answer is the
second, soRR is 1/2:
reciprocalRank()() = 0.5
| Constructor Summary | |
|---|---|
ScoredPrecisionRecallEvaluation()
Construct a scored precision-recall evaluation. |
|
| Method Summary | |
|---|---|
void |
addCase(boolean correct,
double score)
Add a case with the specified correctness and response score. |
double |
areaUnderPrCurve(boolean interpolate)
Returns the area under the recall-precision curve with interpolation as specified. |
double |
areaUnderRocCurve(boolean interpolate)
Returns the area under the receiver operating characteristic (ROC) curve. |
double |
averagePrecision()
Returns point-wise average precision of points on the uninterpolated precision-recall curve. |
double |
maximumFMeasure()
Returns the maximum F1-measure for an actual operating point on the uninterpolated precision-recall curve. |
double |
maximumFMeasure(double beta)
Returns the maximum Fβ-measure for an actual operating point on the uninterpolated precision-recall curve for a specified β. |
double |
prBreakevenPoint()
Returns the breakeven point (BEP) for precision and recall based on the interpolated precision. |
double[][] |
prCurve(boolean interpolate)
Returns the set of recall/precision operating points according to the scores of the cases. |
double |
precisionAt(int rank)
Returns the precision score achieved by returning the top scoring documents up to the specified rank. |
double |
reciprocalRank()
Returns the reciprocal rank (RR) for this evaluation. |
double[][] |
rocCurve(boolean interpolate)
Returns the receiver operating characteristic (ROC) curve for the cases ordered by score. |
String |
toString()
Returns a string-based representation of this scored precision recall evaluation. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public ScoredPrecisionRecallEvaluation()
| Method Detail |
|---|
public void addCase(boolean correct,
double score)
true if the reference was also
positive. The score is just the response score.
Warning: The scores should be sensibly comparable across cases.
correct - true if this case was correct.score - Score of response.public double[][] prCurve(boolean interpolate)
rocCurve(boolean) returns the recall (sensitivity)
versus rejection recall (specificity) operating points, which
take the number of true negative classifications into account.
Note that the recall values (the first component) are strictly
increasing, resulting in a well-defined function from recall
to precision.
The second operation derives so-called "interpolated precision" and is widely used for evaluating information retrieval systems. The interpolated precision of a given recall point is defined to be the maximum precision for that recall point and any higher recall point. This ensures that precision values are non-increasing with increased recall. For the example above, because 0.60 precision is found at 0.75 recall, the interpolated precision of all recall levels lower than 0.75 is 0.60. This method implements this interpolation by only returning points that are not dominated by other points that have both better precision and recall: In the diagram, these are the yellow highlighted precision points.
It is common to also see this graph completed with points (0,1) and (1,0), but this function does not include these limits. The one hundred percent precision implied by the first point is not necessarily achievable, whereas the second point will be no better than the last point in the return result.
Neither interpolated nor uninterpolated return values are guaranteed to be convex. Convex closure will skew results upward in an even more unrealistic direction, especially if the artificial completion point (0,1) is included.
interpolate - Set to true if the precisions
are interpolated through pruning dominated points.
public double[][] rocCurve(boolean interpolate)
TP/(TP+FN) and rejection recall (sensitivity)
is its negative dual TN/(TN+FP), where
TP is the true positive count, FP the
false positive, FN the false negative and
TN the true negative counts. Through sensitivity,
the ROC curve provides information about rejection.
The last column in the example in prCurve(boolean) provides the rejection recall rates
at each threshold. The resulting ROC curves for that example
are:
As with the recall-precision curve, the parameter determines
whether or not to "interpolate" the rejection recall
values. This is carried out as with the recall-precision curve
by only returning values which would not be interpolated. In
general, without interpolation, the same rows of the table are
used as for the recall-precision curve, namely those at the end
of a run of true positives. Interpolation may result in a
different set of recall points in the pruned answer set, as in
the example above.
Like the recall-precision curve method, this method does not insert artificial end ponits of (0,1) and (1,0) into the graph. As with the recall-precision curve, the final entry will have recall equal to one.
Neither interpolated nor uninterpolated return values are guaranteed to be convex. Convex closure will skew results upward in an even more unrealistic direction, especially if the artificial completion point (0,1) is included.
interpolate - If true, any point with both
precision and recall lower than another point is eliminated
from the returned precision-recall curve.
public double maximumFMeasure()
2*recall*precision/(recall+precision).
For the example in prCurve(boolean):
maximumFMeasure("foo") = 0.67
corresponding to recall=0.75 and precision=0.60.
public double maximumFMeasure(double beta)
2*recall*precision/(recall+precision).
For the example in prCurve(boolean):
maximumFMeasure("foo") = 0.67
corresponding to recall=0.75 and precision=0.60.
public double prBreakevenPoint()
For the example illustrated in prCurve(boolean),
the breakeven point is 0.60. This is because the interpolated
precision recall curve is flat from the implicit initial point
(0.00,0.60) to (0.75,0.60) and thus
the line between them has a breakeven point of x = y =
0.6.
As an interpolation (equal precision and recall) of a rounded up estimate (interpolated recall-precision curve), the breakeven point is not necessarily an achievable operating point. Note that the recall-precision breakeven point will always be smaller than the maximum F measure, which does correspond to an observed operating point, because the breakeven point always involves lowering the recall of the first point on the curve with recall greater than precision to match the precision.
This method will return 0.0 if the
precision-recall curve never crosses the diagonal.
public double averagePrecision()
prCurve(boolean) for a definition of the values on the curve.
This method implements the standard information retrieval definition, which only averages precision measurements from correct responses.
For the example provided in prCurve(boolean), the
average precision is the average of precision values for the
correct responses (highlighted lines):
Although the reasoning is different, the average precision returned is the same as the area under the uninterpolated recall-precision graph.
public double precisionAt(int rank)
Double.NaN is returned.
public double reciprocalRank()
Typically, the mean of the reciprocal ranks for a number of evaluations is reported.
public double areaUnderPrCurve(boolean interpolate)
For the example detailed in prCurve(boolean), the areas without and with
interpolation are:
Interpolation will always result in an equal or greater area.
Note that the uninterpolated area under the recall-precision curve is the same as the average precision value.
interpolate - Set to true to interpolate
the precision values.
public double areaUnderRocCurve(boolean interpolate)
interpolate - Set to true to interpolate
the rejection recall values.
public String toString()
toString in class Object
|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||