public class ScoredPrecisionRecallEvaluation extends Object

ScoredPrecisionRecallEvaluation provides an evaluation based on
precision-recall operating points and sensitivity-specificity
operating points. The unscored precision-recall evaluation class is
PrecisionRecallEvaluation.

There is a single no-arg constructor, ScoredPrecisionRecallEvaluation().

The method addCase(boolean,double)
is used to populate
the evaluation, with the first argument representing whether the
response was correct and the second the score that was assigned.
If there are positive reference cases that are not added through
addCase()
, the total number of such cases should be added
using the method addMisses(int)
. This method effectively
increments the number of reference positive cases used to compute
recall values.
If there are negative reference cases that are not dealt with
through addCase()
, the method addNegativeMisses(int)
should be called with the total number of
such cases as an argument. This method increments the number of
reference negative cases used to compute specificity values.
By way of example, consider the following table of cases, all of which involve positive responses. The cases are in rank order, but may be added in any order.
The first line, which is separated, indicates the values before any results have been returned. There's no score corresponding to this operating point, and given that it doesn't correspond to a result, correctness is not applicable. It has zero recall, one specificity, and one precision (letting zero divided by zero be one here).
Rank  Score  Correct  TP  TN  FP  FN  Rec   Prec  Spec  F Meas
 (1)   n/a     n/a     0   6   0   5  0.00  1.00  1.00  0.00
  0    1.21    no      0   5   1   5  0.00  0.00  0.83  0.00
  1    1.27    yes     1   5   1   4  0.20  0.50  0.83  0.29
  2    1.39    no      1   4   2   4  0.20  0.33  0.67  0.25
  3    1.47    yes     2   4   2   3  0.40  0.50  0.67  0.44
  4    1.60    yes     3   4   2   2  0.60  0.60  0.67  0.60
  5    1.65    no      3   3   3   2  0.60  0.50  0.50  0.55
  6    1.79    no      3   2   4   2  0.60  0.43  0.33  0.50
  7    1.80    no      3   1   5   2  0.60  0.38  0.17  0.47
  8    2.01    yes     4   1   5   1  0.80  0.44  0.17  0.53
  9    3.70    no      4   0   6   1  0.80  0.40  0.00  0.53
  ?    n/a     yes     5   0   6   0  1.00  0.00  0.00  0.00
The next lines, listed as ranks 0 to 9, correspond to calls to
addCase()
with the specified score and correctness. For
each of these lines, we list the corresponding number of true
positives (TP), true negatives (TN), false positives (FP), and
false negatives (FN). These are followed by recall, precision and
specificity (aka rejection recall). See the class documentation
for PrecisionRecallEvaluation
for definitions of these
values in terms of the TP, TN, FP, and FN counts.
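The counts in the table can be reproduced directly from the ranked correctness flags. The following self-contained sketch (the class name PrCounts and its helpers are made up for illustration; this is not the LingPipe implementation) computes precision, recall, and specificity after the top k responses of the running example:

```java
public class PrCounts {
    // Running example: correctness of the 10 ranked responses, in rank order.
    static final boolean[] CORRECT =
        {false, true, false, true, true, false, false, false, true, false};
    static final int POS_REF = 5; // positive reference cases
    static final int NEG_REF = 6; // negative reference cases

    // Precision over the top k responses; 0/0 taken as 1 at the empty cutoff.
    static double precision(int k) {
        if (k == 0) return 1.0;
        int tp = 0;
        for (int i = 0; i < k; ++i) if (CORRECT[i]) ++tp;
        return (double) tp / k;
    }

    // Recall = TP / (TP + FN) over the top k responses.
    static double recall(int k) {
        int tp = 0;
        for (int i = 0; i < k; ++i) if (CORRECT[i]) ++tp;
        return (double) tp / POS_REF;
    }

    // Specificity = TN / (TN + FP) over the top k responses.
    static double specificity(int k) {
        int fp = 0;
        for (int i = 0; i < k; ++i) if (!CORRECT[i]) ++fp;
        return (double) (NEG_REF - fp) / NEG_REF;
    }

    public static void main(String[] args) {
        // Row at rank 4 (top five responses): Rec=0.60 Prec=0.60 Spec=0.67
        System.out.printf("%.2f %.2f %.2f%n",
            recall(5), precision(5), specificity(5));
    }
}
```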
There are five positive reference cases and six negative reference cases in this example. The maximal precision and specificity values at each recall level are the ones used for the interpolated curves.
The pairs of precision/recall values form the basis for the
precision-recall curve returned by prCurve(boolean), with the
argument indicating whether to perform precision interpolation.
For the example above, the uninterpolated precision-recall curve is:

prCurve(false) = { { 0.00, 1.00 }, { 0.20, 0.50 }, { 0.20, 0.33 },
                   { 0.40, 0.50 }, { 0.60, 0.60 }, { 0.60, 0.50 },
                   { 0.60, 0.43 }, { 0.60, 0.38 }, { 0.60, 0.38 },
                   { 0.80, 0.44 }, { 0.80, 0.40 }, { 1.00, 0.00 } }

For convenience, the evaluation always adds the two limit points, one
with precision 0 and recall 1, and one with precision 1 and recall 0.
These operating points are always achievable, the first by returning
every possible answer and the second by returning no answers.

Typically, a form of interpolation is performed that sets the
precision for a given recall value to the maximum of the precision at
the current or any greater recall value. At the same time, only values
that correspond to jumps in recall are returned, corresponding to
ranks at which true positives were returned. For the example above,
the interpolated result is:

prCurve(true) = { { 0.00, 1.00 }, { 0.20, 0.60 }, { 0.40, 0.60 },
                  { 0.60, 0.60 }, { 0.80, 0.44 }, { 1.00, 0.00 } }
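The interpolation rule can be sketched in a few lines. The class name PrInterpolation is made up for illustration; the raw operating points are the prCurve(false) values above, and the main method reproduces the prCurve(true) points:

```java
import java.util.TreeSet;

public class PrInterpolation {
    // Uninterpolated operating points { recall, precision } from the
    // running example, including the two limit points.
    static final double[][] RAW = {
        {0.00, 1.00}, {0.20, 0.50}, {0.20, 0.33}, {0.40, 0.50},
        {0.60, 0.60}, {0.60, 0.50}, {0.60, 0.43}, {0.60, 0.38},
        {0.60, 0.38}, {0.80, 0.44}, {0.80, 0.40}, {1.00, 0.00}
    };

    // Interpolated precision at recall r: the maximum precision over all
    // operating points at the current or any greater recall value.
    static double interpPrecision(double r) {
        double max = 0.0;
        for (double[] pt : RAW)
            if (pt[0] >= r) max = Math.max(max, pt[1]);
        return max;
    }

    public static void main(String[] args) {
        // One point per distinct recall value, as in prCurve(true).
        TreeSet<Double> recalls = new TreeSet<>();
        for (double[] pt : RAW) recalls.add(pt[0]);
        for (double r : recalls)
            System.out.printf("{ %.2f, %.2f }%n", r, interpPrecision(r));
    }
}
```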
The ROC curve is returned by the method rocCurve(boolean), with the
boolean parameter again indicating whether to interpolate.
Interpolation works exactly the same way as for the precision-recall
curves, but is based on specificity rather than precision. Each point
is { 1 - specificity, recall }. For the example above, the results are:

rocCurve(false) = { { 1 - 1.00, 0.00 }, { 1 - 0.83, 0.00 }, { 1 - 0.83, 0.20 },
                    { 1 - 0.67, 0.20 }, { 1 - 0.67, 0.40 }, { 1 - 0.67, 0.60 },
                    { 1 - 0.50, 0.60 }, { 1 - 0.33, 0.60 }, { 1 - 0.17, 0.60 },
                    { 1 - 0.17, 0.80 }, { 1 - 0.00, 0.80 }, { 1 - 0.00, 1.00 } }

rocCurve(true) = { { 1 - 1.00, 0.00 }, { 1 - 0.83, 0.20 }, { 1 - 0.67, 0.60 },
                   { 1 - 0.50, 0.60 }, { 1 - 0.33, 0.60 }, { 1 - 0.17, 0.80 },
                   { 1 - 0.00, 1.00 } }
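One reading of the specificity-based interpolation, which reproduces the rocCurve(true) points for this example, keeps the maximum recall achieved at each distinct specificity value. The class name RocInterpolation is made up for illustration:

```java
import java.util.TreeSet;

public class RocInterpolation {
    // Uninterpolated ROC points { 1 - specificity, recall } from the
    // running example, including the two limit points.
    static final double[][] RAW = {
        {0.00, 0.00}, {0.17, 0.00}, {0.17, 0.20}, {0.33, 0.20},
        {0.33, 0.40}, {0.33, 0.60}, {0.50, 0.60}, {0.67, 0.60},
        {0.83, 0.60}, {0.83, 0.80}, {1.00, 0.80}, {1.00, 1.00}
    };

    // Maximum recall over all points at this or a lower 1-specificity value
    // (i.e., at this or a higher specificity).
    static double maxRecallAt(double oneMinusSpec) {
        double max = 0.0;
        for (double[] pt : RAW)
            if (pt[0] <= oneMinusSpec) max = Math.max(max, pt[1]);
        return max;
    }

    public static void main(String[] args) {
        // One point per distinct 1-specificity value, as in rocCurve(true).
        TreeSet<Double> xs = new TreeSet<>();
        for (double[] pt : RAW) xs.add(pt[0]);
        for (double x : xs)
            System.out.printf("{ %.2f, %.2f }%n", x, maxRecallAt(x));
    }
}
```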
In some information extraction or retrieval tasks, a system might only
return a fixed number of examples to a user. To evaluate the result of
such truncated result sets, it is common to report the precision after
N returned results. The counting starts from one rather than zero for
returned results, but we fill in a limiting value of 1.0 for precision
at 0. In our running example, we have:

precisionAt(0)   = 1.0
precisionAt(1)   = 0.0
precisionAt(5)   = 0.6
precisionAt(10)  = 0.4
precisionAt(20)  = 0.2
precisionAt(100) = 0.04

The return value for a rank greater than the number of cases added is
calculated assuming all other results are errors.
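The precision-at-N values above can be sketched as follows; the class name PrecisionAtN is made up for illustration, and ranks beyond the added cases count as errors, as described:

```java
public class PrecisionAtN {
    // Running example: correctness of the 10 ranked responses.
    static final boolean[] CORRECT =
        {false, true, false, true, true, false, false, false, true, false};

    // Precision after the top n results; results beyond those added
    // are assumed to be errors, and precision at 0 is defined as 1.0.
    static double precisionAt(int n) {
        if (n == 0) return 1.0;
        int tp = 0;
        for (int i = 0; i < Math.min(n, CORRECT.length); ++i)
            if (CORRECT[i]) ++tp;
        return (double) tp / n;
    }

    public static void main(String[] args) {
        int[] ranks = {0, 1, 5, 10, 20, 100};
        for (int n : ranks)
            System.out.println("precisionAt(" + n + ") = " + precisionAt(n));
    }
}
```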
The reciprocal rank is defined to be 1/M, where M is the rank
(counting from 1) of the first true positive result. In our running
example, the first result is a false positive and the second a true
positive, so the reciprocal rank is:

reciprocalRank() = 0.5

Note that this measure emphasizes differences in early ranks much more
than later ones. For instance, the reciprocal rank for a system
returning a correct result first is 1/1, but for one returning it
second, it's 1/2, and for one returning the first true positive at
rank 10, it's 1/10. The difference between ranks 1 and 2 (0.5) is thus
greater than that between ranks 2 and 10 (0.4).
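The definition above amounts to a linear scan for the first true positive; the class name ReciprocalRank below is made up for illustration:

```java
public class ReciprocalRank {
    // Reciprocal rank: 1/M for the rank M (counting from 1) of the first
    // true positive, or 0.0 if no result is a true positive.
    static double reciprocalRank(boolean[] correct) {
        for (int i = 0; i < correct.length; ++i)
            if (correct[i]) return 1.0 / (i + 1);
        return 0.0;
    }

    public static void main(String[] args) {
        boolean[] correct =
            {false, true, false, true, true, false, false, false, true, false};
        // First true positive is at rank 2, so the reciprocal rank is 0.5.
        System.out.println(reciprocalRank(correct));
    }
}
```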
R precision is the precision after R results have been returned, where
R is the number of positive reference cases. For the running example,
R = 5, and three of the top five results are correct, so:

rPrecision() = 0.6

R precision will always be at a point where precision equals recall.
It is also known as the precision-recall breakeven point (BEP), and
for convenience, there is a method of that name:

prBreakevenPoint() = rPrecision() = 0.6
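A sketch of the computation (the class name RPrecision is made up for illustration); note that at cutoff R both precision and recall equal TP/R, which is why the two always coincide there:

```java
public class RPrecision {
    // R precision: precision after R results, where R is the number of
    // positive reference cases. At this cutoff precision equals recall,
    // since both are TP / R.
    static double rPrecision(boolean[] correct, int numPositiveRef) {
        int tp = 0;
        for (int i = 0; i < Math.min(numPositiveRef, correct.length); ++i)
            if (correct[i]) ++tp;
        return (double) tp / numPositiveRef;
    }

    public static void main(String[] args) {
        boolean[] correct =
            {false, true, false, true, true, false, false, false, true, false};
        // 3 of the top 5 results are correct, so R precision is 0.6.
        System.out.println(rPrecision(correct, 5));
    }
}
```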
The maximum F measure is returned by maximumFMeasure() (see
PrecisionRecallEvaluation.fMeasure(double,double,double) for a
definition of F measure). The result is the maximum F measure value
achieved at any operating point on the curve. For our example, this
arises at rank 4:

maximumFMeasure() = 0.6

In general, the maximum F measure may occur at a point other than the precision-recall breakeven point.
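The maximum can be found by scanning every cutoff and taking the best balanced F measure; the class name MaxF is made up for illustration:

```java
public class MaxF {
    static final boolean[] CORRECT =
        {false, true, false, true, true, false, false, false, true, false};
    static final int POS_REF = 5;

    // Balanced F measure: harmonic mean of precision and recall.
    static double f1(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    // Maximum F1 over every cutoff operating point.
    static double maximumFMeasure() {
        double best = 0.0;
        int tp = 0;
        for (int k = 1; k <= CORRECT.length; ++k) {
            if (CORRECT[k - 1]) ++tp;
            double p = (double) tp / k;       // precision at cutoff k
            double r = (double) tp / POS_REF; // recall at cutoff k
            best = Math.max(best, f1(p, r));
        }
        return best;
    }

    public static void main(String[] args) {
        // Maximal at rank 4 (top five results), where p = r = 0.6.
        System.out.println(maximumFMeasure());
    }
}
```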
The average across multiple evaluations of average precision is somewhat misleadingly called mean average precision (MAP) [it should be average average precision, because averages are over finite samples and means are properties of distributions].
The eleven-point precision-recall curves, reciprocal rank, and R precision are also popular targets for reporting averaged results.
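Average precision for a single evaluation, the quantity being averaged in MAP, is the mean of the precision values at each true positive's rank, with positives never returned contributing zero (as described for averagePrecision() below). A self-contained sketch for the running example (the class name AveragePrecision is made up for illustration):

```java
public class AveragePrecision {
    // Average of the precisions at each true positive's rank; positives
    // that are never returned contribute a precision of zero.
    static double averagePrecision(boolean[] correct, int numPositiveRef) {
        double sum = 0.0;
        int tp = 0;
        for (int i = 0; i < correct.length; ++i) {
            if (correct[i]) {
                ++tp;
                sum += (double) tp / (i + 1); // precision at this rank
            }
        }
        return sum / numPositiveRef;
    }

    public static void main(String[] args) {
        boolean[] correct =
            {false, true, false, true, true, false, false, false, true, false};
        // (1/2 + 2/4 + 3/5 + 4/9 + 0) / 5, roughly 0.409
        System.out.println(averagePrecision(correct, 5));
    }
}
```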
Modifier and Type    Field and Description

static double        FLOATING_POINT_EQUALS_EPSILON

Constructor and Description

ScoredPrecisionRecallEvaluation()
                     Construct a scored precision-recall evaluation.
Modifier and Type    Method and Description

void                 addCase(boolean correct, double score)
                     Add a case with the specified correctness and response score.

void                 addMisses(int count)
                     Increments the positive reference count without adding a case from the classifier.

void                 addNegativeMisses(int count)
                     Increments the negative reference count without adding a case from the classifier.

double               areaUnderPrCurve(boolean interpolate)
                     Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified.

double               areaUnderRocCurve(boolean interpolate)
                     Returns the area under the receiver operating characteristic (ROC) curve.

double               averagePrecision()
                     Returns the average of precisions at the true positive results.

double[]             elevenPtInterpPrecision()
                     Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1.

double               maximumFMeasure()
                     Returns the maximum F1 measure for an operating point on the PR curve.

double               maximumFMeasure(double beta)
                     Returns the maximum Fβ measure for an operating point on the precision-recall curve for a specified precision weight β > 0.

int                  numCases()
                     Returns the total number of positive and negative reference cases for this evaluation.

int                  numNegativeRef()
                     Returns the number of negative reference cases.

int                  numPositiveRef()
                     Returns the number of positive reference cases.

double               prBreakevenPoint()
                     Returns the precision-recall breakeven point.

double[][]           prCurve(boolean interpolate)
                     Returns the precision-recall curve, interpolating if the specified flag is true.

double               precisionAt(int rank)
                     Returns the precision score achieved by returning the top scoring documents up to (but not including) the specified rank.

static void          printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw)
                     Prints a precision-recall curve with F-measures.

static void          printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw)
                     Prints a precision-recall curve with scores.

double[][]           prScoreCurve(boolean interpolate)
                     Returns the array of recall/precision/score operating points according to the scores of the cases.

double               reciprocalRank()
                     Returns the reciprocal rank for this evaluation.

double[][]           rocCurve(boolean interpolate)
                     Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true.

double               rPrecision()
                     Returns the R precision.

String               toString()
                     Returns a string-based representation of this scored precision-recall evaluation.
public static final double FLOATING_POINT_EQUALS_EPSILON
public ScoredPrecisionRecallEvaluation()
public void addCase(boolean correct, double score)

Add a case with the specified correctness and response score. The flag
should be true if the reference was also positive. The score is just
the response score.

Warning: The scores should be sensibly comparable across cases.

Parameters:
correct - true if this case was correct.
score - Score of the response.

public void addMisses(int count)

Increments the positive reference count without adding a case from the
classifier.

Parameters:
count - Number of outright misses to add to this evaluation.
Throws:
IllegalArgumentException - if the count is not positive.

public void addNegativeMisses(int count)

Increments the negative reference count without adding a case from the
classifier.

Parameters:
count - Number of outright misses to add to this evaluation.
Throws:
IllegalArgumentException - if the count is not positive.

public int numCases()

Returns the total number of cases, the sum of numPositiveRef() and
numNegativeRef().

public int numPositiveRef()

Returns the number of positive reference cases, that is, the number of
cases added with correctness true plus the number of misses added.

public int numNegativeRef()

Returns the number of negative reference cases, that is, the number of
cases added with correctness false plus the number of negative misses
added.

public double rPrecision()
public double[] elevenPtInterpPrecision()
public double averagePrecision()

Returns the average of precisions at the true positive results. If
positive cases were added with addMisses(int), their precision is
considered to be zero. (See the class documentation for more
information.)

public double[][] prCurve(boolean interpolate)

Returns the precision-recall curve, interpolating if the specified
flag is true.

Warning: Despite the name, the values returned in the arrays have
recall at index 0 and precision at index 1.

Parameters:
interpolate - Set to true for precision interpolation.

public double[][] prScoreCurve(boolean interpolate)

Returns the array of recall/precision/score operating points according
to the scores of the cases, as in prCurve(boolean). Index 0 is recall,
1 is precision, and 2 is the score.

Parameters:
interpolate - Set to true if the precisions are interpolated by
pruning dominated points.

public double[][] rocCurve(boolean interpolate)
Returns the receiver operating characteristic (ROC) curve for the
cases ordered by score, interpolating if the specified flag is true.
See the class documentation above for a definition and example of the
returned curve.

Parameters:
interpolate - Interpolate specificity values.

public double maximumFMeasure()

Returns the maximum F1 measure for an operating point on the PR curve.

public double maximumFMeasure(double beta)

Returns the maximum Fβ measure for an operating point on the
precision-recall curve for a specified precision weight β > 0.

public double precisionAt(int rank)

Returns the precision score achieved by returning the top scoring
documents up to (but not including) the specified rank. Double.NaN is
returned if the precision is undefined.

public double prBreakevenPoint()

Returns the precision-recall breakeven point, which is equal to the R
precision.

public double reciprocalRank()

Returns the reciprocal rank for this evaluation, the reciprocal 1/N of
the rank N at which the first true positive is found. This method
counts ranks from 1 rather than 0. The result will be between 1.0, for
the first-best result being correct, and 0.0, for none of the results
being correct.

public double areaUnderPrCurve(boolean interpolate)

Returns the area under the curve (AUC) for the recall-precision curve
with interpolation as specified.

Warning: This method uses the parallelogram method for interpolation
rather than the usual interpolation method used to calculate AUC for
precision-recall curves in information retrieval evaluations.

Parameters:
interpolate - Set to true to interpolate the precision values.

public double areaUnderRocCurve(boolean interpolate)

Returns the area under the receiver operating characteristic (ROC)
curve.

Parameters:
interpolate - Set to true to interpolate the rejection recall values.

public String toString()

Returns a string-based representation of this scored precision-recall
evaluation.
public static void printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw)

Prints a precision-recall curve as produced by prCurve(boolean): an
array of length-2 arrays of doubles. In each length-2 array, the
recall value is at index 0 and the precision is at index 1. The
printed curve has three columns in the following order: precision,
recall, F-measure.

Parameters:
prCurve - A precision-recall curve.
pw - The output PrintWriter.

public static void printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw)

Prints a precision-recall curve with scores as produced by
prScoreCurve(boolean): an array of length-3 arrays of doubles. In each
length-3 array, the recall value is at index 0, the precision at index
1, and the score at index 2. The printed curve has three columns in
the following order: precision, recall, score.

Parameters:
prScoreCurve - A precision-recall score curve.
pw - The output PrintWriter.