Title: | Quantifying Performance of a Binary Classifier Through Weight of Evidence |
---|---|
Description: | The distributions of the weight of evidence (log Bayes factor) favouring case over noncase status in a test dataset (or test folds generated by cross-validation) can be used to quantify the performance of a diagnostic test (McKeigue (2019), <doi:10.1177/0962280218776989>). The package can be used with any test dataset on which you have observed case-control status and have computed prior and posterior probabilities of case status using a model learned on a training dataset. To quantify how the predictor will behave as a risk stratifier, the quantiles of the distributions of weight of evidence in cases and controls can be calculated and plotted. |
Authors: | Paul McKeigue [aut], Marco Colombo [ctb, cre] |
Maintainer: | Marco Colombo <[email protected]> |
License: | GPL-3 |
Version: | 0.6.2.9000 |
Built: | 2024-11-09 03:06:55 UTC |
Source: | https://github.com/mcol/wevid |
The wevid package provides functions for quantifying the performance of a diagnostic test (or any other binary classifier) by calculating and plotting the distributions in cases and noncases of the weight of evidence favouring case over noncase status.
The distributions of the weight of evidence (log Bayes factor) favouring case over noncase status in a test dataset (or test folds generated by cross-validation) can be used to quantify the performance of a diagnostic test.
In comparison with the C-statistic (area under ROC curve), the expected weight of evidence (expected information for discrimination) has several advantages as a summary measure of predictive performance. To quantify how the predictor will behave as a risk stratifier, the quantiles of the distributions of weight of evidence in cases and controls can be calculated and plotted.
This package can be used with any test dataset on which you have observed case-control status and have computed prior and posterior probabilities of case status using a model learned on a training dataset. Therefore, you should have computed on a test dataset (or on test folds used for cross-validation):
The prior probability of case status (this may be just the frequency of cases in the training data).
The posterior probability of case status (using the model learned on the training data to predict on the test data).
The observed case status (coded as 0 for noncases, 1 for cases).
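As a minimal sketch of how these three quantities might be obtained (a hypothetical simulated train/test split with logistic regression; any model learned on the training data could be used):

```r
## Hypothetical illustration: compute prior.p, posterior.p and y on a test set
## using logistic regression learned on a training set (simulated data).
set.seed(1)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x))   # simulated case-control status
d <- data.frame(x, y)
train <- d[1:200, ]
test  <- d[201:400, ]

fit <- glm(y ~ x, family = binomial, data = train)

## Prior probability: here simply the frequency of cases in the training data
prior.p <- rep(mean(train$y), nrow(test))

## Posterior probability: model learned on training data, predicted on test data
posterior.p <- predict(fit, newdata = test, type = "response")

## Observed case status on the test data (0 = noncase, 1 = case)
y.test <- test$y
```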
The main function of the package is Wdensities, which computes the crude and model-based densities of weight of evidence in cases and controls. Once these are computed, they can be plotted with plotWdists and plotcumfreqs. Summary statistics can be reported with summary.
Paul McKeigue [email protected]
Paul McKeigue (2019), Quantifying performance of a diagnostic test as the expected information for discrimination: Relation to the C-statistic. Statistical Methods for Medical Research, 28 (6), 1841-1851. https://doi.org/10.1177/0962280218776989.
Plot the cumulative frequency distributions in cases and in controls
plotcumfreqs(densities)
densities | Densities object produced by Wdensities. |
A ggplot object representing the cumulative frequency distributions of the smoothed densities of the weights of evidence in cases and in controls.
data(cleveland)
densities <- with(cleveland, Wdensities(y, posterior.p, prior.p))
plotcumfreqs(densities)
Plot crude and model-based ROC curves
While the crude ROC curve can be non-concave and is generally not smooth, the model-based ROC curve is always concave, as the corresponding densities have been adjusted to be mathematically consistent.
plotroc(densities)
densities | Densities object produced by Wdensities. |
A ggplot object representing crude and model-based ROC curves.
data(cleveland)
densities <- with(cleveland, Wdensities(y, posterior.p, prior.p))
plotroc(densities)
Plot the distribution of the weight of evidence in cases and in controls
plotWdists(densities, distlabels = c("Crude", "Model-based"))
densities | Densities object produced by Wdensities. |
distlabels |
Character vector of length 2 to be used to label the crude and the model-based curves (in that order). |
A ggplot object representing the distributions of crude and model-based weights of evidence in cases and in controls.
data(cleveland)
densities <- with(cleveland, Wdensities(y, posterior.p, prior.p))
plotWdists(densities)

# Example which requires fitting a mixture distribution
data(fitonly)
densities <- with(fitonly, Wdensities(y, posterior.p, prior.p))

# truncate the spike
plotWdists(densities) + ggplot2::scale_y_continuous(limits = c(0, 0.5))
Proportions of cases and controls below a threshold of weight of evidence
prop.belowthreshold(densities, w.threshold)
densities | Densities object produced by Wdensities. |
w.threshold |
Threshold value of weight of evidence (natural logs). |
Numeric vector of length 2 listing the proportions of controls and cases with weight of evidence below the given threshold.
data(cleveland)
densities <- with(cleveland, Wdensities(y, posterior.p, prior.p))
w.threshold <- log(4)  # threshold Bayes factor of 4
prop.belowthreshold(densities, w.threshold)
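A crude empirical counterpart of this calculation (ignoring the smoothed densities that prop.belowthreshold is based on, and using hypothetical vectors W and y for illustration) is simply the fraction of observed weights of evidence below the threshold in each group:

```r
## Empirical proportions of controls and cases with weight of evidence below
## a threshold (natural logs); W and y here are simulated, not package data.
set.seed(1)
W <- c(rnorm(100, -0.5, 1), rnorm(100, 0.5, 1))  # simulated weights of evidence
y <- rep(c(0, 1), each = 100)                    # 0 = control, 1 = case
w.threshold <- log(4)                            # threshold Bayes factor of 4
c(controls = mean(W[y == 0] < w.threshold),
  cases    = mean(W[y == 1] < w.threshold))
```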
Recalibrate posterior probabilities
Transforms posterior probabilities to logits, fits a logistic regression model and returns the predictive probabilities from this model.
recalibrate.p(y, posterior.p)
y |
Binary outcome label (0 for controls, 1 for cases). |
posterior.p |
Vector of posterior probabilities. |
Recalibrated posterior probabilities.
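The description above can be sketched directly (a minimal reimplementation for illustration, not the package's exact code):

```r
## Sketch of recalibration: transform posterior probabilities to logits, fit a
## logistic regression of the outcome on the logits, return fitted probabilities.
recalibrate.sketch <- function(y, posterior.p) {
  logit.p <- qlogis(posterior.p)          # log(p / (1 - p))
  fit <- glm(y ~ logit.p, family = binomial)
  fitted(fit)                             # recalibrated posterior probabilities
}

## Hypothetical usage with overconfident (too extreme) input probabilities
set.seed(1)
p.true <- runif(200, 0.05, 0.95)
y <- rbinom(200, 1, p.true)
p.over <- plogis(2 * qlogis(p.true))      # miscalibrated input
p.recal <- recalibrate.sketch(y, p.over)
```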
Summary evaluation of predictive performance
## S3 method for class 'Wdensities'
summary(object, ...)

## S3 method for class 'Wdensities'
mean(x, ...)

auroc.crude(densities)
auroc.model(densities)

lambda.crude(densities)
lambda.model(densities)
object, x, densities | Densities object produced by Wdensities. |
... |
Further arguments passed to or from other methods. These are currently ignored. |
summary returns a data frame that reports the number of cases and controls, the test log-likelihood, the crude and model-based C-statistic and expected weight of evidence Lambda.

mean returns a numeric vector listing the mean densities of the weight of evidence in controls and in cases.

auroc.crude and auroc.model return the area under the ROC curve according to the crude and the model-based densities of weight of evidence, respectively.

lambda.crude and lambda.model return the expected weight of evidence (expected information for discrimination) in bits from the crude and the model-based densities, respectively.
data(cleveland)
densities <- with(cleveland, Wdensities(y, posterior.p, prior.p))
summary(densities)
mean(densities)
auroc.model(densities)
lambda.model(densities)
The function computes smoothed densities of the weight of evidence in cases and in controls from the crude probabilities, then adjusts them to make them mathematically consistent so that p(W_ctrl) = exp(-W) p(W_case).
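The consistency condition can be checked numerically in a Gaussian special case: if W (in nats) is normal with mean Lambda and variance 2*Lambda in cases, and mean -Lambda with the same variance in controls, the identity holds exactly (an illustrative check, not package code):

```r
## Numerical check of p(W_ctrl) = exp(-W) * p(W_case) for Gaussian densities
## of the weight of evidence: N(Lambda, 2*Lambda) in cases and
## N(-Lambda, 2*Lambda) in controls (natural-log units).
Lambda <- 1.5
W <- seq(-10, 10, by = 0.01)
p.case <- dnorm(W, mean =  Lambda, sd = sqrt(2 * Lambda))
p.ctrl <- dnorm(W, mean = -Lambda, sd = sqrt(2 * Lambda))
max(abs(p.ctrl - exp(-W) * p.case))      # numerically zero
```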
Wdensities(y, posterior.p, prior.p, range.xseq = c(-25, 25),
           x.stepsize = 0.01, adjust.bw = 1, recalibrate = TRUE,
           debug = FALSE)
y |
Binary outcome label (0 for controls, 1 for cases). |
posterior.p |
Vector of posterior probabilities generated by using the model to predict on the test data. |
prior.p |
Vector of prior probabilities. |
range.xseq |
Range of points where the curves should be sampled. |
x.stepsize |
Distance between each point. |
adjust.bw |
Bandwidth adjustment for the Gaussian kernel density estimator. By default it is set to 1 (no adjustment), setting it to a value smaller/larger than 1 reduces/increases the smoothing of the kernel. This argument is ignored if more than one mixture component is identified. |
recalibrate | If TRUE (the default), the posterior probabilities are recalibrated before the weights of evidence are computed. |
debug | If TRUE, debugging information is reported during the computation. |
If the sample distributions of weight of evidence in cases and controls support a 2-component mixture model for the densities (based on model comparison with BIC), this is detected automatically and a 2-component mixture is fitted before adjustment.
A densities object that contains the information necessary to compute summary measures and generate plots.
data(cleveland)
densities <- with(cleveland, Wdensities(y, posterior.p, prior.p))

# Example which requires fitting a mixture distribution
data(fitonly)
densities <- with(fitonly, Wdensities(y, posterior.p, prior.p))
Calculate weights of evidence in natural log units
weightsofevidence(posterior.p, prior.p)
posterior.p |
Vector of posterior probabilities generated by using the model to predict on the test data. |
prior.p |
Vector of prior probabilities. |
The weight of evidence in nats for each observation.
data(cleveland)  # load example dataset
W <- with(cleveland, weightsofevidence(posterior.p, prior.p))
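Equivalently, the weight of evidence is the log posterior odds minus the log prior odds (a minimal sketch of this definition; the helper below is hypothetical, not the package function):

```r
## Weight of evidence in nats: log posterior odds minus log prior odds,
## i.e. the log Bayes factor favouring case over noncase status.
woe.sketch <- function(posterior.p, prior.p) {
  qlogis(posterior.p) - qlogis(prior.p)   # qlogis(p) = log(p / (1 - p))
}

## Prior 0.2 updated to posterior 0.5 corresponds to a Bayes factor of 4
woe.sketch(0.5, 0.2)                      # log(4), about 1.386 nats
```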
Example datasets
The wevid package comes with the following datasets:
cleveland is based on cross-validated prediction of coronary disease in the Cleveland Heart Study (297 observations).

pima is based on cross-validated prediction of diabetes in Pima Native Americans (768 observations).

fitonly is based on cross-validated prediction of colorectal cancer from the fecal immunochemical test (FIT) only in Michigan (242 observations). As most controls and some cases have zero values in the FIT test, fitting densities to the sampled values of weight of evidence in controls and cases requires specifying spike-slab mixtures.
Each dataset consists of a data frame with the following variables:
prior.p: Prior probabilities of case status.
posterior.p: Posterior probabilities of case status.
y: Case-control status.
http://www.homepages.ed.ac.uk/pmckeigu/preprints/classify/wevidtutorial.html