| Field | Value |
|---|---|
| Title | Machine Learning Model Evaluation |
| Description | Straightforward and detailed evaluation of machine learning models. 'MLeval' can produce receiver operating characteristic (ROC) curves, precision-recall (PR) curves, calibration curves, and PR gain curves. 'MLeval' accepts a data frame of class probabilities and ground truth labels, or it can automatically interpret the Caret train function results from repeated cross validation, then select the best model and analyse the results. 'MLeval' produces a range of evaluation metrics with confidence intervals. |
| Authors | Christopher R John |
| Maintainer | Christopher R John <[email protected]> |
| License | AGPL-3 |
| Version | 0.3 |
| Built | 2025-03-10 03:03:20 UTC |
| Source | https://github.com/crj32/mleval |
Calculates the Brier score to evaluate probabilities. A data frame of probabilities and ground truth labels must be passed in. Raw probability data must be laid out as column 1: probability of group 1, column 2: probability of group 2, column 3: observed labels, column 4: Group (optional). Zero is optimal and higher values are worse.
brier_score(preds, positive = colnames(preds)[2])
| Argument | Description |
|---|---|
| preds | Data frame: Data frame of probabilities and ground truth labels. |
| positive | Character string: The name of the positive group; must equal the name of a column containing probabilities. |
Brier score
r2 <- brier_score(preds)
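A minimal sketch (not taken from the package examples) of the required input layout, using illustrative group names G1/G2 and a factor 'obs' column; it compares brier_score() with a direct mean-squared-error calculation on the positive-class probabilities.

```r
library(MLeval)

## Toy predictions in the required layout: prob G1, prob G2, observed labels.
## The group names "G1"/"G2" are illustrative, not from the package data.
set.seed(1)
p_g2 <- runif(20)
toy <- data.frame(G1  = 1 - p_g2,
                  G2  = p_g2,
                  obs = factor(sample(c("G1", "G2"), 20, replace = TRUE)))

r2 <- brier_score(toy, positive = "G2")

## Manual check: mean squared difference between the positive-class
## probability and the 0/1 outcome (0 is optimal, higher is worse).
manual <- mean((toy$G2 - as.numeric(toy$obs == "G2"))^2)
```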
evalm is for machine learning model evaluation in R. The function can accept Caret 'train' results to evaluate machine learning predictions, or a data frame of probabilities and ground truth labels can be passed in. Probability data must be laid out as column 1: probability of group 1 (column named after group 1), column 2: probability of group 2 (column named after group 2), column 3: observed labels (column named 'obs'), column 4: Group, e.g. different models (column named 'Group'); the Group column is optional and is used when the predictions of different models are combined in one data frame.
evalm(list1, gnames = NULL, title = "", cols = NULL, silent = FALSE, rlinethick = 1.25, fsize = 12.5, dlinecol = "grey", dlinethick = 0.75, bins = 6, optimise = "INF", percent = 95, showplots = TRUE, positive = NULL, plots = c("prg", "pr", "r", "cc"))
| Argument | Description |
|---|---|
| list1 | List or data frame: A list of Caret results objects from train, a single train results object, or a data frame of probabilities and observed labels |
| gnames | Character vector: A vector of group names for the fit objects |
| title | Character string: A title for the ROC plot |
| cols | Character vector: A vector of colours for the group or groups |
| silent | Logical flag: Whether to hide messages (default = FALSE) |
| rlinethick | Numerical value: Thickness of the ROC curve line |
| fsize | Numerical value: Font size for the ROC curve plots |
| dlinecol | Character string: Colour of the diagonal line |
| dlinethick | Numerical value: Thickness of the diagonal line |
| bins | Numerical value: Number of bins for the calibration curve |
| optimise | Character string: Metric by which to select the operating point (INF, MCC, or F1) |
| percent | Numerical value: Percentage for the confidence intervals (default = 95) |
| showplots | Logical flag: Whether to show plots or not |
| positive | Character string: Name of the positive group (will affect PR metrics) |
| plots | Character vector: Which plots to show: r = ROC, pr = precision-recall (proc), prg = precision-recall gain, cc = calibration curve |
List containing: 1) a ggplot2 ROC curve object for printing, 2) a ggplot2 precision-recall (PROC) curve object for printing, 3) a ggplot2 precision-recall gain (PRG) curve object for printing, 4) results optimised according to the chosen metric, and 5) standard results at a probability cut-off of 0.5.
r <- evalm(fit)
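A hedged sketch of creating a Caret train object that evalm() can interpret; the exact settings used to build the bundled fit objects are not shown here. The key requirement is a trainControl with classProbs = TRUE and savePredictions = TRUE so that cross-validation class probabilities are retained.

```r
library(caret)
library(MLeval)

data(Sonar)  # bundled with MLeval (originally from 'mlbench')

## Retain class probabilities and per-fold predictions for evalm().
ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE,
                     savePredictions = TRUE,
                     summaryFunction = twoClassSummary)

set.seed(42)
fit_rf <- train(Class ~ ., data = Sonar, method = "rf",
                metric = "ROC", trControl = ctrl)

res <- evalm(fit_rf)  # ROC, precision-recall, PRG, and calibration curves
```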
Caret was run using 10 fold cross validation on the Sonar data with random forest used to predict the response variable.
fit
A Caret train object
Caret was run using 10 fold repeated cross validation on the Sonar data with random forest used to predict the response variable.
fit1
A Caret train object
Caret was run using 10 fold repeated cross validation on Sonar data with GBM used to predict the response variable.
fit2
A Caret train object
Caret was run using 10 fold repeated cross validation on the Sonar data using random forest to predict the response variable. Log-likelihood was set to be the objective function to select the best model from cross validation.
fit3
A Caret train object
Caret was run using 10 fold repeated cross validation on the Sonar data with random forest to predict the response variable.
im_fit
A Caret train object
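The bundled fit objects can also be compared in a single call; a short sketch follows, in which the group labels passed to gnames and the plot title are illustrative.

```r
library(MLeval)

data(fit1, fit2)  # random forest and GBM results from repeated CV on Sonar

## Plot both models on the same curves; gnames labels the two groups.
res <- evalm(list(fit1, fit2), gnames = c("rf", "gbm"),
             title = "Random forest vs GBM (repeated CV, Sonar)")
```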
Calculates the Log-likelihood to evaluate probabilities. A data frame of probabilities and ground truth labels must be passed in. Raw probability data must be laid out as column 1: probability of group 1, column 2: probability of group 2, column 3: observed labels, column 4: Group (optional). Zero is optimal and more negative values are worse.
LL(preds, positive = colnames(preds)[2])
| Argument | Description |
|---|---|
| preds | Data frame: Data frame of probabilities and ground truth labels. |
| positive | Character string: The name of the positive group; must equal the name of a column containing probabilities. |
Log-likelihood
r1 <- LL(preds)
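A minimal sketch using the bundled preds data frame, alongside a manual sum of log-probabilities of the true class for intuition; the exact scaling used internally (sum versus mean) is an assumption here, and the 'obs' labels are assumed to match the probability column names.

```r
library(MLeval)

data(preds)
r1 <- LL(preds, positive = colnames(preds)[2])

## Manual intuition: log-probability assigned to the observed class,
## summed over test points (0 is optimal, more negative is worse).
pos    <- colnames(preds)[2]
p_true <- ifelse(preds$obs == pos, preds[[pos]], 1 - preds[[pos]])
manual <- sum(log(p_true))
```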
The Sonar data was split into training (157 points) and testing (51 points) sets, and a GBM model was fitted to the training data using Caret. preds contains the model's predicted probabilities on the test data.
preds
A data frame with 51 rows as points and 4 variables
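Because preds already follows the required layout (two probability columns, 'obs', and 'Group'), it can be passed straight to evalm() or the metric functions; the plot title below is illustrative.

```r
library(MLeval)

data(preds)
test_eval <- evalm(preds, title = "GBM on the Sonar test set")
brier_score(preds)
LL(preds)
```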
The Sonar data was split into training (157 points) and testing (51 points) sets, and a GBM model was fitted to the training data using Caret and used to predict probabilities on the test data. A random forest model was then fitted and tested in the same manner. The probabilities and ground truth labels of both models were combined into one data frame, with the Group column identifying the model, for further analysis.
predsc
A data frame with 102 rows as points and 4 variables
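A short sketch of comparing the two models held in predsc; evalm() uses the 'Group' column to draw both models on the same curves. The title is illustrative.

```r
library(MLeval)

data(predsc)
comp <- evalm(predsc, title = "GBM vs random forest on the Sonar test set")
```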
The Sonar data consist of 208 data points collected on 60 predictors. The goal is to predict between the two classes: M for metal cylinder and R for rock. The data were obtained from the 'mlbench' package. The response variable is in the Class column.
Sonar
A data frame with 208 rows as points and 61 variables
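A quick structural check of the bundled data, matching the format described above (208 rows, 60 predictors plus the Class response).

```r
library(MLeval)

data(Sonar)
dim(Sonar)          # 208 x 61
table(Sonar$Class)  # class counts for M (metal cylinder) and R (rock)
```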