ML Performance Baselines

There are a few types of baselines:

random baseline
simple heuristic
zero rule baseline¹
human baseline
existing solutions
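
As a rough illustration of the zero rule baseline from the list above, here is a minimal sketch in plain Python. The function name `zero_rule_predict` and the toy labels are hypothetical, chosen only for this example.

```python
from collections import Counter


def zero_rule_predict(train_labels, n_predictions):
    """Zero rule baseline: ignore the inputs and always predict
    the most common label seen in the training data."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_predictions


# Toy data (made up for illustration): "ham" is the majority class.
train_labels = ["spam", "ham", "ham", "ham", "spam"]
zero_rule_preds = zero_rule_predict(train_labels, n_predictions=3)
print(zero_rule_preds)
```

Any model worth keeping should beat this baseline on the metric you care about. On heavily imbalanced data its accuracy can look deceptively high.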

Notes on Random Baseline

Random baselines can assume two kinds of prior:

uniform prior (same probability for all classes)
label prior (each class is predicted with its observed label probability)
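
The two priors above can be sketched as a small sampler that draws predictions either with equal probability per class or in proportion to the observed label frequencies. This is a minimal sketch using only the standard library. The function name `random_baseline` and the option names `"uniform"` and `"label"` are assumptions made for this example.

```python
import random
from collections import Counter


def random_baseline(train_labels, n_predictions, prior="label", seed=0):
    """Draw random predictions under one of two priors.

    prior="uniform": every class is equally likely.
    prior="label":   classes are drawn in proportion to their
                     frequency in train_labels.
    """
    rng = random.Random(seed)
    counts = Counter(train_labels)
    classes = sorted(counts)
    if prior == "uniform":
        weights = [1.0] * len(classes)
    elif prior == "label":
        weights = [counts[c] for c in classes]
    else:
        raise ValueError(f"unknown prior: {prior!r}")
    return rng.choices(classes, weights=weights, k=n_predictions)


# Toy data (made up for illustration): 10% positives, 90% negatives.
toy_labels = [1] * 100 + [0] * 900
label_preds = random_baseline(toy_labels, 10_000, prior="label")
uniform_preds = random_baseline(toy_labels, 10_000, prior="uniform")
print("positive rate, label prior:  ", sum(label_preds) / len(label_preds))
print("positive rate, uniform prior:", sum(uniform_preds) / len(uniform_preds))
```

With the label prior the share of positive predictions should land near the share of positives in the data. With the uniform prior it should land near one half, regardless of the data.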

The F1 score is a simple measuring stick for comparing a model's performance against the random baseline:

TL;DR

Knowing the label prior $P(Y)$ is already enough to tell whether a classification model performs better or worse than a random model that predicts according to the label prior.

F1 as a simple measuring stick

Scenario	Interpretation
$F_1 = P(Y)$	Same performance as a random model using the label prior
$F_1 < P(Y)$	Worse performance than a random model using the label prior
$F_1 > P(Y)$	Better performance than a random model using the label prior
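
One way to see why $P(Y)$ is the threshold, assuming a binary task where $P(Y)$ is the prevalence $p$ of the positive class: a random model that predicts positive with probability $p$, independently of the true label, has expected precision $p$ (a fraction $p$ of its positive predictions hit actual positives) and expected recall $p$ (it flags a fraction $p$ of the actual positives), so

$$F_1 = \frac{2 \cdot p \cdot p}{p + p} = p = P(Y)$$

The following sketch checks this empirically with a small simulation. The binary setting, the function names, and the sample size are assumptions made for this example, and the simulated value only approximates $p$ for large samples.

```python
import random


def f1_score_binary(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # also avoids division by zero when there are no positives
    return 2 * tp / (2 * tp + fp + fn)


def simulate_label_prior_f1(p_positive, n=200_000, seed=0):
    """F1 of a random model that predicts positive with probability
    p_positive, on data whose positive rate is also p_positive."""
    rng = random.Random(seed)
    y_true = [1 if rng.random() < p_positive else 0 for _ in range(n)]
    y_pred = [1 if rng.random() < p_positive else 0 for _ in range(n)]
    return f1_score_binary(y_true, y_pred)


for p in (0.1, 0.3, 0.5):
    print(f"P(Y) = {p}: simulated F1 = {simulate_label_prior_f1(p):.3f}")
```

In practice this means a model's F1 on the positive class can be read directly against the positive rate in the evaluation data, with no need to train a separate random baseline.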

¹ Special case of the simple heuristic: always pick the most common class.
