This thought experiment originated from Chip Huyen’s Designing Machine Learning Systems.

TL;DR

If you know the label prior, you can already tell a great deal about whether a classification model performs better or worse than a random model that predicts according to the label prior.

F1 as a simple measuring stick

| Scenario         | Interpretation                                                |
|------------------|---------------------------------------------------------------|
| F1 ≈ label prior | Same performance as a random model using the label prior      |
| F1 < label prior | Worse performance than a random model using the label prior   |
| F1 > label prior | Better performance than a random model using the label prior  |

Context

Consider a classification problem where 10% of the labels are positive. What are the accuracy and F1 of a random model on this dataset? To keep the arithmetic simple, assume a dataset of 100 samples: 90 negative and 10 positive.

Scenario 1: Random model using a uniform prior

A model that ignores the input and predicts each class with 50% probability yields the following confusion matrix:

|              | Label 0 | Label 1 |
|--------------|---------|---------|
| Prediction 0 |      45 |       5 |
| Prediction 1 |      45 |       5 |
The corresponding metrics would be:

| Metric    | Value   |
|-----------|---------|
| accuracy  | 0.50    |
| precision | 0.10    |
| recall    | 0.50    |
| F1        | ≈ 0.167 |
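
As a quick sanity check (my addition, not part of the original write-up), here is a minimal Python sketch that recomputes the Scenario 1 metrics directly from the 100-sample confusion matrix above:

```python
# Scenario 1: uniform random model on 100 samples (90 negative, 10 positive).
# Cell counts are read off the confusion matrix above.
tp, fp = 5, 45   # predicted positive: true positives, false positives
fn, tn = 5, 45   # predicted negative: false negatives, true negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)           # 0.50
precision = tp / (tp + fp)                           # 0.10
recall = tp / (tp + fn)                              # 0.50
f1 = 2 * precision * recall / (precision + recall)   # ~0.167

print(accuracy, precision, recall, f1)
```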

Scenario 2: Random model using label prior

A model that ignores the input and predicts positive with 10% probability (the label prior) yields the following confusion matrix:

|              | Label 0 | Label 1 |
|--------------|---------|---------|
| Prediction 0 |      81 |       9 |
| Prediction 1 |       9 |       1 |
The corresponding metrics would be:

| Metric    | Value |
|-----------|-------|
| accuracy  | 0.82  |
| precision | 0.10  |
| recall    | 0.10  |
| F1        | 0.10  |
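
To double-check Scenario 2 without hand arithmetic, here is a small simulation sketch (again my addition, not from the original write-up) using NumPy and scikit-learn; the sample size and random seed are arbitrary assumptions. With a large enough sample, every metric lands near the values in the table above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
n, prior = 100_000, 0.10

# Labels and predictions are drawn independently; each is positive with
# probability equal to the label prior (0.10).
y_true = (rng.random(n) < prior).astype(int)
y_pred = (rng.random(n) < prior).astype(int)

print("accuracy :", accuracy_score(y_true, y_pred))   # ~0.82
print("precision:", precision_score(y_true, y_pred))  # ~0.10
print("recall   :", recall_score(y_true, y_pred))     # ~0.10
print("F1       :", f1_score(y_true, y_pred))         # ~0.10
```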

Observations

For a model that predicts at random, independently of the true label, its

  • precision matches the label prior (the fraction of positive labels in the data)
  • recall matches the prediction prior (the probability with which the model predicts positive)

These two corollaries, when extended, lead to the intuition in the TL;DR: a random model that predicts with the label prior has precision and recall both equal to the label prior, so its F1 equals the label prior as well. The sketch below makes this concrete.
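
To see how these corollaries generalize (a sketch of my own, not from the book), consider a model that predicts positive with probability q, independently of the input, on data whose positive-label prior is p. The expected confusion-matrix cells, and therefore the expected metrics, have a simple closed form:

```python
def expected_random_metrics(p: float, q: float) -> dict:
    """Expected metrics of a label-independent random model.

    p: label prior (fraction of positive labels)
    q: prediction prior (probability the model predicts positive)
    """
    tp, fp = p * q, (1 - p) * q              # predicted positive
    fn, tn = p * (1 - q), (1 - p) * (1 - q)  # predicted negative
    precision = tp / (tp + fp)               # simplifies to p, the label prior
    recall = tp / (tp + fn)                  # simplifies to q, the prediction prior
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": tp + tn, "precision": precision, "recall": recall, "F1": f1}

print(expected_random_metrics(0.10, 0.50))  # Scenario 1: F1 ~ 0.167
print(expected_random_metrics(0.10, 0.10))  # Scenario 2: F1 == label prior == 0.10
```

Setting q equal to p makes precision, recall, and F1 all collapse to the label prior, which is exactly the measuring stick in the TL;DR.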