This thought experiment originated from Chip Huyen's *Designing Machine Learning Systems*.
## TL;DR
Knowing the label prior alone already tells you a lot about whether a classification model performs better or worse than a random model that predicts with that prior.
## F1 as a simple measuring stick
| Scenario | Interpretation |
|---|---|
| Same performance as a random model using the label prior | The model has not learned anything useful beyond the class distribution. |
| Worse performance than a random model using the label prior | The model underperforms a trivial baseline, which usually points to a problem in training or evaluation. |
| Better performance than a random model using the label prior | The model has picked up real signal from the features. |
## Context
Consider a classification problem where 10% of the labels are positive. What are the accuracy and F1 of a random model on this dataset?
### Scenario 1: Random model using a uniform prior

Suppose the dataset has 100 samples (90 negative, 10 positive) and the model predicts each class with probability 0.5, independent of the input. In expectation, this yields the following confusion matrix:
| | Label 0 | Label 1 |
|---|---|---|
| Prediction 0 | 45 | 5 |
| Prediction 1 | 45 | 5 |
The corresponding metrics (taking class 1 as the positive class) would be:

| Metric | Value |
|---|---|
| accuracy | (45 + 5) / 100 = 0.5 |
| precision | 5 / (5 + 45) = 0.1 |
| recall | 5 / (5 + 5) = 0.5 |
| F1 | 2 · 0.1 · 0.5 / (0.1 + 0.5) ≈ 0.17 |
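As a sanity check, here is a minimal Python sketch that recomputes these numbers directly from the confusion-matrix counts. The `metrics_from_counts` helper is made up for this post and assumes class 1 is the positive class.

```python
# A minimal sketch recomputing the Scenario 1 metrics from confusion-matrix counts.
# The helper name is hypothetical; class 1 is treated as the positive class.

def metrics_from_counts(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, precision, recall, and F1 from raw confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "F1": f1}

# Scenario 1: 90 negatives, 10 positives; predictions split 50/50 at random.
print(metrics_from_counts(tn=45, fp=45, fn=5, tp=5))
# -> {'accuracy': 0.5, 'precision': 0.1, 'recall': 0.5, 'F1': 0.1666...}
```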
### Scenario 2: Random model using the label prior

Now suppose the model instead predicts class 1 with probability 0.1 (the label prior) and class 0 with probability 0.9, again independent of the input. In expectation, this yields the following confusion matrix:
| | Label 0 | Label 1 |
|---|---|---|
| Prediction 0 | 81 | 9 |
| Prediction 1 | 9 | 1 |
The corresponding metrics would be:

| Metric | Value |
|---|---|
| accuracy | (81 + 1) / 100 = 0.82 |
| precision | 1 / (1 + 9) = 0.1 |
| recall | 1 / (1 + 9) = 0.1 |
| F1 | 2 · 0.1 · 0.1 / (0.1 + 0.1) = 0.1 |
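The same numbers fall out of a quick simulation. The sketch below, assuming NumPy and scikit-learn are available and using an arbitrary sample size of one million, draws labels and predictions independently with a 10% positive rate and should land close to the table above.

```python
# A simulation sketch cross-checking Scenario 2: labels and predictions are drawn
# independently, each positive with probability 0.10 (the label prior).
# The sample size is arbitrary, chosen only to reduce sampling noise.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(seed=0)
n = 1_000_000
labels = (rng.random(n) < 0.10).astype(int)  # ~10% positives
preds = (rng.random(n) < 0.10).astype(int)   # random model using the label prior

print("accuracy :", accuracy_score(labels, preds))   # ~0.82
print("precision:", precision_score(labels, preds))  # ~0.10
print("recall   :", recall_score(labels, preds))     # ~0.10
print("F1       :", f1_score(labels, preds))         # ~0.10
```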
## Observations
Given a model that predicts at random, in expectation:

- precision matches the label prior
- recall matches the prediction prior

In particular, a random model that predicts with the label prior has precision = recall = F1 = label prior, which is exactly what Scenario 2 shows. These two corollaries, when logically extended, lead to the intuition in the TL;DR: the label prior alone gives you the baseline F1 that a trained model must beat. The sketch below checks both observations empirically.
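As a rough empirical check, the sketch below draws predictions independently of the labels for a few illustrative (p, q) pairs, where p is the label prior and q the prediction prior; the first pair corresponds to Scenario 1 and the second to Scenario 2. The pairs themselves are arbitrary.

```python
# A sketch checking the two observations for arbitrary priors: with predictions
# drawn independently of the labels, precision approaches the label prior p and
# recall approaches the prediction prior q. The (p, q) pairs are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000_000
for p, q in [(0.10, 0.50), (0.10, 0.10), (0.30, 0.70)]:
    labels = rng.random(n) < p  # label prior p
    preds = rng.random(n) < q   # prediction prior q, independent of the labels
    tp = np.count_nonzero(labels & preds)
    precision = tp / np.count_nonzero(preds)
    recall = tp / np.count_nonzero(labels)
    print(f"p={p:.2f}, q={q:.2f} -> precision~{precision:.3f}, recall~{recall:.3f}")
```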