This thought experiment comes from Chip Huyen's *Designing Machine Learning Systems*.
## TL;DR
Knowing the label prior alone already tells you a lot about whether a classification model is performing better or worse than a random model that predicts with the label prior: such a random model's F1 equals the label prior itself.
### F1 as a simple measuring stick

| Scenario | Interpretation |
|---|---|
| F1 ≈ label prior | Same performance as a random model using the label prior |
| F1 < label prior | Worse performance than a random model using the label prior |
| F1 > label prior | Better performance than a random model using the label prior |
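As a sanity check, here is a minimal sketch (the helper name `judge_f1` is illustrative, not from the book) that places a model's F1 relative to this baseline, using the fact derived in the Observations below that the label-prior random model's F1 equals the label prior:

```python
def judge_f1(model_f1: float, label_prior: float, tol: float = 1e-3) -> str:
    """Place a model's F1 relative to a random model that predicts
    positives at the label-prior rate (whose F1 equals the label prior)."""
    baseline_f1 = label_prior
    if abs(model_f1 - baseline_f1) <= tol:
        return "same as the label-prior random model"
    return "better than random" if model_f1 > baseline_f1 else "worse than random"

print(judge_f1(model_f1=0.25, label_prior=0.1))  # better than random
```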
## Context
Given a classification problem where 10% of the labels are positive, what are the accuracy and F1 of a random model on this dataset? For concreteness, assume a dataset of 100 examples: 90 negative and 10 positive.
### Scenario 1: Random model using a uniform prior

A model that predicts positive or negative with equal probability 0.5, independent of the input, yields the following confusion matrix in expectation:
| Prediction \ Label | 0 | 1 |
|---|---|---|
| 0 | 45 | 5 |
| 1 | 45 | 5 |
The corresponding metrics would be:
| Metric | Value |
|---|---|
| accuracy | (45 + 5) / 100 = 0.5 |
| precision | 5 / (5 + 45) = 0.1 |
| recall | 5 / (5 + 5) = 0.5 |
| F1 | 2 · (0.1 · 0.5) / (0.1 + 0.5) ≈ 0.167 |
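These values follow mechanically from the confusion matrix; a quick check in plain Python:

```python
# Scenario 1 confusion matrix (rows = prediction, columns = label)
tn, fn = 45, 5  # predicted 0: true negatives, false negatives
fp, tp = 45, 5  # predicted 1: false positives, true positives

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.5
precision = tp / (tp + fp)                          # 0.1
recall = tp / (tp + fn)                             # 0.5
f1 = 2 * precision * recall / (precision + recall)  # ~0.167
print(accuracy, precision, recall, f1)
```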
### Scenario 2: Random model using label prior

A model that predicts positive with probability 0.1 (the label prior), independent of the input, yields the following confusion matrix in expectation:
| Prediction \ Label | 0 | 1 |
|---|---|---|
| 0 | 81 | 9 |
| 1 | 9 | 1 |
The corresponding metrics would be:
| Metric | Value |
|---|---|
| accuracy | (81 + 1) / 100 = 0.82 |
| precision | 1 / (1 + 9) = 0.1 |
| recall | 1 / (1 + 9) = 0.1 |
| F1 | 2 · (0.1 · 0.1) / (0.1 + 0.1) = 0.1 |
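Both scenarios can also be verified empirically; a minimal simulation, assuming numpy and scikit-learn are available:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
n, label_prior = 100_000, 0.1
y_true = rng.random(n) < label_prior  # ~10% positive labels

# Predict positive at two different rates, independent of the labels
for name, p_positive in [("uniform prior", 0.5), ("label prior", label_prior)]:
    y_pred = rng.random(n) < p_positive
    print(f"{name}: accuracy={accuracy_score(y_true, y_pred):.3f} "
          f"f1={f1_score(y_true, y_pred):.3f}")

# uniform prior: accuracy~0.500 f1~0.167
# label prior:   accuracy~0.820 f1~0.100
```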
## Observations
Given a model that predicts at random, its

- precision matches the label prior (the fraction of labels that are positive)
- recall matches the prediction prior (the rate at which the model predicts positive)

These two corollaries, when logically extended, lead to the intuition in the TL;DR.
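Spelled out: for a random model that predicts positive with probability $q$, independently of a label with prior $p$,

$$
\text{precision} = P(y=1 \mid \hat{y}=1) = p, \qquad
\text{recall} = P(\hat{y}=1 \mid y=1) = q, \qquad
F_1 = \frac{2pq}{p+q}.
$$

Setting $q = p$ (the label-prior model) gives $F_1 = p$: the random baseline's F1 is the label prior itself, which is exactly the measuring stick in the TL;DR. Scenario 1 checks out too: $p = 0.1$, $q = 0.5$ gives $F_1 = 0.1/0.6 \approx 0.167$.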