This thought experiment comes from Chip Huyen's *Designing Machine Learning Systems*.
## TL;DR
Knowing the label prior alone already tells you a lot about whether a classification model is performing better or worse than a random model that predicts with the label prior: such a random model's F1 equals the label prior itself.
### F1 as a simple measuring stick

| Scenario | Interpretation |
|---|---|
| F1 ≈ label prior | Same performance as a random model using the label prior |
| F1 < label prior | Worse performance than a random model using the label prior |
| F1 > label prior | Better performance than a random model using the label prior |
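As a sanity check, here is a minimal sketch (the helper name `judge_f1` is illustrative, not from the book) that places a model's F1 relative to this baseline, using the fact derived in the Observations below that the label-prior random model's F1 equals the label prior:

```python
def judge_f1(model_f1: float, label_prior: float, tol: float = 1e-3) -> str:
    """Place a model's F1 relative to a random model that predicts
    positives at the label-prior rate (whose F1 equals the label prior)."""
    baseline_f1 = label_prior
    if abs(model_f1 - baseline_f1) <= tol:
        return "same as the label-prior random model"
    return "better than random" if model_f1 > baseline_f1 else "worse than random"

print(judge_f1(model_f1=0.25, label_prior=0.1))  # better than random
```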
## Context
Given a classification problem where 10% of the labels are positive, what are the accuracy and F1 of a random model on this dataset? For concreteness, assume a dataset of 100 examples: 90 negative and 10 positive.
### Scenario 1: Random model using a uniform prior

A model that predicts positive or negative with equal probability 0.5, independent of the input, yields the following confusion matrix in expectation:
| Prediction \ Label | 0 | 1 |
|---|---|---|
| 0 | 45 | 5 |
| 1 | 45 | 5 |
The corresponding metrics would be:
| Metric | Value |
|---|---|
| accuracy | (45 + 5) / 100 = 0.5 |
| precision | 5 / (5 + 45) = 0.1 |
| recall | 5 / (5 + 5) = 0.5 |
| F1 | 2 · (0.1 · 0.5) / (0.1 + 0.5) ≈ 0.167 |
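These values follow mechanically from the confusion matrix; a quick check in plain Python:

```python
# Scenario 1 confusion matrix (rows = prediction, columns = label)
tn, fn = 45, 5  # predicted 0: true negatives, false negatives
fp, tp = 45, 5  # predicted 1: false positives, true positives

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.5
precision = tp / (tp + fp)                          # 0.1
recall = tp / (tp + fn)                             # 0.5
f1 = 2 * precision * recall / (precision + recall)  # ~0.167
print(accuracy, precision, recall, f1)
```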
### Scenario 2: Random model using label prior

A model that predicts positive with probability 0.1 (the label prior), independent of the input, yields the following confusion matrix in expectation:
| Prediction \ Label | 0 | 1 |
|---|---|---|
| 0 | 81 | 9 |
| 1 | 9 | 1 |
The corresponding metrics would be:
| Metric | Value |
|---|---|
| accuracy | (81 + 1) / 100 = 0.82 |
| precision | 1 / (1 + 9) = 0.1 |
| recall | 1 / (1 + 9) = 0.1 |
| F1 | 2 · (0.1 · 0.1) / (0.1 + 0.1) = 0.1 |
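Both scenarios can also be verified empirically; a minimal simulation, assuming numpy and scikit-learn are available:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
n, label_prior = 100_000, 0.1
y_true = rng.random(n) < label_prior  # ~10% positive labels

# Predict positive at two different rates, independent of the labels
for name, p_positive in [("uniform prior", 0.5), ("label prior", label_prior)]:
    y_pred = rng.random(n) < p_positive
    print(f"{name}: accuracy={accuracy_score(y_true, y_pred):.3f} "
          f"f1={f1_score(y_true, y_pred):.3f}")

# uniform prior: accuracy~0.500 f1~0.167
# label prior:   accuracy~0.820 f1~0.100
```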
## Observations
Given a model that predicts at random, its

- precision matches the label prior (the fraction of labels that are positive)
- recall matches the prediction prior (the rate at which the model predicts positive)

These two corollaries, when logically extended, lead to the intuition in the TL;DR.
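Spelled out: for a random model that predicts positive with probability $q$, independently of a label with prior $p$,

$$
\text{precision} = P(y=1 \mid \hat{y}=1) = p, \qquad
\text{recall} = P(\hat{y}=1 \mid y=1) = q, \qquad
F_1 = \frac{2pq}{p+q}.
$$

Setting $q = p$ (the label-prior model) gives $F_1 = p$: the random baseline's F1 is the label prior itself, which is exactly the measuring stick in the TL;DR. Scenario 1 checks out too: $p = 0.1$, $q = 0.5$ gives $F_1 = 0.1/0.6 \approx 0.167$.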