Related: https://en.wikipedia.org/wiki/Roofline_model


A measure for determining whether a kernel (or any computation) is compute-bound or memory-bound on a particular piece of hardware, which in turn tells you how to scale and optimize it.
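A sketch of how that comparison works, using roughly the notation of the linked roofline article. The symbol choices here are my own labels: W is the number of FLOPs, Q the bytes moved to and from memory, I the arithmetic intensity, π the peak compute throughput (FLOP/s), β the peak memory bandwidth (bytes/s), and P the attainable throughput:

```latex
I = \frac{W}{Q}, \qquad P = \min\left(\pi,\; \beta \cdot I\right)

\text{memory-bound if } I < \frac{\pi}{\beta}, \qquad \text{compute-bound if } I \ge \frac{\pi}{\beta}
```

The ratio π/β is often called the ridge point. Below it, throughput is capped by memory bandwidth, so reducing the bytes moved (raising I) tends to matter more than reducing FLOPs. Above it, the compute units are the bottleneck.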

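Definition

Arithmetic intensity is the number of floating-point operations (FLOPs) a computation performs per byte of data it moves between off-chip memory and the processor, usually expressed in FLOPs/byte.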
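A minimal Python sketch of the definition (arithmetic intensity = FLOPs / bytes moved) and the roofline comparison. A few assumptions apply. The example kernel is an element-wise float32 vector add, chosen for illustration and not taken from the original. The hardware numbers are hypothetical, not any specific device. Each byte is assumed to be moved exactly once, with no cache reuse.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Illustrative accelerator specs; the values used below are hypothetical."""
    peak_flops: float      # peak compute throughput, FLOP/s
    mem_bandwidth: float   # peak memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOPs/byte) where the roofline changes slope."""
        return self.peak_flops / self.mem_bandwidth


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to and from memory."""
    return flops / bytes_moved


def classify(intensity: float, hw: Hardware) -> str:
    """Below the ridge point the kernel is memory-bound; at or above it, compute-bound."""
    return "memory-bound" if intensity < hw.ridge_point else "compute-bound"


def attainable_flops(intensity: float, hw: Hardware) -> float:
    """Roofline upper bound on achievable FLOP/s."""
    return min(hw.peak_flops, intensity * hw.mem_bandwidth)


# Example kernel (assumed): c = a + b over n float32 values.
# 1 FLOP per element; read a and b, write c -> 3 * 4 bytes per element.
n = 1 << 20
flops = n
bytes_moved = 3 * 4 * n
ai = arithmetic_intensity(flops, bytes_moved)

# Hypothetical device: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
hw = Hardware(peak_flops=100e12, mem_bandwidth=2e12)

print(f"AI = {ai:.4f} FLOPs/byte")                       # AI = 0.0833 FLOPs/byte
print(f"ridge point = {hw.ridge_point:.1f} FLOPs/byte")  # ridge point = 50.0 FLOPs/byte
print(classify(ai, hw))                                  # memory-bound
print(f"attainable = {attainable_flops(ai, hw) / 1e12:.3f} TFLOP/s")  # attainable = 0.167 TFLOP/s
```

The intensity of the vector add (1/12 FLOPs/byte) does not depend on n, so making the input larger does not change its classification. It sits far below the hypothetical ridge point of 50 FLOPs/byte.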

Note

I started digging into this topic after reading a blog post on transformer inference arithmetic a while ago. It spurred me to practice deriving the arithmetic intensity of various common machine learning kernels and to build intuition for them (see examples below).

Examples

Arithmetic Intensity of a Neural Network Linear Layer