Related: Ridge point
Given my recent exploration of Arithmetic Intensity, I was curious what the ridge point (peak compute throughput divided by peak memory bandwidth, in FLOPs per byte) works out to for leading Nvidia GPUs, so here are the numbers:
| Datatype | A100 (PCIe)¹ | A100 (SXM)¹ | H100 (PCIe) | H100 (SXM) |
|---|---|---|---|---|
| FP64 | 5 | 4 | 17 | 10 |
| FP64 (Tensor Core) | 10 | 9 | 33 | 20 |
| FP32 | 10 | 9 | 33 | 20 |
| TF32 (Tensor Core) | 80 | 76 | 494 | 295 |
| BF16 (Tensor Core) | 161 | 153 | 989 | 590 |
| FP16 (Tensor Core) | 161 | 153 | 989 | 590 |
| FP8 (Tensor Core) | 322 | 306 | 1979 | 1181 |
| INT8 (Tensor Core) | 322 | 306 | 1979 | 1181 |
Ridge points (in FLOPs per byte) for Nvidia A100 and H100 GPUs, across different interconnect configurations
These were derived from the A100 and H100 spec sheets.
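As a quick sanity check on a single cell: the A100 (PCIe) FP16 Tensor Core entry works out to 312 TFLOPS ÷ 1.935 TB/s ≈ 161 FLOPs per byte, which matches the table above.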
Perhaps it would be more useful to map this to the commercially available VMs of cloud providers, but a quick look at AWS's spec sheet shows different parameters, especially for their H100s, which clock in at 1000 TFLOPS FP16 instead of 1513 TFLOPS (roughly 33% below peak performance). I might create a table for this in the future.
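As a rough sketch of what that would do to the ridge point, taking the AWS-quoted FP16 figure at face value and assuming the SXM memory bandwidth is unchanged at 3.35 TB/s:

```python
# Rough sketch: how the FP16 (SXM) ridge point shifts if we use the AWS-quoted
# throughput instead of the spec-sheet figure. The 3.35 TB/s bandwidth is an
# assumption carried over from the H100 SXM spec sheet.
aws_fp16_tflops = 1000     # TFLOPS, as quoted in the paragraph above
spec_fp16_tflops = 1979    # TFLOPS, H100 SXM spec-sheet figure used in the table
sxm_bandwidth_tbps = 3.35  # TB/s

print(spec_fp16_tflops / sxm_bandwidth_tbps)  # ~590 FLOPs/byte (matches the table)
print(aws_fp16_tflops / sxm_bandwidth_tbps)   # ~298 FLOPs/byte
```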
Reproducing Code
```python
import pandas as pd

# Peak throughput in TFLOPS (TOPS for INT8), from the A100 and H100 spec sheets.
flops_df = pd.DataFrame(
    [
        dict(gpu='A100', fp64=9.7, fp64tc=19.5, fp32=19.5, tf32=156, bf16=312, fp16=312, i8=624),
        dict(gpu='H100', fp64=34, fp64tc=67, fp32=67, tf32=989, bf16=1979, fp16=1979, i8=3958),
    ]
).set_index('gpu').T

# Memory bandwidth in TB/s, so the ratios below come out in FLOPs per byte.
memory_bandwidth_df = pd.DataFrame(
    [
        dict(gpu='A100', pcie=1.935, sxm=2.039),  # assuming 80GB HBM2e specs
        dict(gpu='H100', pcie=2, sxm=3.35),
    ]
).set_index('gpu').T

ridge_point_pcie_df = flops_df / memory_bandwidth_df.loc['pcie']
ridge_point_sxm_df = flops_df / memory_bandwidth_df.loc['sxm']

# Convert columns into a multi-index column levelled by <GPU>-<interconnect>
ridge_point_pcie_df.columns = pd.MultiIndex.from_product([['pcie'], flops_df.columns])
ridge_point_sxm_df.columns = pd.MultiIndex.from_product([['sxm'], flops_df.columns])
ridge_point_df = pd.concat([ridge_point_pcie_df, ridge_point_sxm_df], axis=1)
ridge_point_df.columns = ridge_point_df.columns.swaplevel()
ridge_point_df = ridge_point_df.sort_index(axis=1)
ridge_point_df.astype(int)
```
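If you want the markdown table itself (an assumption on my part about how the table above was rendered), pandas can emit one directly; `DataFrame.to_markdown()` needs the optional `tabulate` package:

```python
# Render the ridge-point table as markdown (requires the optional `tabulate` package).
print(ridge_point_df.astype(int).to_markdown())
```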