ML System Requirements

Four general requirements of an ML system: reliability, scalability, maintainability, and adaptability.

Reliability

Type	Description
Engineering reliability	Logical correctness of the ML and business logic (e.g., the right functions are called). Also measured by uptime and by the DORA stability metrics (change failure rate and time to restore service).
ML reliability	Predictions are correct, and the system does not fail silently, i.e., return wrong predictions without raising any error.
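Silent failures are hard to catch because nothing crashes. A minimal sketch, assuming a model that outputs scores in a known range: the hypothetical `PredictionGuard` class flags NaN or out-of-range predictions, plus a batch mean that drifts too far from an expected value. The bounds and tolerance are illustrative settings, not recommended values, and the mean check is only a crude stand-in for proper drift detection.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PredictionGuard:
    """Flags predictions that look wrong even though no exception was raised.

    lower/upper, expected_mean and mean_tolerance are hypothetical per-model settings.
    """
    lower: float
    upper: float
    expected_mean: float
    mean_tolerance: float
    alerts: list = field(default_factory=list)

    def check_batch(self, predictions):
        """Return True if the batch passes all checks; record an alert for each failed check."""
        ok = True

        # Check 1: every prediction must be a number inside the valid range.
        bad = [p for p in predictions
               if math.isnan(p) or not (self.lower <= p <= self.upper)]
        if bad:
            self.alerts.append(f"{len(bad)} prediction(s) NaN or out of range")
            ok = False

        # Check 2: the batch mean should stay near its expected value (crude drift check).
        finite = [p for p in predictions if not math.isnan(p)]
        if finite:
            mean = sum(finite) / len(finite)
            if abs(mean - self.expected_mean) > self.mean_tolerance:
                self.alerts.append(
                    f"batch mean {mean:.3f} drifted from {self.expected_mean:.3f}"
                )
                ok = False

        return ok
```

In use, a serving loop would call `check_batch` on each batch and route any recorded alerts to monitoring instead of letting the bad predictions pass unnoticed.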

Scalability

ML applications can scale in a few aspects, for example in the number of models they serve.

As an application scales, ML resource management and ML artifact management need deliberate attention.
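One way to keep many models manageable, for example one model per customer in a multi-tenant application, is to record every artifact with its owner, version, and content hash. The sketch below uses a hypothetical in-memory `ModelRegistry`; a real system would persist these records and keep the artifacts themselves in separate storage (the `uri` values are made-up placeholders).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactRecord:
    tenant: str
    model_name: str
    version: int
    sha256: str  # content hash, so a changed artifact is detectable
    uri: str     # where the serialized model lives (placeholder path)


class ModelRegistry:
    """Hypothetical in-memory registry tracking every model version for every tenant."""

    def __init__(self):
        self._records = {}  # (tenant, model_name) -> list[ArtifactRecord], oldest first

    def register(self, tenant, model_name, artifact_bytes, uri):
        """Add a new version of a tenant's model and return its record."""
        versions = self._records.setdefault((tenant, model_name), [])
        record = ArtifactRecord(
            tenant=tenant,
            model_name=model_name,
            version=len(versions) + 1,
            sha256=hashlib.sha256(artifact_bytes).hexdigest(),
            uri=uri,
        )
        versions.append(record)
        return record

    def latest(self, tenant, model_name):
        """Return the newest version, or raise KeyError if none exists."""
        versions = self._records.get((tenant, model_name))
        if not versions:
            raise KeyError(f"no model {model_name!r} registered for tenant {tenant!r}")
        return versions[-1]

    def count(self):
        """Total number of tracked artifacts across all tenants."""
        return sum(len(v) for v in self._records.values())
```

Keying records by `(tenant, model_name)` keeps each customer's model history separate, while `count()` gives a quick view of how many artifacts the system must store and serve.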

Maintainability

Aspect	Description
Reproducibility	Requires code, data, artifacts, and context (how the different pieces work and are strung together)
Effective collaboration	Between teams and between team members
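The four reproducibility requirements can be captured together in a run manifest. This is a minimal sketch, assuming the caller supplies the code version (for example a git commit hash) and the raw bytes of datasets and artifacts; `build_run_manifest` is a hypothetical helper that hashes data and artifacts so later changes are detectable and stores the context as plain configuration.

```python
import hashlib
import json
import platform
import sys


def sha256_of_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_run_manifest(code_version, datasets, artifacts, context):
    """Record what is needed to reproduce a run.

    code_version: identifier of the code, e.g. a git commit hash (supplied by the caller)
    datasets, artifacts: mapping of name -> raw bytes; only their hashes are stored
    context: configuration describing how the pieces are strung together
    """
    return {
        "code": {"version": code_version},
        "data": {name: sha256_of_bytes(b) for name, b in sorted(datasets.items())},
        "artifacts": {name: sha256_of_bytes(b) for name, b in sorted(artifacts.items())},
        "context": context,
        # The runtime environment is part of the context that is easy to forget.
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }


if __name__ == "__main__":
    manifest = build_run_manifest(
        code_version="3f2a9c1",  # placeholder commit hash
        datasets={"train.csv": b"x,y\n1,0\n2,1\n"},
        artifacts={"model.bin": b"\x00\x01\x02"},
        context={"learning_rate": 0.01, "pipeline": ["clean", "featurize", "train"]},
    )
    print(json.dumps(manifest, indent=2))
```

Hashing whole byte strings keeps the sketch short; for large files, a real pipeline would hash them in chunks and usually rely on a dedicated data or experiment versioning tool.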

Adaptability

A system should easily accept improvements without service interruptions.

Improvement Type	Description
Data	More, more recent, or higher-quality data points
Model	New architecture or additional features
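Accepting a model improvement without a service interruption can be sketched as an atomic swap behind a lock. In this hypothetical `ModelServer`, models are plain Python callables and `validate` is a caller-supplied check; the new model is fully built and validated before the reference changes, so every request sees either the old or the new model, never a partial one.

```python
import threading


class ModelServer:
    """Serves predictions while allowing the model to be replaced in place."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def predict(self, x):
        # Read model and version together so they always match.
        with self._lock:
            model, version = self._model, self._version
        # Inference runs outside the lock, so a swap never waits on a slow request.
        return version, model(x)

    def swap(self, new_model, new_version, validate):
        """Replace the model only if it passes the caller-supplied validation check."""
        if not validate(new_model):
            return False  # keep serving the current model
        with self._lock:
            self._model, self._version = new_model, new_version
        return True
```

For example, `server = ModelServer(lambda x: 2 * x, "v1")` serves the first model, and `server.swap(candidate, "v2", validate=check)` promotes `candidate` only if `check(candidate)` returns true; a rejected candidate leaves the current model in place.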

Deploying these changes should be easy and fast, as measured by the DORA speed metrics (deployment frequency and lead time for changes).
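The DORA speed metrics (deployment frequency, lead time for changes) and stability metrics (change failure rate, time to restore service) can be computed from a log of deployments. A minimal sketch, assuming a hypothetical `Deployment` record per production release; time to restore is approximated from the deployment time because this record has no separate incident start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Summarize a list of Deployment records observed over period_days days."""
    n = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Approximation: restore time is measured from deployment, not from incident start.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Speed metrics
        "deployment_frequency_per_day": n / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        # Stability metrics
        "change_failure_rate": len(failures) / n if n else None,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    log = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_at=t0 + timedelta(hours=5)),
    ]
    print(dora_metrics(log, period_days=1))
```

Medians are used rather than means so that a single unusually slow change does not dominate the lead-time or restore-time figures.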

Scaling in model count means more models for more use cases or customers, which is common in multi-tenant and SaaS applications.