Data Models

Relational vs Non-relational databases

Deciding between the two falls mostly around:

Aspects	Relational	Non-relational
Schema definition	Well-defined	Not well-defined
Schema assumption	At write-time	At read-time
Data source flexibility	Low; incoming data must abide by schema	High; Any data can come in without meeting schema requirements
Data consumption flexibility	Low	High; Multiple consumers can each assume their own schema
Schema change implications	Upstream and downstream processes need to be updated	Only downstream processes that are affected need to update their schema
Storage engines	Data warehouses	Data lakes
Scalable	Yes Once downstream access patterns are well-defined	No Shifting through lots of raw data is inefficient

So far, this applies to tabular data. Haven’t been tested against other forms of data modalities.

Extracted data stays in data lakes.

This way, when incoming schema changes, raw data (from a data lake) can be re-transformed and loaded into the data warehouse.

This also serves as a business logic to machine learning translation layer.