Related: file format


Question

How do I want to store my data?

A poorly designed storage format adds complexity to querying. Things to consider:

  • access pattern (todo)
  • data modality
  • serializing models

Data Modality

  • Text
  • Images
  • Audio etc.
  • Tabular

Model serialization formats

| Format | Pros | Cons |
| --- | --- | --- |
| pickle | Ideal for loading Python objects | Insecure to load untrusted model files |
| joblib [1] | Ideal for loading NumPy arrays | Insecure to load untrusted model files |
| TensorFlow model formats | | |
| pt | | Specific to PyTorch |
| [[ONNX\|onnx]] | Framework-agnostic format [2]; allows execution in different environments, languages and hardware | Limited to operations supported by ONNX |
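
The "insecure to load untrusted model files" caveat for pickle is easier to see in code. Below is a minimal sketch using only the standard library. The `model` dict, the `Payload` class, and the `restricted_loads` helper are hypothetical names made up for illustration. The restricted unpickler follows the "restricting globals" pattern from the Python documentation. joblib builds on pickle, so the same caveat should apply there.

```python
import io
import pickle

# Hypothetical stand-in for a trained model: plain Python data only.
model = {"weights": [0.5, 2.0], "bias": 1.0}

# Normal round trip: serialize to bytes, then load the object back.
blob = pickle.dumps(model)
restored = pickle.loads(blob)


class Payload:
    """Illustrative malicious object; not part of any real library."""

    def __reduce__(self):
        # pickle stores this callable and its arguments, then calls
        # eval("6 * 7") at load time. An attacker could run anything here.
        return (eval, ("6 * 7",))


untrusted_blob = pickle.dumps(Payload())
# Loading executes eval; the result is an int, not a Payload instance.
result = pickle.loads(untrusted_blob)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so only plain containers and scalars load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data):
    """Load pickled bytes without allowing any callable or class lookup."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


safe_copy = restricted_loads(blob)  # a plain dict needs no globals
try:
    restricted_loads(untrusted_blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

A restricted unpickler like this only suits files made of plain data. Real model objects need class lookups to load, which is one reason to prefer a format such as ONNX for files from untrusted sources.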

Footnotes

  1. See more in [[pickle vs joblib]]

  2. Can also convert sklearn pipelines