• Data comes in self-contained documents, and relationships between documents are rare.
  • Common file formats include JSON, XML, BSON
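
As a rough illustration of a self-contained document, the sketch below builds one hypothetical order record (the field names and values are invented for this example) with the customer and line items embedded, then round-trips it through JSON using Python's standard `json` module. Everything needed to read the order lives in the one document.

```python
import json

# Hypothetical order document: the customer and line items are embedded,
# so reading the order needs no lookup in any other document.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
}

text = json.dumps(order)       # serialize the document to a JSON string
restored = json.loads(text)    # parse it back into Python objects

# The order total is computed from the document alone.
total = sum(item["qty"] * item["price"] for item in restored["items"])
```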

NoSQL implementation

  • Designed for Big Data
  • Data has to be modeled around the queries it will serve
    • As such, data tends to be denormalized and duplicated
    • No joins! One table per query
  • The primary key
    • has to be unique
    • is made up of one partition key (used to determine which node holds the data)
      • The partition key chosen should ideally be evenly distributed so that work is spread across all nodes
      • This reminds me a bit of BigTable schema design
    • and zero or more clustering keys (which control the sort order within a partition, ascending by default)
    • the WHERE clause is a critical part of any NoSQL query
      • the partition key and clustering columns have to be referenced in the order they are defined in the primary key
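
The primary key rules above can be sketched with a small in-memory model. This is a toy, not how any particular database is implemented: the `ToyTable` class, its method names, the MD5-modulo node placement, and the sample music data are all assumptions made for illustration. It shows a partition key choosing a node, clustering keys sorting rows within a partition, and a WHERE-style lookup that rejects columns given out of order.

```python
import hashlib


class ToyTable:
    """Toy model: one partition key plus ordered clustering keys (hypothetical)."""

    def __init__(self, partition_key, clustering_keys, num_nodes=3):
        self.partition_key = partition_key
        self.clustering_keys = list(clustering_keys)
        # One dict per node: partition value -> {clustering tuple: row}
        self.nodes = [dict() for _ in range(num_nodes)]

    def node_for(self, value):
        # A stable hash of the partition key value decides which node holds it.
        digest = hashlib.md5(str(value).encode()).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def insert(self, row):
        pk = row[self.partition_key]
        ck = tuple(row[c] for c in self.clustering_keys)
        partition = self.nodes[self.node_for(pk)].setdefault(pk, {})
        # The primary key is unique: in this sketch a repeated key overwrites.
        partition[ck] = row

    def select(self, **where):
        cols = list(where)  # keyword arguments keep their call order
        expected = [self.partition_key] + self.clustering_keys
        # Columns must be the partition key followed by a prefix of the
        # clustering keys, in definition order.
        if not cols or cols != expected[:len(cols)]:
            raise ValueError(f"WHERE columns must follow the order {expected}")
        pk = where[self.partition_key]
        partition = self.nodes[self.node_for(pk)].get(pk, {})
        prefix = tuple(where[c] for c in cols[1:])
        # Rows come back sorted ascending by their clustering key tuple.
        return [row for ck, row in sorted(partition.items())
                if ck[:len(prefix)] == prefix]


table = ToyTable("artist", ["year", "album"])
table.insert({"artist": "The Beatles", "year": 1970, "album": "Let It Be"})
table.insert({"artist": "The Beatles", "year": 1965, "album": "Rubber Soul"})
table.insert({"artist": "The Who", "year": 1965, "album": "My Generation"})

beatles = table.select(artist="The Beatles")            # sorted by (year, album)
beatles_1965 = table.select(artist="The Beatles", year=1965)
```

Calling `table.select(year=1965)` or `table.select(artist="The Beatles", album="Let It Be")` raises `ValueError` in this sketch, because the first skips the partition key and the second skips the `year` clustering key.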