The CAP theorem

Before we get into the role of NOSQL, we must rst understand the CAP theorem. In the theory of computer science, the CAP theorem or Brewer’s theorem talks about distributed consistency. It states that it is impossible to achieve all of the following in a distributed system:

  • Consistency: Every client sees the most recently updated data state
  • Availability: The distributed system functions as expected, even if there are node failures
  • Partition tolerance: Intermediate network failure among nodes does not impact system functioning

    Although all three are impossible to achieve, any two can be achieved by the systems. That means in order to get high availability and partition tolerance, you need to sacri ce consistency. There are three types of systems:

  • CA: Data is consistent between all nodes, and you can read/write from any node, while you cannot afford to let your network go down. (For example: relational databases, columnar relational stores)
  • CP: Data is consistent and maintains tolerance for partitioning and preventing data going out of sync. (For example: Berkeley DB (key-value), MongoDB (document oriented), and HBase (columnar))
  • AP: Nodes are online always, but they may not get you the latest data; however, they sync whenever the lines are up. (For example: Dynamo (key-value), CouchDB (document oriented), and Cassandra (columnar))

    High availability can achieved through data replication; consistency is achieved by updating multiple nodes for changes in data. Relational databases are designed to achieve CA capabilities. NOSQL databases can either achieve CP or AP.

Source: Scaling Big Data with Hadoop and Solr, H. Karambelkar (2013)