
Partitioning Concepts


partition == shard (MongoDB/Elasticsearch) == region (HBase) == tablet (Bigtable) == vnode (Cassandra/Riak)

Typically:

  • Many partitions per node (see the sketch below).
  • Each partition replicated on multiple nodes (for fault tolerance).
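
As a rough illustration (not any particular database's scheme), here's a minimal Python sketch of that layout: a fixed number of partitions, each replicated on a couple of nodes, so every node ends up holding several partitions. The node names, partition count and replication factor are all made up.

```python
NUM_PARTITIONS = 8
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 2

def assign_partitions(num_partitions, nodes, replication_factor):
    """Place each partition's replicas on distinct nodes, round-robin."""
    assignment = {}
    for p in range(num_partitions):
        replicas = [nodes[(p + r) % len(nodes)] for r in range(replication_factor)]
        assignment[p] = replicas
    return assignment

assignment = assign_partitions(NUM_PARTITIONS, NODES, REPLICATION_FACTOR)
# Each node holds several partitions, and each partition lives on 2 nodes:
# {0: ['node-a', 'node-b'], 1: ['node-b', 'node-c'], ...}
```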

Why partition?

For scalability:

  • More data capacity than a single machine can hold.
  • Query throughput scales with the number of machines.

Explain consistent hashing

Consistent Hashing article

(DDIA recommends calling it hash partitioning, because “consistent” here is confusing: it has nothing to do with replica consistency.)
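
A minimal sketch of the idea, assuming a consistent-hash ring with virtual nodes (vnodes); the hashing scheme and vnode count are illustrative, not any particular database's implementation:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring with virtual nodes (vnodes)."""

    def __init__(self, nodes, vnodes_per_node=64):
        self.vnodes_per_node = vnodes_per_node
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns many small ranges of the ring.
        for i in range(self.vnodes_per_node):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First vnode clockwise from the key's hash (wrap around at the end).
        h = _hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

Because each node owns many small ring ranges, adding or removing a node only moves the keys in the ranges adjacent to its vnodes rather than reshuffling everything.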

Hotspots

If data is partitioned by userId and some users are celebrities, the partition that owns the hash of a celebrity’s userId becomes a hotspot (skewed workload).

Fix: keep track of the few “hot userIds”, and whenever a write happens, prefix or append a small random number to the key (e.g. rand(10) or rand(100)), so the posts end up spread over 10/100 different partitions. The trade-off: reads for a celebrity’s posts now have to query all 10/100 partitions (in parallel) instead of 1.
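
A minimal sketch of that salting trick; the hot-user set, key format and split count are all made up for illustration:

```python
import random

HOT_USER_IDS = {"celebrity-123"}  # maintained by some external monitoring
SPLITS = 10                       # rand(10): spread each hot key over 10 sub-keys

def write_key(user_id: str) -> str:
    """Key to write a post under: salt only the known-hot users."""
    if user_id in HOT_USER_IDS:
        return f"{user_id}#{random.randrange(SPLITS)}"
    return user_id

def read_keys(user_id: str) -> list[str]:
    """All keys to query (in parallel) when reading a user's posts."""
    if user_id in HOT_USER_IDS:
        return [f"{user_id}#{i}" for i in range(SPLITS)]
    return [user_id]
```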

Rebalancing: automatic or manual

  • Automatic is convenient but dangerous: if nodes that are busy rebalancing start failing heartbeats, they’ll be declared offline and trigger further rebalancing 😱😱😱
  • Couchbase, Riak & Voldemort generate a suggested partition assignment automatically, but require a human to approve it.

Secondary indexes

For partitioned data, two flavours:

  • Local to each partition (document-partitioned), but then a secondary-index search must scatter-gather across all partitions (sketch below)!
  • Global (term-partitioned), but then it doesn’t fit on one machine, so the index must itself be partitioned.

Note that global secondary indexes are typically updated asynchronously.
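
A minimal sketch of a scatter-gather read against local (document-partitioned) indexes; the in-memory “partitions” stand in for real nodes and all data here is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Each partition keeps a local secondary index (here: on "color")
# covering only its own documents.
partitions = [
    {"local_index": {"red": [1, 7]}, "docs": {1: "car", 7: "bike"}},
    {"local_index": {"red": [12]},   "docs": {12: "truck"}},
    {"local_index": {},              "docs": {}},
]

def query_partition(partition, color):
    """A partition answers only from its own local index."""
    return [partition["docs"][doc_id]
            for doc_id in partition["local_index"].get(color, [])]

def scatter_gather(color):
    """A secondary-index read must ask *every* partition, then merge."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: query_partition(p, color), partitions)
    return [doc for chunk in results for doc in chunk]

print(scatter_gather("red"))  # -> ['car', 'bike', 'truck']
```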

Request routing (or “Service discovery”)

Options:

  • Client knows which node has the key.
  • Client sends to a routing layer, which knows which node has the key (sketch after this list).
  • Client sends to any node, and nodes know who to route to.
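
A minimal sketch of the routing-layer option: the routing tier only keeps a partition → node map (the map, hash scheme and node names below are made up), and some coordination service keeps that map up to date.

```python
import hashlib

NUM_PARTITIONS = 8
# Illustrative assignment; in practice this comes from the cluster metadata.
partition_to_node = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}

def partition_for(key: str) -> int:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % NUM_PARTITIONS

def route(key: str) -> str:
    """The routing layer forwards the request to the owning node."""
    return partition_to_node[partition_for(key)]

print(route("user:42"))
```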

Nodes get added & removed all the time, so how to keep knowledge of who owns which key?

  • Every node heartbeats every node? 😱 scales quadratically! NO!
  • Gossip protocol (e.g. Cassandra).
  • External service implementing Raft/Paxos consensus (e.g. ZooKeeper).
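
For the ZooKeeper-style approach, a minimal sketch assuming the kazoo client library and a ZooKeeper reachable on localhost; the znode path and its JSON payload are made-up conventions for this sketch, not a standard:

```python
import json
from kazoo.client import KazooClient  # assumes the kazoo library is installed

ASSIGNMENT_PATH = "/cluster/partition-assignment"  # hypothetical znode

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

partition_to_node = {}

@zk.DataWatch(ASSIGNMENT_PATH)
def on_assignment_change(data, stat):
    """ZooKeeper notifies us whenever the assignment znode changes."""
    global partition_to_node
    if data is not None:
        partition_to_node = {int(p): node
                             for p, node in json.loads(data).items()}

# The routing tier (or each node) can now consult partition_to_node,
# which the consensus service keeps authoritative across rebalances.
```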
