Intuition
Ideally, unique ID generation is delegated to a RDBMS, e.g. MySQL’s PKs. At scale, this is a bottleneck. How do big companies do it?
Requirements
- Correct: always unique!
- Performance: many ids per second (> to RDBMS can do, e.g. 100k/s), grows over time, few ms per id.
- Reliability: if DB dies or network disconnects, needs to keep working.
- K-Sortability: ids need to be directly sortable (i.e. without the content) and always be roughly
>
than previous ids to a ~1s precision. - Complexity: system not too complex, least moving parts.
- Size: ids not too large, most systems use 64 bits (UUID => 128 bits).
Flickr (2010)
Simply have 2 MySQL instances producing even & odd numbers behind a load balancer, and if an instance is unresponsive have a backup ready.
Note that sortability may suffer if one instance is faster than the other (or the network).
Twitter (2010)
Twitter blogpost Twitter Snowflake Github README
- Many servers generate 64-bit ids: 41 bit millisecond-precision timestamp, 10-bit machine id, 12-bit autoincrement (1 reserved??).
- Autoincrement % 2^12 with protection for same millisecond rollover.
- NTP to keep server clocks accurate, with protection for non-monotonic (i.e. going back in time to correct).
- ZooKeeper to choose worker numbers (but not used for ID coordination between nodes).
Instagram (2012)
- Many servers generate 64-bit ids: 41 bit millisecond-precision timestamp, 13-bit machine id, 10-bit autoincrement.
- Rather than special purpose, use Postgres to generate.
- Use Postgres schemas to separate logical shards (each shard is a machine-id) but keep them in fewer physical shards, and move schemas across to scale.