Problem
Design unique ID generator in distributed systems.
Clarifying Questions
- What are the characteristics of unique IDs?
- For each new record, does the ID increment by 1?
- Do IDs only contain numerical values?
- What is the ID length requirement?
- What is the scale of the system?
Answers:
- IDs must be unique and sortable
- IDs are numerical values only
- IDs fit into 64-bit
- IDs are ordered by date
- Ability to generate over 10,000 unique IDs per second
High-Level Design
We have different options:
Multi-master replication
This approach use database auto-increment feature.
Instead of increasing the next ID by 1, we increase it by k, where k is the number of database servers in use. From the image, we see that the next ID to be generated is equal to the previous ID in the same server plus 2. This solves some scalability issues because IDs can scale with the number of database servers.
Drawbacks:
- Hard to scale with multiple data centers
- IDs do not go up with the time across multiple servers
- It does not scale well when a server is added or removed

UUID
A UUID is another easy way to obtain unique IDs.
- UUID is a 128-bit number used to identify information in computer systems.
- UUID has a very low probability of getting collusion
- UUIDs can be generated independently without coordination between servers

Each web server contains an ID generator, and a web server is responsible for generating IDs independently.
Pros:
- Generating UUID is simple
- No coordination between servers is needed, therefore there will not be any synchronization issues.
- Easy to escale
- Each web server is responsible for generating IDs they consume.
- ID generator can easily scale with web servers
Cons:
- IDs are 128 bits long, but our requirement is 64 bits
- IDs do not go up with time
- IDs could be non-numeric
Ticket Server
Use a centralized auto_increment feature in a single database server (Ticket Server).

Pros:
- Numeric IDs
- It is easy to implement, and it works well for small to medium scale
Cons:
- Single point of failure. Single ticket server means if the ticket server goes down, all the system that depend on it will face issues.
- To avoid this, we can set up multiple ticket servers. -> This may cause new challenges such as data synchronization.
Twitter Snowflake Approach
Instead of generating an ID directly, we divide an ID into different sections.

| Type | Size | Definition |
| Sign Bit | 1 bit | It will be always 0. This is reserved for future uses. |
| Timestamp | 41 bits | Milliseconds since the epoch or custom epoch |
| Datacenter ID | 5 bits | 2 ^ 5 = 32 data Centers |
| Machine ID | 5 bits | 2 ^ 5 = 32 machines per data center |
| Sequence Number | 12 bits | For every ID generated on that machine/process, the sequence number is incremented by 1. The number is reset to 0 every millisecond. |
Design Deep Dive
- Datacenter IDs and Machine IDs are chosen at the startup time, generally fixed once the system is up and running.
- Any changes in datacenter IDs and machine IDs required careful review since accidental change in those values can lead to ID conflicts.
- TimeStamp and sequence numbers are generated when the ID generator is running
- The max timestamp that can be represented in 41 bits is 2 ^ 41 – 1 = 69 years. This means the ID generator will work for 69 years. After 69 years, we will need a new epoch time pr adopt other techniques to migrate IDs
- Sequence number is 12 bits, which give us 2 ^ 12 = 4096 combinations. In theory, a machine can support a maximum of 4096 new IDs per millisecond.
Wrap up
- Clock synchronization: Not all the servers have the same clock. Network protocol is the most popular solution to this problem.
- Section length tunning: Fewer sequence numbers but more timestamp bits are effective for low concurrency and long-term applications.
- High availability: Since an ID generator is a mission-critical system, it must be highly available