Distributed consensus: Raft from first principles
Overview
Raft is the most approachable consensus protocol, and that’s precisely why it’s hard to learn from the paper alone. The paper is well-written but assumes you already understand the mental model of distributed state machines. This log documents the process of building that mental model from scratch — drawing state machines, implementing a toy protocol, and breaking it on purpose to understand failure modes.
State Machines First
The core insight of Raft is that consensus is about agreeing on a sequence of state machine inputs, not about agreeing on values directly. Each node maintains an identical state machine. Consensus ensures that all nodes apply the same inputs in the same order, which guarantees they reach the same state.
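To make the "same inputs, same order, same state" idea concrete, here is a minimal sketch. The `KVStateMachine` class and its `set`/`delete` commands are hypothetical, chosen only for illustration; any deterministic state machine would behave the same way.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """A deterministic key-value state machine (hypothetical example)."""
    data: dict = field(default_factory=dict)

    def apply(self, command: tuple) -> None:
        # A command is ("set", key, value) or ("delete", key).
        op, key, *rest = command
        if op == "set":
            self.data[key] = rest[0]
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")


# The agreed-upon sequence of inputs: this is what consensus decides.
log = [("set", "x", 1), ("set", "y", 2), ("delete", "x"), ("set", "y", 3)]

# Three nodes, each replaying the same log in the same order.
nodes = [KVStateMachine() for _ in range(3)]
for node in nodes:
    for command in log:
        node.apply(command)

# Same inputs, same order => same state on every node.
states = [node.data for node in nodes]
```

Replaying the same commands in a different order would generally produce a different state, which is why the order of the log, not just its contents, is what the nodes have to agree on.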
I drew this out by hand first. Three nodes, each with their own state machine diagram. Arrows between them labeled with “commit index” and “last applied.” The visual model made it clear why the commit index is the critical variable — it’s the boundary between “decided” and “undecided” log entries.
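The relationship between the two labels can be sketched in a few lines. This is an illustrative toy, not the implementation from this log: the `Node` class, its field names, and the `(key, value)` entry format are assumptions. The point is that a node only applies entries at or below the commit index, and "last applied" trails behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    log: list = field(default_factory=list)    # log[0] holds Raft index 1
    commit_index: int = 0                      # highest index known to be decided
    last_applied: int = 0                      # highest index applied to the state machine
    state: dict = field(default_factory=dict)  # the state machine itself

    def apply_committed(self) -> None:
        """Apply every decided-but-unapplied entry, in log order."""
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            key, value = self.log[self.last_applied - 1]
            self.state[key] = value


node = Node(log=[("x", 1), ("y", 2), ("x", 3)])
node.commit_index = 2   # entries 1 and 2 are decided; entry 3 is still undecided
node.apply_committed()
```

After this runs, entry 3 is present in the log but has not touched the state machine, because it sits above the commit index.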
Implementing the Leader
The first implementation focused on leader election. The key mechanics:
- Each node starts as a follower with a randomized election timeout (150-300ms)
- If a follower doesn’t hear from a leader within the timeout, it becomes a candidate
- The candidate increments its term, votes for itself, and requests votes from all other nodes
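- A voter grants its vote only if the candidate’s term is not behind its own, it has not already voted for a different candidate in that term, and the candidate’s log is at least as up-to-date as its own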
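The follower-to-candidate transition in the list above can be sketched as follows. This is a minimal sketch under assumptions: the `RaftNode` class and its method names are hypothetical, timers and networking are left out, and the request is a plain dict whose fields mirror the RequestVote arguments in the Raft paper.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaftNode:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    role: str = "follower"

    def new_election_timeout(self) -> float:
        # Randomized timeout in seconds (150-300 ms) so that followers
        # rarely time out at the same moment.
        return random.uniform(0.150, 0.300)

    def start_election(self, last_log_index: int, last_log_term: int) -> dict:
        """Called when the election timeout fires without hearing from a leader."""
        self.role = "candidate"
        self.current_term += 1         # enter a new term
        self.voted_for = self.node_id  # vote for ourselves
        # The RequestVote message to send to every other node.
        return {
            "term": self.current_term,
            "candidate_id": self.node_id,
            "last_log_index": last_log_index,
            "last_log_term": last_log_term,
        }


node = RaftNode("n1")
request = node.start_election(last_log_index=0, last_log_term=0)
```

The last two fields of the request are what the voter uses for its log comparison.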
The tricky part is the log comparison. “Up-to-date” is decided by the last entry in each log: if the last entries have different terms, the log whose last entry has the higher term is more up-to-date. If the terms are equal, the longer log is more up-to-date. This prevents a candidate that is missing committed entries from winning an election, because any majority of voters includes at least one node that holds those entries and will refuse the vote.
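The comparison rule is small enough to write down directly. The function name and parameter names below are my own; each log is summarized by the term and index of its last entry.

```python
def candidate_log_is_up_to_date(
    cand_last_term: int,
    cand_last_index: int,
    voter_last_term: int,
    voter_last_index: int,
) -> bool:
    """True if the candidate's log is at least as up-to-date as the voter's."""
    if cand_last_term != voter_last_term:
        # Different last terms: the higher term wins, regardless of length.
        return cand_last_term > voter_last_term
    # Same last term: the longer (or equal-length) log wins.
    return cand_last_index >= voter_last_index


# A short log ending in term 3 beats a long log ending in term 2.
short_but_newer = candidate_log_is_up_to_date(3, 1, 2, 5)
# With equal last terms, a shorter log loses.
same_term_shorter = candidate_log_is_up_to_date(2, 3, 2, 4)
```

Note that length only matters as a tiebreaker: a long log full of entries from an old term still loses to a short log whose last entry comes from a newer term.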
Breaking It
The most valuable part of this study was intentionally breaking the implementation:
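- Split brain: What happens when two candidates campaign simultaneously? (Answer: each node votes at most once per term, so at most one candidate can win a majority in any given term. If the vote splits, nobody wins and the candidates time out and retry in a new term. Two leaders can only coexist in different terms, and the stale one steps down as soon as it sees the higher term, for example in the reply that rejects one of its AppendEntries RPCs.)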
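- Log inconsistency: What if a leader crashes after replicating entries to only a minority of nodes? (Answer: those entries were never committed, so the new leader may not have them. The consistency check in AppendEntries makes followers delete their conflicting entries and adopt the new leader’s log, which restores Raft’s log matching property.)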
- Network partitions: What if the leader is isolated from the majority? (Answer: the minority partition cannot commit anything, which is the correct behavior: safety over availability. The majority side elects a new leader and keeps making progress.)
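Two of the behaviors above, rejecting a stale leader by term and truncating a conflicting follower log, both live in the follower's AppendEntries handler. The following is a simplified sketch, not the implementation from this log: the `Entry` and `Follower` classes are hypothetical, commit-index handling is omitted, and the handler returns a `(term, success)` pair in place of a real RPC reply.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Follower:
    current_term: int = 0
    log: list = field(default_factory=list)  # log[0] holds Raft index 1

    def append_entries(self, term, prev_log_index, prev_log_term, entries):
        """Handle AppendEntries from a leader; return (current_term, success)."""
        # Stale leader: reject and report our higher term so it steps down.
        if term < self.current_term:
            return self.current_term, False
        self.current_term = term

        # Consistency check: we must already hold the entry that precedes
        # the new ones, with the term the leader expects.
        if prev_log_index > len(self.log):
            return self.current_term, False
        if prev_log_index > 0 and self.log[prev_log_index - 1].term != prev_log_term:
            return self.current_term, False

        # Append the new entries. On a term conflict, delete the conflicting
        # entry and everything after it, then continue appending.
        for offset, entry in enumerate(entries):
            index = prev_log_index + 1 + offset  # 1-based Raft index
            if index <= len(self.log):
                if self.log[index - 1].term == entry.term:
                    continue  # already have this entry
                del self.log[index - 1:]
            self.log.append(entry)
        return self.current_term, True


follower = Follower(
    current_term=2,
    log=[Entry(1, "a"), Entry(2, "b"), Entry(2, "c")],
)
stale_reply = follower.append_entries(1, 0, 0, [])                 # leader from term 1
fresh_reply = follower.append_entries(3, 1, 1, [Entry(3, "x")])    # leader from term 3
```

In this run the term-1 leader is rejected outright, while the term-3 leader's entry replaces the two uncommitted term-2 entries that conflicted with it.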
Progress
Leader election works. Currently implementing the log replication phase. Next up: handling client requests and commit safety.