Distributed Systems

ECE419, Winter 2026
University of Toronto
Instructor: Ashvin Goel

Lab 3: Raft Protocol

Due dates: Mar 1, Mar 15, Mar 22

In this lab you'll implement Raft, a replicated state machine protocol. In Lab 4, you will use your Raft implementation to build a linearizable, replicated key/value server.

This lab has three parts (3A, 3B, 3C) that build on the previous parts. The submission deadlines for the three parts are shown above.

Overview

A replicated service (e.g., a key/value service) achieves fault tolerance by storing copies of its state (e.g., all key/value data) on multiple servers called replica servers or replicas. Replication allows the service to continue operating even if some of its replicas experience failures (crashes, or a broken or flaky network), thus providing high availability. The challenge is that failures may cause the replicas to hold differing copies of the data, making it hard to ensure correct (e.g., linearizable) operation.

Raft organizes client requests into a sequence, called the log, and ensures that all the replicas see the same log. Each replica executes client requests in log order, applying the requests to its local copy of the service's state. Since all the live replicas see the same log contents, they all execute the same requests in the same order, and thus continue to have identical service state. If a replica fails but later recovers, Raft takes care of bringing its log up to date. Raft will continue to operate as long as at least a majority of the replicas are alive and can talk to each other. If there is no such majority, Raft will make no progress, but it will pick up where it left off and ensure correct operation as soon as a majority can communicate again. This greatly simplifies implementing a linearizable, replicated service.

In this lab, you'll implement Raft as a Go object type with associated methods, meant to be used as a module by a replicated service. A set of Raft instances called peers communicate with each other using RPC to maintain replicated logs. Your Raft interface will support an indefinite sequence of numbered commands, also called log entries. A log entry is not necessarily committed as soon as it is appended; however, once your Raft implementation commits a log entry, it should send the entry to the replicated service for it to be executed.

You should follow the design in the extended Raft paper, with particular attention to Figure 2. You'll implement most of the design in the paper, including saving persistent state and reading it after a node fails and then restarts. However, you will not implement cluster membership changes or log compaction (Sections 6 and 7).

Starter Code

Update your local repository with our starter code.

The starter code for this lab is available in the raft1 directory. You only need to modify the raft1/raft.go file for this lab. This file provides some initial code and some code samples, e.g., showing how to send and receive RPCs. You may modify or add other files for testing, but we will test your code with our starter code plus your modified version of the raft1/raft.go file.

After you have completed your implementation, you can test your code as follows:

cd raft1
go test -run 3A
go test -run 3B
go test -run 3C

Implementation

Your implementation must support the following Raft interface, which our tester and your replicated key/value server in Lab 4 will use. You'll find more details in the raftapi/raftapi.go and raft1/raft.go files.

// create a new Raft server instance
rf := Make(peers, me, persister, applyCh)

// start agreement on a new log entry
rf.Start(command interface{}) (index, term, isLeader)

// ask a Raft peer for its current term, and whether it thinks it is leader
rf.GetState() (term, isLeader)

// each time a new entry is committed to the log, every Raft peer
// should send an ApplyMsg via applyCh to the key-value service (or tester).
type ApplyMsg

A service calls Make(peers,me,...) to create a Raft peer. The peers argument is an array of network identifiers of the Raft peers (including this one). A peer communicates with other peers using RPC. The me argument is the index of this peer in the peers array. Start(command) asks Raft to initiate the process of appending command to the replicated log. Start() should return immediately, without waiting for the log append to complete. The service expects each Raft peer to send an ApplyMsg for each newly committed log entry to the applyCh channel argument passed to Make().
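
For example, the service side might wire these pieces together roughly as follows. This is a sketch only; the ApplyMsg field names CommandValid, Command, and CommandIndex are assumptions here, so check raftapi/raftapi.go for the actual definition.

applyCh := make(chan raftapi.ApplyMsg)
rf := Make(peers, me, persister, applyCh)

// Apply committed entries in log order as Raft delivers them.
go func() {
    for msg := range applyCh {
        if msg.CommandValid {
            // apply msg.Command at index msg.CommandIndex to the service state
        }
    }
}()

// Ask Raft to append a command; Start() returns immediately.
index, term, isLeader := rf.Start("example command")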

The raft1/raft.go file contains example code that sends an RPC (sendRequestVote) and that handles an incoming RPC (RequestVote). Your Raft peers should exchange RPCs using the Go labrpc package (source in labrpc directory). The tester can tell labrpc to delay RPCs, re-order them, and discard them to simulate various network failures. While you can temporarily modify labrpc, make sure your Raft implementation works with the original labrpc, since that's what we'll use to test and grade your lab. Your Raft instances must only interact using RPC; for example, they are not allowed to communicate using shared Go variables or files.

Each part of the lab will take significant time, so start early to leave yourself enough time to finish. The later parts of this lab build on the previous parts, so make sure to write easy-to-understand, well-commented code.

Part 3A: Leader Election

Implement Raft leader election and heartbeats (AppendEntries RPCs with no log entries). The goal for Part 3A is to 1) elect a single leader at a time, 2) use heartbeats to ensure that the leader remains a leader when there are no failures, and 3) elect a new leader that takes over when the old leader fails or when packets to/from the old leader are lost.

Follow the paper's Figure 2. At this point you care about sending and receiving RequestVote RPCs, the Rules for Servers that relate to elections, and the State related to leader election. Add the Figure 2 state for leader election to the Raft struct in raft1/raft.go.

Fill in the RequestVoteArgs and RequestVoteReply structs. When a peer hasn't heard from the leader for a period of time (the election timeout), it starts an election by sending out RequestVote RPCs. Make() creates a background ticker() goroutine, which you can use to start elections. Implement the RequestVote RPC handler so that Raft peers will vote for one another.
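
As a starting point, the structs can mirror the RequestVote arguments and results in Figure 2 of the paper. Remember that field names must start with a capital letter so that labrpc/labgob can serialize them:

type RequestVoteArgs struct {
    Term         int // candidate's term
    CandidateId  int // candidate requesting the vote
    LastLogIndex int // index of candidate's last log entry
    LastLogTerm  int // term of candidate's last log entry
}

type RequestVoteReply struct {
    Term        int  // currentTerm, for the candidate to update itself
    VoteGranted bool // true means the candidate received this peer's vote
}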

To implement heartbeats, define an AppendEntries RPC struct (though you may not need all the arguments yet), and have the leader send AppendEntries RPCs periodically. Write an AppendEntries RPC handler method. The tester requires that the leader send heartbeat RPCs no more than tens of times per second. The tester also requires your Raft implementation to elect a new leader within five seconds of the failure of the old leader (if a majority of peers can still communicate). The paper's Sections 5.2 and 9.3 mention election timeouts in the range of 150 to 300 milliseconds. Such a range only makes sense if the leader sends heartbeats considerably more often than once per 150 milliseconds (e.g., once per 10 milliseconds). Since the tester limits you to tens of heartbeats per second, you will have to use an election timeout larger than the paper's 150 to 300 milliseconds, but not too large, because then you may fail to elect a leader within five seconds.
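
One way to structure the ticker() goroutine is sketched below. The state, lastHeartbeat, Leader, and startElection names are illustrative assumptions, not part of the starter code; the point is the randomized timeout and the periodic check.

func (rf *Raft) ticker() {
    for !rf.killed() {
        // Sleep for a randomized election timeout (here 300-600 ms), well above
        // the heartbeat interval but small enough to elect a leader within 5 s.
        timeout := time.Duration(300+rand.Intn(300)) * time.Millisecond
        time.Sleep(timeout)

        rf.mu.Lock()
        // If we are not the leader and haven't heard from one since we went
        // to sleep, start an election (assumed helper, sketched elsewhere).
        if rf.state != Leader && time.Since(rf.lastHeartbeat) >= timeout {
            go rf.startElection()
        }
        rf.mu.Unlock()
    }
}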

The tester calls your Raft's rf.Kill() when it is permanently shutting down an instance. You can check whether rf.Kill() has been called using rf.killed(). You should do this check in all loops, to avoid having dead Raft instances print confusing messages, and so that Raft can let clients know that it is shutting down.

Don't forget to implement GetState().

Next, skip over the descriptions of Lab 3B and Lab 3C and go to the Advice and following sections of this handout.

Part 3B: Raft Log

In this part of the lab, you will implement the leader and follower code to append new log entries, so that the go test -run 3B tests pass.

Start by implementing Start() and then write the code to send and receive new log entries via AppendEntries RPCs, following Figure 2. Send each newly committed entry to the applyCh channel on each peer. You'll need to define a struct to hold information about each log entry.
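
For example, a minimal log entry struct might look like the following; the fields must be exported (capitalized) so that labgob can later encode them for persistence and RPCs:

type LogEntry struct {
    Term    int         // term in which the entry was received by the leader
    Command interface{} // the command passed to Start()
}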

You will need to implement the election restriction (section 5.4.1 in the paper).

Your code may have loops that repeatedly check for certain events. Don't have these loops execute continuously without pausing, since that will slow your implementation enough that it fails tests. Use Go's condition variables or insert a time.Sleep(10 * time.Millisecond) in each loop iteration.
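
For instance, the goroutine that delivers committed entries to applyCh can wait on a condition variable instead of spinning. The sketch below assumes you define lastApplied, commitIndex, applyCond (a sync.Cond created with rf.mu), an applyCh field, and a log indexed so that entry i lives at rf.log[i]; the ApplyMsg field names are assumptions as well.

func (rf *Raft) applier() {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    for !rf.killed() {
        for rf.lastApplied >= rf.commitIndex {
            rf.applyCond.Wait() // signalled whenever commitIndex advances
        }
        rf.lastApplied++
        msg := raftapi.ApplyMsg{
            CommandValid: true,
            Command:      rf.log[rf.lastApplied].Command,
            CommandIndex: rf.lastApplied,
        }
        rf.mu.Unlock() // don't hold the lock while sending on the channel
        rf.applyCh <- msg
        rf.mu.Lock()
    }
}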

Next, skip over the description of Lab 3C and go to the Advice and following sections of this handout.

Part 3C: Persistence

If a Raft-based server reboots it should resume service where it left off. This requires that Raft keep persistent state that survives a reboot. The paper's Figure 2 mentions which state should be persistent.

A real implementation would write Raft's persistent state to disk each time it changed and would read the state from disk when restarting after a reboot. Your implementation won't use the disk; instead, it will save and restore persistent state using a Persister object (see tester1/persister.go). Whoever calls Raft.Make() supplies a Persister that initially holds Raft's most recently persisted state (if any). Raft should initialize its state from that Persister, and use it to save its persistent state each time the state changes. Use the Persister's ReadRaftState() and Save() methods.

Complete the functions persist() and readPersist() in raft1/raft.go by adding code to save and restore persistent state. You will need to encode (or "serialize") the state as an array of bytes in order to pass it to the Persister. Use the labgob encoder; see the comments in persist() and readPersist() in raft1/raft.go. labgob is like Go's gob encoder but prints error messages if you try to encode structures with lower-case field names. Pass nil as the second argument to persister.Save(). Insert calls to persist() at the points where your implementation changes persistent state. Once you've done this, and if the rest of your implementation is correct, you should pass all of the 3C tests.
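
Concretely, the two functions might look like the sketch below, assuming your persistent state consists of currentTerm, votedFor, and the log, as listed in Figure 2 (adjust the names to match your own struct fields):

func (rf *Raft) persist() {
    w := new(bytes.Buffer)
    e := labgob.NewEncoder(w)
    e.Encode(rf.currentTerm)
    e.Encode(rf.votedFor)
    e.Encode(rf.log)
    rf.persister.Save(w.Bytes(), nil) // nil: no snapshot in this lab
}

func (rf *Raft) readPersist(data []byte) {
    if len(data) == 0 {
        return // no persisted state, e.g., first boot
    }
    r := bytes.NewBuffer(data)
    d := labgob.NewDecoder(r)
    var currentTerm, votedFor int
    var log []LogEntry
    if d.Decode(&currentTerm) != nil || d.Decode(&votedFor) != nil || d.Decode(&log) != nil {
        return // decode error; consider reporting it
    }
    rf.currentTerm, rf.votedFor, rf.log = currentTerm, votedFor, log
}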

You will probably need the optimization that backs up nextIndex by more than one entry at a time. Look at the extended Raft paper starting at the bottom of page 7 and top of page 8 (marked by a gray line). The paper is vague about the details; you will need to fill in the gaps. While there are several options, one is to have the follower's rejection reply include:

XTerm:  term in the conflicting entry (if any)
XIndex: index of first entry with that term (if any)
XLen:   log length

Then the leader's logic can be something like:

Case 1: leader doesn't have XTerm:
  nextIndex = XIndex
Case 2: leader has XTerm:
  nextIndex = leader's last entry for XTerm
Case 3: follower's log is too short:
  nextIndex = XLen
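
In the leader's code that handles a rejected AppendEntries reply, those three cases might translate into something like the following sketch. Here XTerm == -1 signals a too-short follower log, and lastIndexForTerm is a helper you would write yourself; both are assumptions, not part of the starter code.

if reply.XTerm == -1 {
    // Case 3: follower's log is too short.
    rf.nextIndex[server] = reply.XLen
} else if last := rf.lastIndexForTerm(reply.XTerm); last == -1 {
    // Case 1: leader has no entry with term XTerm.
    rf.nextIndex[server] = reply.XIndex
} else {
    // Case 2: leader has entries with term XTerm; back up to its last one.
    rf.nextIndex[server] = last
}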

Advice

Testing

We supply a set of tests, which you should use to drive your implementation efforts, and which we'll use to grade your submitted lab. The tests are in raft1/raft_test.go. You can't easily run your Raft implementation directly. Instead you should run it by way of the tester.

If you wish to run individual tests, look for function names starting with Test, e.g., TestInitialElection3A in raft1/raft_test.go. Then run the individual test as follows:

go test -run TestInitialElection3A

If your code has trouble passing the tests, read the paper's Figure 2 again; the full logic for leader election is spread over multiple parts of the figure.

If you fail a test, look at raft1/raft_test.go and raft1/test.go to understand what's being tested. raft1/test.go also illustrates how the tester uses the Raft API.

When we grade your submissions, we will run the tests without the race flag. However, you should make sure to run your code with this flag (go test -race) and fix any races it reports.

Testing Lab 3A

You have completed Lab 3A when your code passes the 3A tests in the test suite.

$ go test -run 3A
Test (3A): initial election (reliable network)...
  ... Passed --  time  3.1s #peers 3 #RPCs    62 #Ops    0
Test (3A): election after network failure (reliable network)...
  ... Passed --  time  4.6s #peers 3 #RPCs   124 #Ops    0
Test (3A): multiple elections (reliable network)...
  ... Passed --  time  5.5s #peers 7 #RPCs   588 #Ops    0
PASS
ok      ece419/raft1    13.217s

Testing Lab 3B

You have completed Lab 3B when your code passes the 3B tests in the test suite.

$ go test -run 3B
Test (3B): basic agreement (reliable network)...
  ... Passed --  time  0.4s #peers 3 #RPCs    16 #Ops    0
Test (3B): RPC byte count (reliable network)...
  ... Passed --  time  1.2s #peers 3 #RPCs    48 #Ops    0
Test (3B): test progressive failure of followers (reliable network)...
  ... Passed --  time  4.4s #peers 3 #RPCs   118 #Ops    0
Test (3B): test failure of leaders (reliable network)...
  ... Passed --  time  5.1s #peers 3 #RPCs   196 #Ops    0
Test (3B): agreement after follower reconnects (reliable network)...
  ... Passed --  time  3.3s #peers 3 #RPCs    90 #Ops    0
Test (3B): no agreement if too many followers disconnect (reliable network)...
  ... Passed --  time  3.3s #peers 5 #RPCs   200 #Ops    0
Test (3B): concurrent Start()s (reliable network)...
  ... Passed --  time  0.6s #peers 3 #RPCs    22 #Ops    0
Test (3B): rejoin of partitioned leader (reliable network)...
  ... Passed --  time  3.7s #peers 3 #RPCs   146 #Ops    0
Test (3B): leader backs up quickly over incorrect follower logs (reliable network)...
  ... Passed --  time 14.9s #peers 5 #RPCs  2056 #Ops    0
Test (3B): RPC counts aren't too high (reliable network)...
  ... Passed --  time  2.1s #peers 3 #RPCs    68 #Ops    0
PASS
ok      ece419/raft1    39.031s

The ok ece419/raft1 39.031s line means that Go measured the time taken for the 3B tests to be 39.031 seconds of real (wall-clock) time. The tests for the later labs may fail if your Lab 3B tests take more than 1 minute of real time.

You can also check the CPU time taken by your solution by running:

/usr/bin/time go test -run 3B

The (user-mode) CPU time should be less than 5 seconds. Otherwise, look for time spent sleeping, waiting for RPC timeouts, loops that run without sleeping or waiting for conditions or channel messages, or large numbers of RPCs sent.

Testing Lab 3C

You have completed Lab 3C when your code passes the 3C tests in the test suite.

$ go test -run 3C
Test (3C): basic persistence (reliable network)...
  ... Passed --  time  3.0s #peers 3 #RPCs    74 #Ops    0
Test (3C): more persistence (reliable network)...
  ... Passed --  time 11.0s #peers 5 #RPCs   420 #Ops    0
Test (3C): partitioned leader and one follower crash, leader restarts (reliable network)...
  ... Passed --  time  1.2s #peers 3 #RPCs    34 #Ops    0
Test (3C): Figure 8 (reliable network)...
  ... Passed --  time 29.5s #peers 5 #RPCs  1088 #Ops    0
Test (3C): unreliable agreement (unreliable network)...
  ... Passed --  time  1.2s #peers 5 #RPCs  1036 #Ops    0
Test (3C): Figure 8 (unreliable) (unreliable network)...
  ... Passed --  time 31.8s #peers 5 #RPCs  9424 #Ops    0
Test (3C): churn (reliable network)...
  ... Passed --  time 16.2s #peers 5 #RPCs 11024 #Ops    0
Test (3C): unreliable churn (unreliable network)...
  ... Passed --  time 16.4s #peers 5 #RPCs  5328 #Ops    0
PASS
ok      ece419/raft1    110.372s

Checking submission

It is a good idea to run the tests multiple (e.g., 10) times and check that each run prints PASS before submitting your code for each lab.

for i in {1..10}; do go test -run 3A; done

We have provided you a tool that allows you to check your lab implementation and estimate your lab grade. After setting your path variable, you can run the tool in the raft1 directory as follows:

ece419-check [lab3A|lab3B|lab3C]

You can check the output of the tool in the test.log file. Note that an individual test will fail if it takes more than 120 seconds.

Submission

We will use the following files from your remote repository for grading: raft1/raft.go.

Please see lab submission.

Acknowledgements

Prof. Robert Morris, Prof. Frans Kaashoek, and Prof. Nickolai Zeldovich at MIT developed these labs. We have their permission to use them for this course.