Distributed Systems: ECE419, Winter 2025
Due date: Apr 7
This is an optional, bonus lab.
In this lab you will build a linearizable, replicated key/value storage service using your Raft library from Lab 3. Your key/value service will be a replicated state machine, consisting of several key/value servers that each maintain a database of key/value pairs, as in Lab 2, but additionally use Raft for replication.
This lab has one part (4A). The submission deadline for this part is shown above.
You can get the latest starter code for this lab from the student
repository by running:
cd ece419
git pull upstream main
For more details, see our previous instructions on obtaining the starter code.
The starter code for this lab is available in the kvraft
directory. You only need to modify the kvraft/client.go
, kvraft/common.go
, and kvraft/server.go
files for this lab. These files provide some initial code and some code samples, e.g., how to get messages from your Raft library. You may modify any other files for testing, but please run your final tests with the original versions of those files before submission. Also, do not add new files or remove existing ones.
After you have completed your implementation, you can test your code as follows:
cd kvraft
go test -run 4A
Your replicated key/value service should continue to process client requests as long as a majority of the servers are alive and can communicate, in spite of other failures or network partitions.
Clients will interact with your key/value service in much the same way as in Lab 2. In particular, clients can send three different RPCs to the key/value service:

Get(key): fetches the current value of the key (returning the empty string for non-existent keys)

Put(key, value): replaces the value for a particular key in the database

Append(key, arg): appends arg to key's value (treating the existing value as an empty string if the key is non-existent)

Keys and values are strings. Note that unlike in Lab 2, neither Put nor Append should return a value to the client. Each client talks to the service via a Clerk with Get/Put/Append methods. The Clerk manages RPC interactions with the servers.
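For reference, the Clerk's exported methods might look like the sketch below, assuming the same method signatures as your Lab 2 Clerk; the struct fields shown are illustrative placeholders rather than part of the starter code.

// A minimal sketch of the Clerk interface, assuming Lab 2-style signatures.
// The servers and leader fields are illustrative names, not starter code.
type Clerk struct {
	servers []*labrpc.ClientEnd // one ClientEnd per key/value server
	leader  int                 // index of the server we last believed was the leader
}

// Get returns the current value for key, or "" if the key does not exist.
func (ck *Clerk) Get(key string) string { /* ... */ return "" }

// Put replaces the value stored under key; note that nothing is returned.
func (ck *Clerk) Put(key string, value string) { /* ... */ }

// Append appends arg to key's value, treating a missing key as "".
func (ck *Clerk) Append(key string, arg string) { /* ... */ }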
Your service must ensure that application calls to Clerk's Get/Put/Append methods are linearizable. If called one at a time, the Get/Put/Append methods should act as if the system had only one copy of its state, and each call should observe the modifications to the state implied by the preceding sequence of calls. For concurrent calls, the return values and final state must be the same as if the operations had executed one at a time in some order. Calls are concurrent if they overlap in time. For example, if Client X calls Clerk.Put()
and then Client Y calls Clerk.Append()
before Client X's call returns, then the two calls are concurrent. For linearizability, a call must observe the effects of all calls that have completed before the call starts.
Providing linearizability is relatively easy for a single server. It is harder if the service is replicated, since all servers must choose the same execution order for concurrent requests, must avoid replying to clients using state that isn't up to date, and must recover their state after a failure in a way that preserves all acknowledged client updates.
For this lab, you should review the extended Raft paper, in particular Section 8. After this lab, you will have implemented all parts (Clerk, Service, and Raft) shown in the diagram of Raft interactions except for snapshots.
Each of your key/value servers will have an associated Raft peer. Clerks send Get()
, Put()
and Append()
RPCs to the server whose associated Raft peer is the leader. The server submits the Get/Put/Append operation to Raft so that Raft can log and commit these operations. After Raft commits these operations and the Raft peers deliver them to their corresponding servers, the servers apply these operations in the same order to their key/value databases. The intent is for the servers to maintain identical replicas of the key/value database. When the server that received a Get/Put/Append RPC request applies the operation to its database, it reports the result to the Clerk by responding to the RPC.
A Clerk sometimes doesn't know which server is the Raft leader. If the Clerk sends an RPC to the wrong server, or if it cannot reach the server, the Clerk should retry by sending to a different server. If an operation failed to commit (for example, if the leader was replaced), the server reports an error to the Clerk, which then retries with a different server.
Your servers should not directly communicate; they should only interact with each other through Raft.
Your first task is to implement a solution that works when there are no dropped messages and no failed servers.
Feel free to copy over your client code from Lab 2 (kvsrv/client.go
) into kvraft/client.go
. You will need to add logic that decides the server to which RPCs should be sent.
You'll also need to implement Get()
, Put()
and Append()
RPC handlers in kvraft/server.go
. These handlers should enter an Op
in the Raft log using Start()
. You should fill in the Op
struct definition in kvraft/server.go
so that it describes a Get/Put/Append operation. Each server should execute Op
commands as Raft commits them, i.e. as Raft delivers them to applyCh
. An RPC handler should notice when Raft commits its Op
, and then reply to the RPC.
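As a concrete starting point, the server side might be structured as in the sketch below. The Op fields, the ClientId/Seq tags, the PutAppendArgs/PutAppendReply names carried over from Lab 2, the Err constants, and the waitForApply helper (sketched later, in the failure-handling discussion) are all illustrative assumptions; only kv.rf.Start() comes from the starter code. The Get and Append handlers would follow the same pattern, with the Get handler additionally returning the value that the applier read.

// A sketch of the Op struct and a Put RPC handler, under the assumptions
// named above. The handler replies only after the operation has actually
// been committed and applied.
type Op struct {
	Kind     string // "Get", "Put", or "Append"
	Key      string
	Value    string
	ClientId int64 // which client sent this request (used later for duplicate detection)
	Seq      int64 // per-client request number
}

func (kv *KVServer) Put(args *PutAppendArgs, reply *PutAppendReply) {
	op := Op{Kind: "Put", Key: args.Key, Value: args.Value,
		ClientId: args.ClientId, Seq: args.Seq}

	// Ask Raft to append the operation to its log.
	index, _, isLeader := kv.rf.Start(op)
	if !isLeader {
		reply.Err = ErrWrongLeader
		return
	}

	// Block until the applier reports that the entry at this index has been
	// committed and applied (or that this server appears to have lost leadership).
	if kv.waitForApply(index, op) {
		reply.Err = OK
	} else {
		reply.Err = ErrWrongLeader
	}
}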
You have completed this task when you reliably pass the first test ("one client" or TestBasic4A
) in the test suite.
After a server calls Start()
, all servers will need to wait for Raft to complete agreement. Commands that have committed arrive on the applyCh
channel. Your code will need to keep reading from applyCh
while Get()
, Put()
, and Append()
handlers submit commands to the Raft log using Start()
. Beware of deadlock between the server and its Raft library. It's best to add locking from the start because the need to avoid deadlocks sometimes affects overall code design. Check that your code is race-free using go test -race
.
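One common structure, sketched below under assumptions, is a single long-lived applier goroutine that owns applyCh: it applies each committed Op to the database and then hands the result to whichever RPC handler is waiting on that log index. Only applyCh and the ApplyMsg fields (CommandValid, Command, CommandIndex) come from Lab 3; the mu, data, and notify fields are illustrative.

// A sketch of an applier goroutine, assuming KVServer has fields like
//   mu      sync.Mutex
//   applyCh chan raft.ApplyMsg
//   data    map[string]string
//   notify  map[int]chan Op   // one buffered channel per waiting handler
func (kv *KVServer) applier() {
	for msg := range kv.applyCh {
		if !msg.CommandValid {
			continue
		}
		op := msg.Command.(Op)

		kv.mu.Lock()
		// Apply the committed operation; every replica applies the same
		// operations in the same log order, so the databases stay identical.
		switch op.Kind {
		case "Put":
			kv.data[op.Key] = op.Value
		case "Append":
			kv.data[op.Key] += op.Value
		}
		// Wake the handler (if any) waiting on this log index. The send is
		// non-blocking so the applier never stalls on a handler that gave up,
		// and the applier never calls into Raft while holding kv.mu, which
		// helps avoid deadlock between the server and its Raft library.
		if ch, ok := kv.notify[msg.CommandIndex]; ok {
			select {
			case ch <- op:
			default:
			}
		}
		kv.mu.Unlock()
	}
}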
A server should not complete a Get()
RPC if it is not part of a majority (so that it does not serve stale data). A simple solution is to enter every Get()
(as well as each Put()
and Append()
) in the Raft log. You don't have to implement the optimization for read-only operations that is described in Section 8.
Now you should modify your solution to continue in the face of network and server failures.
One problem you'll face is that a Clerk may have to send an RPC multiple times until it finds a server that replies positively. If a leader fails just after committing an entry to the Raft log, the Clerk may not receive a reply, and thus may resend the request to another leader. Each call to Clerk.Put()
or Clerk.Append()
should result in just a single execution, so you will have to ensure that the resend doesn't result in the servers executing the request twice.
Add code to handle failures, and to cope with duplicate Clerk requests, including situations where the Clerk sends a request to a leader server in one term, times out waiting for a reply, and resends the request to a new leader server in another term. The request should execute just once. For duplicate detection, you should use the same methods as in Lab 2. Your code should free server memory quickly, for example by having each RPC imply that the client has seen the reply for its previous RPC. It's OK to assume that a client will make only one call to a Clerk at a time. You may find that you need to make changes to what information you store in your duplicate detection table from Lab 2.
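One way to structure duplicate detection, sketched below under the same illustrative names as before, is to have the applier consult a per-client table before executing anything, so a resent request is recognized no matter which server's handler submitted it. Keeping only the most recent entry per client also frees memory quickly, since each new request implies the client saw the reply to its previous one.

// A sketch of at-most-once execution inside the applier; the opResult type,
// the lastApplied map, and the ClientId/Seq tags are illustrative assumptions.
type opResult struct {
	Seq   int64  // sequence number of this client's most recent applied request
	Value string // the value returned for that request (used by Get)
}

// applyOp executes op at most once and returns the value to report back to
// the client. The caller (the applier) is assumed to hold kv.mu; it could
// replace the switch statement in the earlier applier sketch.
func (kv *KVServer) applyOp(op Op) string {
	if last, ok := kv.lastApplied[op.ClientId]; ok && op.Seq <= last.Seq {
		// A resend of a request we already executed: return the recorded
		// result so the operation is not applied a second time.
		return last.Value
	}
	var value string
	switch op.Kind {
	case "Get":
		value = kv.data[op.Key]
	case "Put":
		kv.data[op.Key] = op.Value
	case "Append":
		kv.data[op.Key] += op.Value
	}
	// Remember only the latest request per client; older entries are no
	// longer needed once the client has moved on to a new request.
	kv.lastApplied[op.ClientId] = opResult{Seq: op.Seq, Value: value}
	return value
}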
Your solution needs to handle a leader that has called Start()
for a Clerk's RPC, but loses its leadership before the request is committed to the log. In this case, the superseded leader should report an error to the Clerk, which should then resend the request to other servers until it finds the new leader. A server can determine that it has lost leadership when it notices that Raft's term has changed or a different request has appeared at the index returned by Start()
. If the superseded leader is partitioned by itself, it won't know about new leaders. A client in the same partition won't be able to talk to a new leader either, so it's OK in this case for the server and client to wait indefinitely until the partition heals.
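Continuing the earlier sketches, the waitForApply helper assumed in the Put handler might detect lost leadership as shown below: it registers a channel for the index returned by Start() and treats either a timeout or a different client's command appearing at that index as a sign that this server is no longer the leader. The notify map, the channel scheme, and the particular timeout are illustrative choices, not requirements.

// A sketch of the waitForApply helper assumed earlier. It reports whether op
// itself was committed and applied at index; a different op at that index, or
// a timeout, means this server probably lost leadership and the Clerk should
// retry elsewhere. Uses the standard library's time package.
func (kv *KVServer) waitForApply(index int, op Op) bool {
	kv.mu.Lock()
	ch := make(chan Op, 1) // buffered so the applier's send never blocks
	kv.notify[index] = ch
	kv.mu.Unlock()

	defer func() {
		kv.mu.Lock()
		delete(kv.notify, index)
		kv.mu.Unlock()
	}()

	select {
	case applied := <-ch:
		// If some other client's request was committed at this index, a new
		// leader overwrote our log entry and our request never executed here.
		return applied.ClientId == op.ClientId && applied.Seq == op.Seq
	case <-time.After(500 * time.Millisecond):
		// Nothing committed at this index in time; report failure so the
		// Clerk retries. Since the Clerk keeps retrying, a leader that is
		// partitioned on its own and its clients simply wait until the
		// partition heals.
		return false
	}
}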
You will probably have to modify your Clerk to remember which server turned out to be the leader for the last RPC, and send the next RPC to that server first. This will avoid wasting time searching for the leader on every RPC, which may help you pass some of the tests.
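On the client side, a sketch under the same assumptions (a Clerk that also carries a clientId and a per-request seq counter for duplicate detection, the PutAppendArgs layout from earlier, and an RPC name matching the Put handler sketched above) might look like this:

// A sketch of a Clerk.Put that remembers the last known leader; all field
// names, the args layout, and the RPC name are illustrative assumptions.
func (ck *Clerk) Put(key string, value string) {
	ck.seq++
	args := PutAppendArgs{Key: key, Value: value,
		ClientId: ck.clientId, Seq: ck.seq}
	for {
		var reply PutAppendReply
		// Start with the server that answered our previous RPC; it is usually
		// still the leader, so we avoid searching for the leader on every call.
		ok := ck.servers[ck.leader].Call("KVServer.Put", &args, &reply)
		if ok && reply.Err == OK {
			return
		}
		// RPC lost, wrong leader, or failed commit: try the next server and
		// resend the same request (same Seq, so it executes at most once).
		ck.leader = (ck.leader + 1) % len(ck.servers)
	}
}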
Your code should now pass the Lab 4A tests, like this:
$ go test -run 4A
Test: one client (4A) ...
... Passed -- 15.0 5 27928 4452
Test: ops complete fast enough (4A) ...
... Passed -- 1.1 3 5682 0
Test: many clients (4A) ...
... Passed -- 15.2 5 32445 5010
Test: unreliable net, many clients (4A) ...
... Passed -- 15.8 5 11207 1571
Test: concurrent append to same key, unreliable (4A) ...
... Passed -- 0.9 3 244 52
Test: progress in majority (4A) ...
... Passed -- 0.4 5 51 2
Test: no progress in minority (4A) ...
... Passed -- 1.0 5 111 3
Test: completion after heal (4A) ...
... Passed -- 1.0 5 61 3
Test: partitions, one client (4A) ...
... Passed -- 22.5 5 28234 3707
Test: partitions, many clients (4A) ...
... Passed -- 22.6 5 97835 5563
Test: restarts, one client (4A) ...
... Passed -- 18.4 5 28007 4493
Test: restarts, many clients (4A) ...
... Passed -- 18.9 5 70555 4835
Test: unreliable net, restarts, many clients (4A) ...
... Passed -- 19.5 5 12218 1626
Test: restarts, partitions, many clients (4A) ...
... Passed -- 26.1 5 60645 5352
Test: unreliable net, restarts, partitions, many clients (4A) ...
... Passed -- 26.9 5 11092 1292
Test: unreliable net, restarts, partitions, random keys, many clients (4A) ...
... Passed -- 27.9 7 32470 2795
PASS
ok ece419/kvraft 233.396s
The numbers on each Passed line are the real time the test took in seconds, the number of Raft peers, the number of RPCs sent during the test (including client RPCs), and the number of key/value operations executed (Clerk Get/Put/Append calls).
The tests in this lab are more demanding than the Lab 3 tests. You may have bugs in your Raft library that are exposed by the tests in this lab. If you make changes to your Raft implementation make sure it continues to pass all of the Lab 3 tests.
You should not need to add any fields to the Raft ApplyMsg
structure, or to Raft RPCs such as AppendEntries
, but you can do so.
If you fail a test, look at kvraft/test_test.go
and kvraft/config.go
to understand what's being tested.
When we grade your submissions, we will run the tests without the race flag. However, you should make sure to run your code with this flag (go test -race
) and fix any races it reports.
For this lab, we have provided you a tool that allows you to estimate your lab grade. After setting your path variable, you can run the tool in the kvraft
directory as follows:
ece419-lab4-check A
We will use the following files from your code repository for grading: raft/raft.go
, kvraft/client.go
, kvraft/common.go
, and kvraft/server.go
.
Please see the Lab Submission instructions.
Prof. Robert Morris, Prof. Frans Kaashoek, and Prof. Nickolai Zeldovich at MIT developed these labs. We have their permission to use them for this course.