Distributed Systems

ECE419, Winter 2026
University of Toronto
Instructor: Ashvin Goel

Lab 2: Key-Value Server

Due date: Feb 8

In this lab you will build a key-value server that runs on a single machine and ensures that all operations are linearizable and that update operations are executed at most once despite network failures. Later labs will replicate a server like this one to handle server crashes.

Overview

A client communicates with the key-value server using two RPCs: Put(key, value, version) and Get(key). The server maintains an in-memory map that records, for each key, a (value, version) tuple. Both keys and values are strings. The version number records the number of times the key has been written.

Put(key, value, version) installs or replaces the value of a key in the in-memory map, but only if the version number in the request matches the version number the server has stored for that key. If the version numbers match, the server also increments the stored version number of the key; if they don't match, the server returns rpc.ErrVersion. A client creates a new key by invoking Put() with version number 0, and the resulting version stored by the server is 1. If the version number in a Put() is larger than 0 and the key doesn't exist, the server returns rpc.ErrNoKey.

Get(key) fetches the current value of the key and its associated version. If the key doesn't exist at the server, the server returns rpc.ErrNoKey.
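
To make the version-check rule concrete, here is a sketch of the server-side Put logic. This is a sketch only: the valueVersion struct, the kv.mu/kv.data fields, and the rpc.PutArgs/rpc.PutReply names are assumptions for illustration; follow whatever types kvsrv1/rpc/rpc.go and kvsrv1/server.go actually define.

// Assumed server state: a mutex-protected map from key to (value, version).
type valueVersion struct {
    value   string
    version rpc.Tversion
}

func (kv *KVServer) Put(args *rpc.PutArgs, reply *rpc.PutReply) {
    kv.mu.Lock()
    defer kv.mu.Unlock()

    entry, ok := kv.data[args.Key]
    switch {
    case !ok && args.Version == 0:
        // New key: install it with version 1.
        kv.data[args.Key] = valueVersion{args.Value, 1}
        reply.Err = rpc.OK
    case !ok:
        // Version > 0 but the key doesn't exist.
        reply.Err = rpc.ErrNoKey
    case entry.version == args.Version:
        // Versions match: replace the value and increment the version.
        kv.data[args.Key] = valueVersion{args.Value, entry.version + 1}
        reply.Err = rpc.OK
    default:
        reply.Err = rpc.ErrVersion
    }
}

The Get handler is analogous: look up the key under the lock and return either the stored (value, version) pair or rpc.ErrNoKey.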

Linearizability

Your server must ensure that clients' calls to the Get and Put RPCs are linearizable. If client requests are issued one at a time (they aren't overlapping/concurrent), each Get and Put call should observe the modifications to the state implied by the preceding sequence of calls, as if the system had only one copy of its state. For concurrent calls, the return values and the final server state must be the same as if the operations had executed one at a time in some order. Calls are concurrent if they overlap in time: for example, if Client X calls Put, and Client Y calls Put before Client X's call returns, the two calls are concurrent. For linearizability, a call must observe the effects of all calls that completed before it started.

Linearizability is convenient for applications because it matches the behavior you'd see from a single server that processes requests one at a time. For example, if one client receives a successful response from the server for an update request, reads subsequently launched by other clients are guaranteed to see the effects of that update. Providing linearizability is relatively easy for a single server.

For more details, please see the additional notes on linearizability.

Starter Code

Update your local repository with our starter code.

The starter code for this lab is available in the kvsrv1 and kvsrv1/lock directories. kvsrv1/client.go contains the client code, including a Clerk that implements the client side of RPC requests. kvsrv1/server.go contains the server code, including the handlers that implement the server side of RPC requests. The RPC requests, replies, and error values are defined in kvsrv1/rpc/rpc.go. kvsrv1/lock/lock.go contains locking-related code.

You need to modify the kvsrv1/client.go, kvsrv1/server.go, and kvsrv1/lock/lock.go files for this lab. You may modify or add other files for testing, but we will test your code using our starter code together with your modified versions of these three files.

After you have completed your implementation, you can test your code as follows:

cd kvsrv1
go test
cd kvsrv1/lock
go test

Implementation

The implementation for this lab consists of four parts as described below.

Key-value server with reliable network

Your first task is to implement a solution that works over a reliable network that does not drop, reorder or delay messages.

You'll need to add RPC-sending code to the Clerk Get() and Put() methods in client.go, and implement Get() and Put() RPC handlers in server.go.
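
For the reliable case, the Clerk methods can be thin wrappers around ck.clnt.Call(). Below is a minimal sketch of Get(), assuming request/reply types named rpc.GetArgs and rpc.GetReply and a handler registered as "KVServer.Get"; check these names and the Call() signature against the starter code.

func (ck *Clerk) Get(key string) (string, rpc.Tversion, rpc.Err) {
    args := rpc.GetArgs{Key: key}
    reply := rpc.GetReply{}
    // With a reliable network, Call() always delivers the request and
    // returns true; the unreliable task below adds retries here.
    ck.clnt.Call(ck.server, "KVServer.Get", &args, &reply)
    return reply.Value, reply.Version, reply.Err
}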

You have completed this task when your code passes the Reliable tests in the kvsrv1 test suite.

$ cd kvsrv1
$ go test -run Reliable
One client and reliable Put (reliable network)...
  ... Passed --  time  0.0s #peers 1 #RPCs     5 #Ops    0
Test: many clients racing to put values to the same key (reliable network)...
  ... Passed --  time  7.6s #peers 1 #RPCs 90227 #Ops 90227
Test: memory use many put clients (reliable network)...
  ... Passed --  time  5.5s #peers 1 #RPCs 100000 #Ops    0
PASS
ok      ece419/kvsrv1   16.569s

The numbers after each Passed are the real time in seconds, the number of peers, the number of RPCs sent (including client RPCs), and the number of key-value operations executed (Clerk Get() and Put() calls).

Lock service using key-value server

In many distributed applications, clients running on different machines use a key-value server to coordinate their activities. For example, ZooKeeper and etcd allow clients to coordinate using a distributed lock, in analogy with how threads in a Go program can coordinate with locks (i.e., sync.Mutex). ZooKeeper and etcd implement such a lock using a conditional put, similar to the Put implementation of your key-value server.

In this exercise your task is to implement a lock using the Clerk.Get() and Clerk.Put() calls. The lock supports two methods: Acquire and Release. The lock's specification is that only one client can successfully acquire the lock at a time; other clients must wait until the first client has released the lock.

We supply you with skeleton code and tests in kvsrv1/lock. You will need to modify the kvsrv1/lock/lock.go file. Your Acquire and Release code can talk to your key-value server by calling lk.ck.Get() and lk.ck.Put().

If a client crashes while holding a lock, the lock will never be released. In a more sophisticated design, the client would attach a lease to the lock; when the lease (a timeout) expires, the lock server would release the lock on the client's behalf. In this lab, clients don't crash, so you can ignore this problem.

This exercise requires little code, but a bit more thought than the previous exercise. The lock service should use a specific key to store the lock state; this key is passed through the parameter l of MakeLock in kvsrv1/lock/lock.go. You will have to think about what lock state needs to be stored as the key's value.
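
One workable design, sketched below, stores the holder's identity as the lock key's value, using "" to mean "free"; the conditional Put then acts as a test-and-set. The lk.ck, lk.l, and lk.id fields and the Clerk method signatures are assumptions for illustration, and this version assumes a reliable network (the final task revisits it).

// Spin until the conditional Put succeeds. Assumes the lock key's
// value is "" when the lock is free and the holder's unique ID when
// it is held; this convention is a design choice, not starter code.
func (lk *Lock) Acquire() {
    for {
        val, ver, err := lk.ck.Get(lk.l)
        if err == rpc.ErrNoKey {
            // The lock key doesn't exist yet: try to create it, held by us.
            if lk.ck.Put(lk.l, lk.id, 0) == rpc.OK {
                return
            }
        } else if val == "" {
            // The lock is free: try to take it at the observed version.
            if lk.ck.Put(lk.l, lk.id, ver) == rpc.OK {
                return
            }
        }
        time.Sleep(10 * time.Millisecond) // held or lost a race: back off, retry
    }
}

func (lk *Lock) Release() {
    val, ver, _ := lk.ck.Get(lk.l)
    if val == lk.id {
        lk.ck.Put(lk.l, "", ver) // mark the lock free
    }
}

If two clients race, at most one Put can match the stored version; the loser sees rpc.ErrVersion and retries, which provides exactly the mutual exclusion the specification asks for.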

You have completed this task when your code passes the Reliable tests in the lock test suite.

$ cd kvsrv1/lock
$ go test -run Reliable
Test: 1 lock clients (reliable network)...
  ... Passed --  time  2.0s #peers 1 #RPCs  1058 #Ops    0
Test: 10 lock clients (reliable network)...
  ... Passed --  time  2.1s #peers 1 #RPCs 239329 #Ops    0
PASS
ok      ece419/kvsrv1/lock  4.112s

Key-value server with unreliable network

Now you should implement a key-value solution that works over an unreliable network that may drop, reorder or delay messages (e.g., RPC requests and replies). If a message is lost, the client's ck.clnt.Call() will return false (more precisely, Call() waits for a reply message for a timeout interval and returns false if no reply arrives within that time). To recover from dropped requests and replies, the Clerk must keep retrying each RPC until it receives a reply from the server. Before each retry, the client should wait a little; you can use Go's time package and call time.Sleep(100 * time.Millisecond). Your solution shouldn't require any changes to the server.
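
A minimal retry loop for Get() might look like the following sketch (same assumed types and Call() signature as before). Retrying Get unconditionally is safe because a duplicate Get does not modify server state; Put needs more care, as the next paragraphs explain.

func (ck *Clerk) Get(key string) (string, rpc.Tversion, rpc.Err) {
    args := rpc.GetArgs{Key: key}
    for {
        reply := rpc.GetReply{}
        if ck.clnt.Call(ck.server, "KVServer.Get", &args, &reply) {
            return reply.Value, reply.Version, reply.Err
        }
        time.Sleep(100 * time.Millisecond) // no reply: wait, then resend
    }
}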

If the network drops an RPC request message, then re-sending the request will ensure that the server receives and executes the re-sent request.

If the network drops an RPC reply message, then re-sending the request causes the server to receive a duplicate copy of the request. A duplicate is fine for a Get request, since Get does not modify state. It is also fine for a Put request, since the server executes Put conditionally on the version number: if the server already received and executed a Put request, it responds to a re-transmitted copy of that request with rpc.ErrVersion rather than executing the Put a second time, which ensures at-most-once semantics for updates. However, because the first reply message was dropped, the Clerk cannot know for certain whether the server executed its Put. The server may have executed the first Put request; or another Clerk may have updated the key before the first Put request arrived, in which case the server executed neither request and replied rpc.ErrVersion both times.

Therefore, if a Clerk receives rpc.ErrVersion for a re-transmitted Put request, the Clerk must return rpc.ErrMaybe to the application instead of rpc.ErrVersion, since the request may (or may not) have been executed; it is then up to the application to handle this case. If, however, the server responds to an initial (not re-transmitted) Put request with rpc.ErrVersion, the Clerk should return rpc.ErrVersion to the application, since the request was definitely not executed by the server.
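
Concretely, the Clerk can remember whether the current attempt is a re-transmission and translate rpc.ErrVersion into rpc.ErrMaybe only in that case. A sketch under the same assumed types:

func (ck *Clerk) Put(key, value string, version rpc.Tversion) rpc.Err {
    args := rpc.PutArgs{Key: key, Value: value, Version: version}
    retried := false
    for {
        reply := rpc.PutReply{}
        if ck.clnt.Call(ck.server, "KVServer.Put", &args, &reply) {
            if reply.Err == rpc.ErrVersion && retried {
                // A re-transmission got ErrVersion: the first attempt
                // may or may not have executed.
                return rpc.ErrMaybe
            }
            return reply.Err
        }
        retried = true
        time.Sleep(100 * time.Millisecond) // no reply: wait, then resend
    }
}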

It would be more convenient for application developers if the key-value server guaranteed exactly-once semantics, since applications would not have to handle rpc.ErrMaybe errors; but exactly-once requires the server to maintain state for each Clerk, which can become expensive with many clients.

You have completed this task when your code passes all the tests in the kvsrv1 test suite.

$ cd kvsrv1
$ go test
One client and reliable Put (reliable network)...
  ... Passed --  time  0.0s #peers 1 #RPCs     5 #Ops    0
Test: many clients racing to put values to the same key (reliable network)...
  ... Passed --  time  2.5s #peers 1 #RPCs 92449 #Ops 92449
Test: memory use many put clients (reliable network)...
  ... Passed --  time  5.4s #peers 1 #RPCs 100000 #Ops    0
One client (unreliable network)...
  ... Passed --  time  9.2s #peers 1 #RPCs   271 #Ops  215
PASS
ok      ece419/kvsrv1   20.596s

Lock service using key-value server with unreliable network

Now you should update the lock implementation to work with your updated key-value client when the network is unreliable. Your lock implementation must handle the at-most-once semantics of the key-value service (i.e., rpc.ErrMaybe replies from Put). You may need a unique identifier for each lock client; you can use kvtest.RandValue(8) to generate a random string.
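
The subtle case is a Put that returns rpc.ErrMaybe while acquiring: the Put may or may not have taken effect. Because each client writes its own unique ID into the lock key, it can resolve the ambiguity by reading the key back. Here is a sketch of one acquire attempt under the conventions assumed earlier; tryAcquire is a hypothetical helper, not part of the starter code.

// One attempt to take the lock at the observed version ver.
// Returns true iff this client now holds the lock.
func (lk *Lock) tryAcquire(ver rpc.Tversion) bool {
    switch lk.ck.Put(lk.l, lk.id, ver) {
    case rpc.OK:
        return true
    case rpc.ErrMaybe:
        // The reply was lost; check whether our Put actually executed.
        val, _, _ := lk.ck.Get(lk.l)
        return val == lk.id
    default:
        return false // definitely not executed; the caller retries
    }
}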

You have completed this task when your code passes all the tests in the lock test suite.

$ cd kvsrv1/lock
$ go test
Test: 1 lock clients (reliable network)...
  ... Passed --  time  2.0s #peers 1 #RPCs  1058 #Ops    0
Test: 10 lock clients (reliable network)...
  ... Passed --  time  2.1s #peers 1 #RPCs 240517 #Ops    0
Test: 1 lock clients (unreliable network)...
  ... Passed --  time  2.5s #peers 1 #RPCs    67 #Ops    0
Test: 10 lock clients (unreliable network)...
  ... Passed --  time  3.8s #peers 1 #RPCs   817 #Ops    0
PASS
ok      ece419/kvsrv1/lock  10.362s

Advice

Testing

If you wish to run individual tests, look for function names starting with Test, e.g., TestReliablePut in kvsrv_test.go. Then run the individual test as follows:

go test -run TestReliablePut

Checking submission

We have provided a tool that allows you to check your lab implementation and estimate your lab grade. After setting your PATH variable, you can run the tool in the kvsrv1 directory as follows:

ece419-check lab2

You can check the output of the tool in the test.log file. Note that an individual test will fail if it takes more than 120 seconds.

Submission

We will use the following files from your remote repository for grading: kvsrv1/client.go, kvsrv1/server.go and kvsrv1/lock/lock.go.

Please see the lab submission instructions.

Acknowledgements

Prof. Robert Morris, Prof. Frans Kaashoek, and Prof. Nickolai Zeldovich at MIT developed these labs. We have their permission to use them for this course.