Advances in Distributed Systems
ECE 1746, Fall 2003
University of Toronto

Instructor: Ashvin Goel

Course time: Thursday, 4-6pm
Course Room: Galbraith Building, GB-120
Start date: Sep 11, 2003

Course Description

The exponential growth of Internet services demonstrates the importance and potential of large-scale distributed systems. Today, Web services allow online shopping for virtually any product, from cheap second-hand items to expensive art collections. Content delivery networks can potentially speed up these services by cleverly caching Web pages. Peer-to-peer applications allow sharing of content in ways that are making industry nervous about its profit margins. Multimedia services provide streaming delivery of audio and video. The list of new classes of distributed applications that are becoming ubiquitous seems endless: cluster computing, grid computing, game services, pervasive mobile computing, sensor networks, etc. In this scenario, a fundamental challenge is to provide scalable and robust services in the presence of best-effort communication and unreliable nodes.

This graduate-level course focuses on distributed computing from a systems software perspective. Students are expected to read and critique recent research papers that cover some of the distributed applications mentioned above and span areas such as operating systems, networks and multimedia systems. They are also expected to work on a research project and make a presentation.


There are no required textbooks for this course. The optional textbook is Distributed Systems: Concepts and Design (Third Edition), by George Coulouris, Jean Dollimore and Tim Kindberg. Published by Addison-Wesley, 2001. ISBN 0-201-61918-0.

Class Format

Please read carefully

Each week this class will cover a group of papers that focuses on a specific aspect of distributed systems. Students are expected to read all the papers in the group that will be presented (the number of presentations depends on the number of students in class). At the beginning of the term, each paper will be assigned to a student who will be presenting the paper. Presentations will be limited to 15 minutes.

While students are welcome to present papers as they wish, here is an outline of a presentation that should help you get started. If students use slides, please use at least 20-24 point font for text. For a 15 minute presentation, do not use more than 15 slides or else the presentation will appear rushed. Students are welcome to send slides to the instructor a week before the presentation to get additional help.

After the presentation the student is expected to lead a 20-30 minute in-depth discussion of the paper (the length of the discussion will depend on the number of students in the class). This discussion should aim to answer the following questions: To aid in this discussion, each student presentation must end with a list of 5 specific questions that the student can ask other students and should be prepared to answer (the student should preferably have the answers at the end of the slides).

The answers to these questions should not be obvious, i.e. they should not be stated clearly in the paper. Instead, the questions should help in critical analysis of the paper. For example, suppose one of the stated contributions of the paper is that it "Enables secure peer-to-peer routing". One question might be: how secure is the routing and what strategy discussed in the paper makes it secure?

Choosing Papers for Presentation

After the first class in the term, each student should send mail to the instructor with a list of three or more papers that the student would like to present in class. The list of papers is available on the class web site and is broken down by subject and by the week in which each paper will be presented.

The paper choice is first-come, first-served, so it is in the student's interest to send mail to the instructor soon so that they get the first paper of their choice.

The reason for sending additional papers is to resolve conflicts so that if a student doesn't get their first choice, then their second choice can be assigned to them, etc. It is better to send a long list than a short list, since if all the papers in the list are taken, the instructor will have to send the student mail asking for another set of papers. By the time the student sends the next mail, other students may have chosen many more papers! We are solving a little distributed consistency problem here!

The papers are available under reading list. The first three papers in each group will generally be required reading while the later papers will generally be optional. The number of required readings in each group depends on the number of students in the class. The instructor will inform students which papers they should be choosing. However, it is best to choose the first three papers in a group as the first three choices in your list for your presentation. Also, papers that are already listed on the main class page (under each week) are already taken. Don't choose them. Since those papers are being presented, they are required reading.

Paper Reading List

Available under reading list.

Grading Policy

Grades will be based on the class presentation and the questions prepared for the discussion, the class project, quizzes, and class participation and discussion. There will be no final exam in this course. There are no assignments for students who attend all classes. The grading breakdown is as follows: Note: If you are unable to attend a class, you will have to submit an assignment to me. Please see the quiz format below for more details.

Quiz Format

Too often, in a seminar class like this one, students do not read material or skip the presentations on topics other than the one they're scheduled to present. To discourage this attitude, the instructor will conduct four short quizzes during the semester. Each quiz will count 5% towards the final grade.

Here are some of the salient features of these quizzes:
If you are unable to attend a class, you should submit an assignment to me that summarizes the papers that will be presented in class that week. The summary for each paper should be one paragraph long, and it should state the topic of the paper, the contributions or the novel ideas in the paper and the results of the paper. Do not write more than 3-5 sentences per paragraph.

You should submit the assignment to me by email in a text file (not a Word or PDF file) before the end of class, i.e. I should get the mail by 6:00 pm Thursday. If there is no quiz, then I will ignore the assignment. However, if there is a quiz, I will read your assignment. Each such assignment will be worth the same as a quiz, i.e. 5%.

Project Format

A major component of this course is devoted to a term-long project. The topic of the final project is largely up to you, but to help you choose a project, a list of projects is described below. These projects should help students determine whether their own projects are of reasonable size and scope.

The goal of the project is to encourage students to explore some aspect of distributed systems in detail. Some guidelines for choosing a project are: 1) the work should be in an area related to distributed systems (e.g., look at the topics for each week), 2) the work should be completed in less than three months, and 3) talk to the instructor and get a verbal agreement about a project before committing to it.

Students have two project options: 1) design and implementation of a system, or 2) writing a position paper. For the implementation option, 2-4 students should collaborate on the project. Make sure that the project is structured so that you can evaluate the system quantitatively. This option has the deliverables described below. Each of these deliverables is per-project (and not per-student). Note that each future deliverable contains much of the contents of the previous deliverables.
  1. Project Description: 1 page (Due Oct 2, 2003)

  2. Status Report: 3-4 pages (Due Oct 30, 2003)

  3. Final Report: 8-10 pages (Due Dec 4, 2003)

The second option, the position paper option, is for individuals. Students should pick an area of distributed systems such as the topics discussed each week. First, they should conduct detailed background research and cover as much literature as possible. Then they should compare the approaches and discuss the benefits or drawbacks of each. Finally they should come up with their "position". Your position should be a novel statement based on solid background research and sound judgement that you articulate clearly. Your position should not be obvious from the papers or background research. In other words, the position paper option encourages research (and not just a survey of previous work). Since there is no implementation with this option, the grading will be stricter regarding the quality of the final report and the novelty of your idea.

There are three main differences in the deliverables with this option compared to the implementation option: 1) since implementation and evaluation will not exist, you don't have to include them, 2) the background research should be more thorough and 3) the focus of the paper should be on the details of your approach, which should clearly justify your position, i.e. your novel statement. Think of this option as a proposal for your research. If you are already conducting research in an area that is somewhat related to distributed systems, this option is a great way to force yourself to put your thoughts clearly on paper. If you are not conducting research yet, it will help you get started.

Based on the number of students in the class, the instructor will decide later whether there will be short project presentations.

Project Suggestions

Available under project suggestions.


Available under projects.

Weekly Reading List

  1. Introduction

    Introduction by Instructor

    Efficient Readings of Papers in Science and Technology
    Michael J. Hanson, Dylan J. McNamee

    How (and How Not) to Write a Good Systems Paper
    Roy Levin, David D. Redell, Operating Systems Review 17(3), July 1983.
    Paper Additional Material
  2. Fault Tolerance

    Understanding Fault-Tolerant Distributed Systems
    Flavin Cristian, CACM Feb 1991
    Paper Presenter: Jason Yuen

    Exploring Failure Transparency and the Limits of Generic Recovery
    David E. Lowell, Subhachandra Chandra, Peter M. Chen, OSDI 2000
    Paper Presenter: Ivan Matosevic

    Myriad: Cost-effective Disaster Tolerance
    Fay Chang, Minwen Ji, Shun-Tak A. Leung, John MacCormick, Sharon E. Perl, Li Zhang, FAST 2000
    Paper Presenter: Charles Zhang

  3. Security and Denial of Service

    Practical Network Support for IP Traceback
    Stefan Savage, David Wetherall, Anna Karlin, Tom Anderson, SIGCOMM 2000
    Paper Presenter: Aron Brener

    Backtracking Intrusions
    Samuel T. King, Peter M. Chen, SOSP 2003
    Paper Presenter: Idon Wong

    Terra: A Virtual-Machine Based Platform for Trusted Computing
    Tal Garfinkel, Ben Pfaff, Jim Chow, Mendel Rosenblum, Dan Boneh, SOSP 2003
    Paper Presenter: Levon Stepanian

  4. Naming Schemes

    Active Names: Flexible Location and Transport of Wide-Area Resources
    Amin Vahdat, Michael Dahlin, Thomas Anderson, Amit Aggarwal, USITS 1999
    Paper Presenter: Andrés Lagar Cavilla

    On the Effectiveness of DNS-based Server Selection
    Anees Shaikh, Renu Tewari, Mukesh Agrawal, INFOCOM 2001
    Paper Presenter: Katherine Lam

    Instructor's notes:
    Naming, Location Services and Binding

  5. Distributed File Systems

    The Google File System
    Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, SOSP 2003
    Paper Presenter: Borys Bradel

    Petal: Distributed Virtual Disks
    Edward K. Lee, Chandramohan A. Thekkath, ASPLOS 1996
    Paper Presenter: Kurniadi Asrigo

  6. Routing

    Enabling Conferencing Applications on the Internet using an Overlay Multicast Architecture
    Yang-Hua Chu, Sanjay G. Rao, Srinivasan Seshan, Hui Zhang, SIGCOMM 2001
    Paper Presenter: Ali Tizghadam

    Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications
    Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, SIGCOMM 2001
    Paper Presenter: Martin Labrecque

  7. P2P Storage

    Protecting Free Expression Online with Freenet
    Ian Clarke, Theodore W. Hong, Scott G. Miller, Oskar Sandberg, and Brandon Wiley, IEEE Internet Computing 2002
    Paper Presenter: Kamran Farhadi

    Wide-Area Cooperative Storage With CFS
    Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, SOSP 2001
    Paper Presenter: Catalin Drula

  8. P2P Search and Applications

    Querying the Internet with PIER
    Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, Ion Stoica, VLDB 2003
    Paper Presenter: Henry Luk

    Distributed Query Processing and Catalogs for Peer-to-Peer Systems
    Vassilis Papadimos, David Maier, Kristin Tufte, CIDR 2003
    Paper Presenter: Taimur Javed

  9. Web Caching and Content Delivery Networks

    Internet Indirection Infrastructure
    Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, Sonesh Surana, SIGCOMM 2002
    Paper Presenter: Kai Yi Kenneth Po

    FastReplica: Efficient Large File Distribution Within Content Delivery Networks
    Ludmila Cherkasova, Jangwon Lee, USITS 2003
    Paper Presenter: Gokul Soundararajan

  10. Cluster-based Computing and Scalable Internet Services

    Cluster-Based Scalable Network Services
    Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier, SOSP 1997
    Paper Presenter: Matt Medland

    Capriccio: Scalable Threads for Internet Services
    Rob von Behren, Jeremy Condit, Feng Zhou, George C. Necula, Eric Brewer, SOSP 2003
    Paper Presenter: Kirk Stewart

  11. Replication and Grid Computing

    Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System
    D. B. Terry, M. M. Theimer, Karin Petersen, A. J. Demers, M. J. Spreitzer, C. H. Hauser, SOSP 1995
    Paper Presenter: Stanley Fung

    The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration
    I. Foster, C. Kesselman, J. Nick, S. Tuecke, GGF 2002 (Global Grid Forum)
    Paper Presenter: HungJu Tze

    Instructor's notes:
    Recovery in databases with undo and redo logging
    Borrowed from computer science course at Duke University (CPS 216)

  12. Sensor Networks

    System Architecture Directions for Networked Sensors
    Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, Kristofer Pister, ASPLOS 2000
    Paper Presenter: Ramy Farha

    Wireless Sensor Networks for Habitat Monitoring
    Alan Mainwaring, Joseph Polastre, Robert Szewczyk, David Culler, and John Anderson, WSNA 2002 (Wireless Sensor Networks and Applications)
    Paper Presenter: Alex Cheung

  13. Games

    An Efficient Synchronization Mechanism for Mirrored Game Architectures
    Eric Cronin, Burton Filstrup, Anthony R. Kurc, and Sugih Jamin, NetGames 2002
    Paper Presenter: Daniel Lin

    The Effect of Latency on User Performance in Warcraft III
    Nathan Sheldon, Eric Girard, Seth Borg, Mark Claypool, Emmanuel Agu, NetGames 2003
    Paper Presenter: Peter Yiannacouras