Advances in Distributed Systems
ECE 1746, Fall 2004
University of Toronto


Instructor: Ashvin Goel
Course Time: Tuesday, 1-3 pm
Course Room: Galbraith Building (GB) 120
Start Date: Sept 14, 2004


Course Description

The exponential growth of Internet services demonstrates the importance and potential of large-scale distributed systems. Today, Web services allow online shopping for virtually any product, from cheap second-hand items to expensive art collections. Content delivery networks can potentially speed up these services by cleverly caching Web pages. Peer-to-peer applications allow sharing of content in ways that are making industry nervous about its profit margins. Multimedia services provide streaming delivery of audio and video. The list of new classes of distributed applications that are becoming ubiquitous seems endless: cluster computing, grid computing, game services, pervasive computing, etc. In this scenario, a fundamental challenge is to provide scalable, secure and robust services in the presence of best-effort communication and unreliable nodes.

This graduate-level course focuses on distributed computing from a systems software perspective. Students are expected to read and critique recent research papers that cover some of the distributed applications mentioned above and span areas such as operating systems and networks. They are also expected to work on a research project and make a presentation.

While there are no specific prerequisites for this course, students who have taken undergraduate courses in operating systems, networks and distributed systems will have an edge.

Textbooks

There are no required textbooks for this course. The optional textbook is Distributed Systems: Concepts and Design (Third Edition), by George Coulouris, Jean Dollimore and Tim Kindberg, published by Addison-Wesley, 2001. ISBN 0-201-61918-0.

Mailing List

Please subscribe to the class mailing list by joining this group. You will need a Yahoo account, although Yahoo will forward the group messages to any email address of your choice. The instructor will use this group to send out assignments and reminders. All students who subscribe to the group can send email to the group. The group is not moderated. If you have a specific question for the instructor, please email the instructor directly. For the first week of classes, you can join the group directly. After that, you will need approval from the instructor.

Grading Policy

Grades will be based on the class presentation and the questions prepared for the discussion; the class project and its presentation; assignments; and class participation and discussion. There will be no final exam in this course. The grading breakdown is as follows:

Note: A student who is unable to attend a class will lose 2% for non-participation. No exceptions.

Class Presentation

Each week this class will cover a group of papers that focuses on a specific aspect of distributed systems. Students are expected to read all the papers in the group that will be presented (the number of presentations depends on the number of students in class). At the beginning of the term, each paper will be assigned to a student who will be presenting the paper. Presentations will be limited to 15 minutes.

More details about the presentation format. Please read very carefully.

Class Project

A major component of this course is devoted to a term-long project. The topic of the final project is largely up to you, but to help you choose a project, a sample list of projects is provided below. This list should help students determine whether their own projects are of reasonable size and scope.

More details about the project format. Please read very carefully.

Project Ideas

Here is a list of project ideas.


Assignments

The instructor will assign short assignments at the end of some classes. These assignments, which will consist of one or two questions that have to be answered, will typically be a follow up to the discussion in the class and will help students get a better grasp of the material.

Assignments will be sent to students by email as well as posted on this web site. They are due at the next class. Students are expected to submit a hard copy of the assignment. Please use typed text. Two to four assignments will be given during the term.

Assignment 1        Example Review for Paper 1        Example Review for Paper 2
Assignment 2        Answer for Assignment 2


Readings

This is a tentative list. If a link to a paper is missing, please use a search engine to find the paper.

Week 1: Introduction (Sept 14)

  1. Introduction to Distributed Systems. Instructor.
  2. Efficient Reading of Papers in Science and Technology. Michael J. Hanson, Dylan J. McNamee.
  3. How (and How Not) to Write a Good Systems Paper. Roy Levin, David D. Redell. Operating Systems Review 17(3), July 1983.

Week 2: Fault Tolerance (Sept 21)

  1. Path-Based Failure and Evolution Management. Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim Lloyd, Dave Patterson, Armando Fox, and Eric Brewer. NSDI 2004. Student Presenter: Tomasz Czajkowski.
  2. FUSE: Lightweight Guaranteed Distributed Failure Notification. John Dunagan, Nicholas J. A. Harvey, Michael B. Jones, Dejan Kostic, Marvin Theimer, and Alec Wolman. OSDI 2004. Student Presenter: Thomas Liu.
Optional papers:
  1. Using Fault Injection and Modeling to Evaluate the Performability of Cluster-Based Services. Kiran Nagaraja, Xiaoyan Li, Ricardo Bianchini, Richard P. Martin, and Thu D. Nguyen. USITS 2003.
  2. A Microrebootable System-Design, Implementation, and Evaluation. George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox. OSDI 2004.
  3. Why Do Internet Services Fail, and What Can Be Done About It? David Oppenheimer, Archana Ganapathi, and David A. Patterson. USITS 2003. Slides.

Week 3: Naming (Sept 28)

  1. Network-Sensitive Service Discovery. An-Cheng Huang and Peter Steenkiste. NSDI 2004. Student Presenter: Mehrdad Ariannejad.
  2. The Design and Implementation of a Next Generation Name Service for the Internet. Venugopalan Ramasubramanian, Emin Gun Sirer. SIGCOMM 2004. Student Presenter: Frank Plavec.
Optional papers:
  1. A Layered Naming Architecture for the Internet. Hari Balakrishnan, Karthik Lakshminarayanan, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, Michael Walfish. SIGCOMM 2004.
  2. Untangling the Web from DNS. Michael Walfish, Hari Balakrishnan, and Scott Shenker. NSDI 2004.

Week 4: File and Storage Systems (Oct 5)

  1. Google File System. Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. SOSP 2003. Student Presenter: Rita Chiu.
  2. Secure Untrusted Data Repository. Jinyuan Li, Maxwell Krohn, David Mazières, and Dennis Shasha. OSDI 2004. Student Presenter: Zheng Li.
Optional papers:
  1. Explicit Control in the Batch-Aware Distributed File System. John Bent, Douglas Thain, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Miron Livny. NSDI 2004.

Week 5: Resource Management (Oct 12)

  1. Resource Overbooking and Application Profiling in Shared Hosting Platforms. Bhuvan Urgaonkar, Prashant Shenoy, and Timothy Roscoe. OSDI 2002. Student Presenter: Antonio Wang.
  2. Adaptive Overload Control for Busy Internet Servers. Matt Welsh and David Culler. USITS 2003. Student Presenter: Chuan Wu.
Optional papers:
  1. Integrated Resource Management for Cluster-based Internet Services. Kai Shen, Hong Tang, Tao Yang, Lingkun Chu. OSDI 2002.
  2. SHARP: An Architecture for Secure Resource Peering. Yun Fu, Jeffrey Chase, Brent Chun, Stephen Schwab, and Amin Vahdat. SOSP 2003.

Week 6: Replication (Oct 19) 

  1. FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger P. Wattenhofer. OSDI 2002. Student Presenter: Jing Su.
  2. Consistent and Automatic Replica Regeneration. Haifeng Yu, Amin Vahdat. NSDI 2004. Student Presenter: Kevin Yuen.
  3. The Dangers of Replication and a Solution. J. Gray, P. Helland, P. O'Neill, and D. Shasha. SIGMOD 1996. Student Presenter: Kaloian Manassiev.

Week 7: Recovery (Oct 26)

  1. TimeLine: A High Performance Archive for a Distributed Object Store. Chuang-Hue Moh and Barbara Liskov. NSDI 2004. Student Presenter: Vinod Muthusamy.
  2. Undo for Operators: Building an Undoable E-mail Store. Aaron Brown and David Patterson. USENIX 2003. Student Presenter: Mark Jackman.
Optional papers:
  1. Self-Repairing Computers. Armando Fox and David Patterson. Scientific American 2004.

Week 8: Automated Management (Nov 2)

  1. Total Recall: System Support for Automated Availability Management. Ranjita Bhagwan, Kiran Tati, Yu-Chung Cheng, Stefan Savage, and Geoffrey M. Voelker. NSDI 2004. Student Presenter: Thomas Liu.
  2. Automatic Misconfiguration Troubleshooting with PeerPressure. Helen J. Wang, John Platt, Yu Chen, Ruyun Zhang, and Yi-Min Wang. OSDI 2004. Student Presenter: Mahsa Moallem.
Optional papers:
  1. Understanding and Dealing with Operator Mistakes in Internet Services. Kiran Nagaraja, Fabio Oliveira, Ricardo Bianchini, Richard P. Martin, and Thu D. Nguyen. OSDI 2004.
  2. Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control. Ira Cohen, Jeff Chase, Moises Goldszmidt, Terence Kelly, and Julie Symons. OSDI 2004.
  3. Using Magpie for request extraction and workload modelling. Paul Barham, Austin Donnelly, Rebecca Isaacs, Richard Mortier. OSDI 2004.
  4. STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support. Yi-Min Wang, Chad Verbowski, John Dunagan, Yu Chen, Helen J. Wang, Chun Yuan, and Zheng Zhang. LISA 2003.

Week 9: Network Performance (Nov 9)

  1. Vivaldi: A Decentralized Network Coordinate System. Frank Dabek, Russ Cox, Frans Kaashoek, Robert Morris. SIGCOMM 2004. Student Presenter: Mehrdad Ariannejad.
  2. Locating Internet Bottlenecks: Algorithms, Measurements and Implications. Ningning Hu, Li Erran Li, Zhuoqing Morley Mao, Peter Steenkiste, Jia Wang. SIGCOMM 2004. Student Presenter: Dapeng Gao.
Optional papers:
  1. The Effectiveness of Request Redirection on CDN Robustness. Limin Wang, Vivek Pai, and Larry Peterson. OSDI 2002.

Week 10: Peer-to-Peer Networks (Nov 16)

  1. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. SIGCOMM 2001. Student Presenter: Gregory Hartl.
  2. Making Gnutella-like P2P Systems Scalable. Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, Scott Shenker. SIGCOMM 2003. Student Presenter: Trevor Armstrong.
Optional papers:
  1. Handling Churn in a DHT. Sean Rhea, Dennis Geels, Timothy Roscoe, John Kubiatowicz. USENIX 2004.
  2. Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks. Dongyu Qiu, R. Srikant. SIGCOMM 2004.

Week 11: Fairness (Nov 23)

  1. Sprite: A Simple, Cheat-Proof, Credit-Based System for Mobile Ad-Hoc Networks. Sheng Zhong, Jiang Chen, Yang Richard Yang. Infocom 2003. Student Presenter: Alex Varshavsky.
  2. Performance Analysis of the CONFIDANT Protocol (Cooperation Of Nodes: Fairness In Dynamic Ad-hoc NeTworks). Sonja Buchegger, Jean-Yves Le Boudec. MobiHoc 2002. Student Presenter: Alex Varshavsky.

Week 12: Multicast (Nov 30)

  1. The Feasibility of Supporting Large-Scale Live Streaming Applications with Dynamic Application End-Points. Kunwadee Sripanidkulchai, Aditya Ganjam, Bruce Maggs, Hui Zhang. SIGCOMM 2004. Student Presenter: Nazar Abbaz.
  2. SplitStream: High-Bandwidth Multicast in Cooperative Environments. Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, Animesh Nandi, Antony Rowstron, Atul Singh. SOSP 2003. Student Presenter: Mea Wang.
Optional papers:
  1. Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh. Dejan Kostic, Adolfo Rodriguez, Jeannie Albrecht, Amin Vahdat. SOSP 2003.

Week 13: Project Presentation (Dec 9)