Special Topics in Software Engineering: Dependable Systems
ECE 1724, Winter 2009
University of Toronto
Instructor: Ashvin Goel
Course Number: ECE 1724
Course Time: Monday, 1-3 pm
Course Room: SF2104 (note the room change)
Start Date: Jan 12, 2009
Course Description
Modern computer systems have become tightly intertwined with our daily
lives. However, they are failure-prone, insecure and difficult to manage
and thus hardly dependable. These problems have become even more severe
with increased networking and with easy availability of inexpensive,
powerful and embedded devices. While these dependability problems dominate
the cost of ownership of computer systems, they unfortunately have no simple
solutions. There is a realization that these problems cannot be decisively
solved but are ongoing facts of life that must be dealt with regularly. To
do so, systems should be designed to detect, isolate and recover from these
problems.
This advanced graduate-level course focuses on dependability in software
systems and examines current research that aims to address challenges caused
by software defects, intrusions and software
misconfiguration. Students are expected to read and critique recent
research papers in operating systems and security that cover these areas. They
are also expected to work on a research project and make class
presentations. While there are no specific prerequisites for this course,
students who have taken undergraduate or graduate courses in operating systems,
security, networks and distributed systems will have an edge.
Textbooks
There are no required textbooks for this course. The optional
textbooks are
- Modern Operating Systems (Third Edition), by Andrew
S. Tanenbaum. Published by Prentice Hall, 2008.
- Distributed Systems: Concepts and Design (Third Edition), by
George Coulouris, Jean Dollimore and Tim Kindberg. Published by Addison
Wesley, 2001.
Mailing List
Please subscribe to the class mailing list by joining
this group. You
will need a Yahoo account, although Yahoo will forward the group messages to
any email address of your choice. The instructor will use this group to send
instructions and reminders. All students who subscribe to the group can send
email to the group by sending mail to
this list. The
group is not moderated. If you have a specific question for the instructor,
please send an email to the instructor directly. For the first week of
classes, you can join the group directly. After that the Yahoo groups website
will require the instructor's approval to subscribe you.
Grading Policy
Grades will be based on class presentations, a class project, and class
participation. There will be no final exam in this course. The
grading breakdown is as follows:
- Class presentation: 30%
- Class project: 50%
- Class participation: 20%
Note: If a student is unable to attend a class, he or
she will lose 2% for non-participation.
Class Presentation
Each week this class will cover a group of papers that focuses on a
specific aspect of the course. Students are expected to read all the
papers in the group that will be presented. At the beginning of the term,
each paper will be assigned to a student who will be presenting the
paper. Presentations will be limited to 20 minutes.
More details about the presentation
format. Please read very carefully.
Assignments
There will be no assignments in this course.
Class Project
A major component of this course is devoted to a term-long project. The
topic of the project is largely up to you, but to help you choose a
project, a sample list of projects is provided below. This list should
help students determine whether their own projects are of reasonable size
and scope.
More details about the project
format. Please read very carefully.
Project Ideas
Here is a list of project ideas.
Readings
This is a tentative list. These papers can be accessed from either
the ACM or
the Usenix web site. If you cannot access
ACM articles directly, please read the
following instructions for accessing the
papers via the U of T online library.
Week 1: Introduction (Jan 12)
- Why Do Computers Stop and What Can Be Done About It? SRDS 1986.
- Broad New OS Research: Challenges and Opportunities. HOTOS 2005.
- Introduction to Dependable Software Systems by Instructor.
- Efficient Readings of Papers in Science and Technology.
- How (and How Not) to Write a Good Systems Paper. Operating Systems Review 1983.
Week 2: Bug Detection and Diagnosis (Jan 19)
- Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems
Code. SOSP 2001. Bilal
- Triage: Diagnosing Production Run Failures at the User's Site. SOSP 2007. Henry
Optional reading:
- Using Model Checking to Find Serious File System Errors. OSDI 2004.
Week 3: Race and Deadlock Detection (Jan 26)
- RacerX: Effective, Static Detection of Race Conditions and Deadlocks. SOSP 2003. Volodymyr
- Finding and Reproducing Heisenbugs in Concurrent Programs. OSDI 2008. Bilal
Optional reading:
- RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking. SOSP 2005.
- Deadlock Immunity: Enabling Systems to Defend Against Deadlocks. OSDI 2008.
Week 4: Software Fault Isolation (Feb 2)
- Efficient Software-Based Fault Isolation. SOSP 1993. Tony
- Hive: Fault Containment for Shared-Memory Multiprocessors. SOSP 1995. Mathew
Optional reading:
- Hypervisor-based Fault-tolerance. SOSP 1995.
- Dealing With Disaster: Surviving Misbehaved Kernel Extensions. OSDI 1996.
Week 5: Software Fault Isolation (Feb 9)
- Unmodified Device Driver Reuse and Improved System Dependability via Virtual
Machines. OSDI 2004. Mathew
- CuriOS: Improving Reliability through Operating System Structure. OSDI 2008. Maxim
Optional reading:
- Improving the Reliability of Commodity Operating Systems. SOSP 2003.
Week 6: Reading Week (Feb 16)
Week 7: Failure Recovery (Feb 23)
- Rx: Treating Bugs As Allergies - A Safe Method to Survive Software
Failures. SOSP 2005. Maxim
- Remus: High Availability via Asynchronous Virtual Machine Replication. NSDI 2008. Jon
Optional reading:
- Recovering Device Drivers. OSDI 2004.
- Enhancing Server Availability and Security Through Failure-Oblivious
Computing. OSDI 2004.
- SafeDrive: Safe and Recoverable Extensions Using Language-Based Techniques. OSDI 2006.
Week 8: Application-Specific Failure Recovery (Mar 2)
- Undo for Operators: Building an Undoable E-mail Store. Usenix 2003. Don
- Microreboot - A Technique for Cheap Recovery. OSDI 2004. Zoe
Week 9: Testing and Development (Mar 9)
- KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex
Systems Programs. OSDI 2008. Volodymyr
- R2: An Application-Level Kernel for Record and Replay. OSDI 2008. Zhengjun
Week 10: Secure Execution (Mar 16)
- Model-Carrying Code: A Practical Approach for Safe Execution of Untrusted
Applications. SOSP 2003. Zhengjun
- Usable Mandatory Integrity Protection for Operating Systems. Security and
Privacy (Oakland) 2007. Zoe
Week 11: Browser-Based Safe Execution (Mar 23)
- Protection and Communication Abstractions for Web Browsers in MashupOS. SOSP 2007. Tony
- Leveraging Legacy Code to Deploy Desktop Applications on the Web. OSDI 2008. Don
Week 12: Performance Misconfiguration (Mar 30)
- Correlating Instrumentation Data to System States: A Building Block for
Automated Diagnosis and Control. OSDI 2004. Jon
- AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of
Web 2.0 Applications. SOSP 2007. Tony Zhao
Optional reading:
- Performance Debugging for Distributed Systems of Black Boxes. SOSP 2003.
- Capturing, Indexing, Clustering, and Retrieving System History. SOSP 2005.
Week 13: System Misconfiguration (Apr 6)
- Understanding and Dealing with Operator Mistakes in Internet Services. OSDI 2004. Henry
- Staged Deployment in Mirage, an Integrated Software Upgrade Testing and
Distribution System. SOSP 2007. Tony Zhao
Optional reading:
- Configuration Debugging as Search: Finding the Needle in the Haystack. OSDI 2004.
- AutoBash: Improving Configuration Management with Operating System Causality
Analysis. SOSP 2007.
Week 14: Project Presentations (Apr 13)