Special Topics in Software Engineering: Dependable Systems
ECE 1724, Fall 2007
University of Toronto
Instructor: Ashvin Goel
Course Number: ECE 1724
Course Time: Thursday, 4-6 pm
Course Room: GB 221
Start Date: Sep 20, 2007
Modern computer systems have become tightly intertwined with our daily
lives. However, they are failure-prone, insecure and difficult to manage,
and thus hardly dependable. These problems have become even more severe
with increased networking and with the easy availability of inexpensive,
powerful and embedded devices. While these dependability problems dominate
the cost of ownership of computer systems, unfortunately they have no simple
solutions. There is a realization that these problems cannot be decisively
solved but are ongoing facts of life that must be dealt with regularly. To
do so, systems should be designed to detect, isolate and recover from these
problems.
This advanced graduate-level course focuses on dependability in software
systems and examines current research that aims to address challenges
caused by software defects, intrusions and software
misconfiguration. Students are expected to read and critique recent
research papers in operating systems and networking that cover these
areas. They are also expected to work on a research project and make class
presentations. While there are no specific prerequisites for this course,
students who have taken undergraduate or graduate courses in operating
systems, networks and distributed systems will have an edge.
There are no required textbooks for this course. The optional textbooks are:
- Modern Operating Systems (Second Edition), by Andrew
S. Tanenbaum. Published by Prentice Hall, 2001.
- Computer Networking: A Top-Down Approach Featuring the
Internet (Third Edition), by James Kurose and Keith
W. Ross. Published by Addison Wesley, 2002.
- Distributed Systems: Concepts and Design (Third Edition), by
George Coulouris, Jean Dollimore and Tim Kindberg. Published by Addison
Wesley.
Please subscribe to the class mailing list by joining the class Yahoo group.
You will need a Yahoo account, although Yahoo will forward the group
messages to any email address of your choice. The instructor will use this
group to send instructions and reminders. All students who subscribe to the
group can send email to the group by sending mail to this list. The group
is not moderated. If you have a specific question for the instructor,
please send an email to the instructor directly. For the first week of
classes, you can join the group directly. After that, the Yahoo Groups
website will require the instructor's approval to subscribe you.
Grades will be based on class presentations, a class project, and class
participation. There will be no final exam in this course. The
grading breakdown is as follows:
- Class presentation: 30%
- Class project: 50%
- Class participation: 20%
Note: If a student is unable to attend a class, he or
she will lose 2% for non-participation.
Each week this class will cover a group of papers that focuses on a
specific aspect of the course. Students are expected to read all the
papers in the group that will be presented. At the beginning of the term,
each paper will be assigned to a student who will be presenting the
paper. Presentations will be limited to 20 minutes.
More details about the presentation format. Please read very carefully.
There will be no assignments in this course.
A major component of this course is devoted to a term-long project. The
topic of the project is largely up to you, but to help you choose a
project, a sample list of projects is provided below. This list should
help students determine whether their own projects are of reasonable size.
More details about the project format. Please read very carefully.
Here is a list of project ideas.
This is a tentative list. If you cannot access ACM articles directly,
please read the following instructions
for accessing the papers via the UoT online Library. If a link to a paper
is missing, please use a search engine to find the paper.
Week 1: Introduction (Sept 20)
Why Do Computers Stop and What Can Be Done About It? SRDS 1986.
Broad New OS Research: Challenges and Opportunities. HOTOS 2005.
- Introduction to Dependable Software Systems by Instructor.
- Efficient Readings of Papers in Science and Technology.
- How (and How Not) to Write a Good Systems Paper. Operating Systems Review 1983.
Week 2: Software Fault Isolation (Sept 27)
Software-Based Fault Isolation. SOSP 1993. Jacky
Dealing With Disaster: Surviving Misbehaved Kernel Extensions. OSDI 1996. Vivek
Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines. OSDI 2004.
Improving the Reliability of Commodity Operating Systems. SOSP 2003.
Hive: Fault Containment for Shared-Memory Multiprocessors. SOSP 1995.
Hypervisor-based Fault-tolerance. SOSP 1995.
Week 3: Failure Recovery (Oct 4)
Exploring Failure Transparency and the Limits of Generic Recovery. OSDI 2000. Mervin
Undo for Operators: Building an Undoable E-mail Store. USENIX 2003. Pranit
Recovering Device Drivers. OSDI 2004. Stan
Enhancing Server Availability and Security Through Failure-Oblivious Computing. OSDI 2004.
Week 4: Failure Recovery (Oct 11)
Rx: Treating Bugs As Allergies---A Safe Method to Survive Software Failures. SOSP 2005.
SafeDrive: Safe and Recoverable Extensions Using Language-Based Techniques. OSDI 2006. James
The Taser Intrusion Recovery System. SOSP 2005.
Microreboot - A Technique for Cheap Recovery. OSDI 2004.
Week 5: Host-based Intrusion Analysis (Oct 18)
Backtracking Intrusions. SOSP 2003. Jackie
Flight Data Recorder: Always-on Tracing and Scalable Analysis of Persistent State Interactions to Improve Systems and Security Management. OSDI 2006. Ekin
Capturing System-wide Information Flow for Malware Detection and Analysis. CCS 2007. Bogdan
ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. OSDI 2002.
Week 6: Web-based Intrusion Detection (Oct 25)
BrowserShield: Vulnerability-Driven Filtering of Dynamic HTML. OSDI 2006. Eric
Execution-based Detection of Malicious Web Content. Usenix Security
A Crawler-based Study of Spyware on the Web. NDSS 2006.
Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities. NDSS 2006.
Week 7: Safe Execution Environments (Nov 1)
Protection and Communication Abstractions for Web Browsers in MashupOS. SOSP 2007. Adrian
Usable Mandatory Integrity Protection for Operating Systems. Security and Privacy (Oakland) 2007. Weihan
One-Way Isolation: An Effective Approach for Realizing Safe Execution Environments. NDSS 2005.
Week 8: Safe Execution with Control Flow Monitoring (Nov 8)
Secure Execution via Program Shepherding. USENIX Security 2002. Daniel
Secure Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems. SOSP 2007. Stan
of Persistent Kernel Control-Flow Attacks. CCS 2007. Vivek
XFI: Software Guards for System Address Spaces. OSDI 2006.
Week 9: Safe Execution with Information Flow Monitoring (Nov 15)
Taint-Enhanced Policy Enforcement: A Practical Approach to Defeat a Wide Range of Attacks. Usenix Security 2006. Mervin
Flow Control For Standard OS Abstractions. SOSP 2007. Ekin
Making Information Flow Explicit in HiStar. OSDI 2006.
Week 10: Intrusion Response (Nov 22)
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities. CCS 2005. Daniel
Bouncer: Securing Software by Blocking Bad Input. SOSP 2007. Bogdan
Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. NDSS 2005.
Fast and Automated Generation of Attack Signatures: A Basis for Building Self-Protecting Servers. CCS 2005.
Vigilante: End-to-End Containment of Internet Worms. SOSP 2005.
Week 11: System Misconfiguration (Nov 29)
Misconfiguration Troubleshooting with PeerPressure. OSDI 2004. Pranit
Configuration Management with Operating System Causality Analysis. SOSP 2007. Bin
Configuration Debugging as Search: Finding the Needle in the Haystack. OSDI 2004.
Understanding and Dealing with Operator Mistakes in Internet Services. OSDI 2004.
Week 12: Performance Misconfiguration (Dec 6)
Performance Debugging for Distributed Systems of Black Boxes. SOSP 2003. Bin
Capturing, Indexing, Clustering, and Retrieving System History. SOSP 2005. Eric
Highly Reliable Enterprise Network Services via Inference of Multi-level Dependencies. SIGCOMM 2007. Weihan
Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control. OSDI 2004.
Using Magpie for Request Extraction and Workload Modelling. OSDI 2004.