Special Topics in Software Engineering: Dependable Software

ECE 1724, Winter 2006
University of Toronto



Project Suggestions

  1. Recovery via Restarting Applications

    The "Microreboot" paper describes a method in which parts of an application are rebooted so that the application as a whole can recover. This approach discards faulty state in the application. In this project, you will choose either a music-downloading or peer-to-peer application (e.g., BitTorrent) or an instant messaging application (e.g., gaim) and implement a recovery-via-"reboot" method for it. You need to make sure that the application's persistent data (e.g., the music repository or the instant messages received) is not lost. How fine is your reboot granularity? Can you tune it? How often is reboot possible? What types of faults or bugs can the reboot handle? How does the reboot affect user perception?
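    As a starting point, the idea can be sketched in a few lines: keep persistent data in a store that lives outside the component being rebooted, and let a supervisor replace the component when it fails. This is only an illustrative sketch, not the mechanism of the "Microreboot" paper; the names `MessageStore`, `Receiver`, `Supervisor`, and the "poison" message that simulates a bug are all hypothetical.

```python
import json
import os
import tempfile


class MessageStore:
    """Persistent state, kept outside the restartable component."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)

    def append(self, message):
        messages = self.load()
        messages.append(message)
        # Write to a temporary file, then rename, so that a crash in the
        # middle of a write cannot leave a half-written store behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(messages, f)
        os.replace(tmp, self.path)


class Receiver:
    """Restartable component: holds only volatile state."""

    def __init__(self, store):
        self.store = store
        self.handled = 0  # volatile counter, lost on every reboot

    def handle(self, message):
        if message == "poison":  # hypothetical trigger for a simulated bug
            raise RuntimeError("simulated bug")
        self.store.append(message)
        self.handled += 1


class Supervisor:
    """Reboots the receiver when it fails; the store is never touched."""

    def __init__(self, store):
        self.store = store
        self.receiver = Receiver(store)
        self.reboots = 0

    def deliver(self, message):
        try:
            self.receiver.handle(message)
        except Exception:
            # Discard the faulty component and start a fresh one.
            self.receiver = Receiver(self.store)
            self.reboots += 1


def demo():
    path = os.path.join(tempfile.mkdtemp(), "messages.json")
    supervisor = Supervisor(MessageStore(path))
    for message in ["hi", "poison", "bye"]:
        supervisor.deliver(message)
    return supervisor.store.load(), supervisor.reboots
```

    In this sketch the reboot granularity is a single object. A real application would need to decide which components can be restarted independently and which state must survive a restart.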

  2. Application-Level Undo and Recovery

    The "Undo for Operators" paper implemented an undoable email service. In general, their application-level undo and recovery service requires applications whose operations have well-defined semantics and can be serialized. Another example that satisfies these criteria is a calendar service. Can you think of other such applications? Choose a calendar service or any other such application and implement an undoable service for it. Describe the properties of this undoable application. How does application-specific recovery improve on generic recovery as described in the "Exploring Failure Transparency" paper?
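    To make "well-defined, serializable operations" concrete, here is a minimal sketch of a calendar whose operations are logged together with their inverses. It is a simple inverse-operation undo, not the undo design of the "Undo for Operators" paper, and the `Calendar` class and its operation tuples are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Calendar:
    events: dict = field(default_factory=dict)  # event id -> (title, slot)
    log: list = field(default_factory=list)     # (operation, inverse) pairs

    def _apply(self, op):
        kind, event_id, value = op
        if kind == "set":
            self.events[event_id] = value
        elif kind == "delete":
            self.events.pop(event_id, None)

    def _do(self, op):
        _, event_id, _ = op
        # Compute the inverse from the state *before* applying the operation.
        if event_id in self.events:
            inverse = ("set", event_id, self.events[event_id])
        else:
            inverse = ("delete", event_id, None)
        self._apply(op)
        self.log.append((op, inverse))

    def add(self, event_id, title, slot):
        """Create an event, or reschedule it if the id already exists."""
        self._do(("set", event_id, (title, slot)))

    def remove(self, event_id):
        self._do(("delete", event_id, None))

    def undo(self, steps=1):
        """Roll back the most recent operations, newest first."""
        for _ in range(min(steps, len(self.log))):
            _, inverse = self.log.pop()
            self._apply(inverse)


cal = Calendar()
cal.add("e1", "Lecture", "Mon 10:00")
cal.add("e1", "Lecture", "Tue 10:00")  # reschedule
cal.remove("e1")
cal.undo()  # the event is back in its Tue 10:00 slot
```

    Because every operation is a plain tuple, the log can be written to disk and replayed, which is the property the project description asks for.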

  3. Analysis of Failures in Real Applications

    In this project you will study the bug reports of some open-source applications and determine the types of bugs that are reported for these applications. Your analysis should be as complete as possible. You can use the "Whither Generic Recovery" paper to guide your analysis. How does your analysis compare with the analysis in the paper? Next, similar to the "Rx" paper in the reading list, analyze these bugs in terms of the recovery methods you would use to survive these failures. Implement one of these recovery mechanisms. For your study, choose a diverse set of applications where some are known to be relatively stable and others are buggy.

  4. Misconfiguration Detection

    The "PeerPressure" paper automatically detected misconfiguration in the Windows registry by comparing the registry entries across multiple machines. This comparison was done using a simple heuristic that determined whether a registry entry was very similar or dissimilar across machines. In this project you will implement a different heuristic that uses an AI technique (e.g., clustering) to determine misconfiguration. Compare this approach with the original PeerPressure approach. You can use any registry-like application.
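    For comparison purposes, a baseline similarity heuristic might look like the sketch below. It is not the PeerPressure algorithm itself: it simply flags entries on a suspect machine that differ from the value most peers agree on, and ranks them by how strongly the peers agree. The function name `rank_suspects` and the dictionary-based "registry" are assumptions made for illustration.

```python
from collections import Counter


def rank_suspects(suspect, peers):
    """Return (key, score) pairs for suspicious entries, most suspicious first.

    suspect: dict of configuration key -> value on the misbehaving machine.
    peers:   list of such dicts from other machines.
    score:   fraction of peers that share the most common value, reported
             only for keys where the suspect differs from that value.
    """
    ranked = []
    for key, value in suspect.items():
        peer_values = [peer[key] for peer in peers if key in peer]
        if not peer_values:
            continue  # no peer has this key, so there is nothing to compare
        common_value, count = Counter(peer_values).most_common(1)[0]
        if value != common_value:
            ranked.append((key, count / len(peer_values)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


suspect = {"proxy": "on", "theme": "dark", "dns": "10.0.0.1"}
peers = [
    {"proxy": "off", "theme": "light", "dns": "10.0.0.1"},
    {"proxy": "off", "theme": "dark", "dns": "10.0.0.1"},
    {"proxy": "off", "theme": "blue", "dns": "10.0.0.1"},
    {"proxy": "off", "theme": "light", "dns": "10.0.0.1"},
]
print(rank_suspects(suspect, peers))  # [('proxy', 1.0), ('theme', 0.5)]
```

    Entries such as `theme`, which vary naturally across machines, receive a low score, while `proxy`, on which all peers agree, is ranked first. A clustering-based heuristic would replace the per-key frequency count with a grouping of whole machines or of value vectors.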

  5. Using Source-Code Control for One-Way Isolation

    One-way isolation is a method for realizing a safe execution environment. It separates the file-system operations of a process or a group of processes from the rest of the system and allows these operations to be either aborted or atomically committed to the file system at some point in the future. This approach allows testing a program in isolation. See the "One-Way Isolation" paper in the reading list. In this project, you will implement a safe execution environment using a source code control system (e.g., svn). How does your environment compare with the one-way isolation method, i.e., what are the pros and cons of your approach? How would you use your environment for testing system configuration? What other applications can you implement using your environment? If time permits, implement one such application.
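    The commit-or-abort behaviour can be prototyped before any source code control system is involved. The sketch below uses a plain directory copy as a stand-in for a working copy; with svn, the copy would correspond roughly to a checkout, `commit` to `svn commit`, and `abort` to discarding or reverting the working copy. The `Sandbox` class is hypothetical, its commit is not atomic, and, unlike true one-way isolation, the copy is a snapshot that does not see later changes made outside.

```python
import os
import shutil
import tempfile


class Sandbox:
    """Run file-system changes against a private copy of a directory."""

    def __init__(self, real_dir):
        self.real_dir = real_dir
        self.work_dir = os.path.join(tempfile.mkdtemp(), "work")
        shutil.copytree(real_dir, self.work_dir)

    def commit(self):
        # Replace the real tree with the working copy. This is NOT atomic:
        # a crash between the steps below leaves only the ".bak" directory.
        backup = self.real_dir + ".bak"
        os.rename(self.real_dir, backup)
        shutil.copytree(self.work_dir, self.real_dir)
        shutil.rmtree(backup)
        self.abort()  # the working copy is no longer needed

    def abort(self):
        # Throw away the working copy; the real tree is untouched.
        shutil.rmtree(os.path.dirname(self.work_dir), ignore_errors=True)


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    real = os.path.join(tempfile.mkdtemp(), "real")
    os.mkdir(real)
    config = os.path.join(real, "config.txt")
    with open(config, "w") as f:
        f.write("v1")

    # Trial 1: change the file in isolation, then abort.
    box = Sandbox(real)
    with open(os.path.join(box.work_dir, "config.txt"), "w") as f:
        f.write("v2")
    during = read(config)       # the real file is unchanged while isolated
    box.abort()
    after_abort = read(config)

    # Trial 2: make the same change, then commit.
    box = Sandbox(real)
    with open(os.path.join(box.work_dir, "config.txt"), "w") as f:
        f.write("v2")
    box.commit()
    after_commit = read(config)
    return during, after_abort, after_commit
```

    Calling `demo()` returns the contents of the real file while the change is isolated, after an abort, and after a commit.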

  6. Taint Analysis for Intrusion Detection

    Several intrusion response papers in the reading list (e.g., "Dynamic Taint Analysis", "Fast and Automated Generation", "Vigilante") use a taint analysis method for detecting intrusions. This method essentially determines whether data in a network packet, or data that depends on a network packet, is ever executed. The dependence is established at the machine instruction level. You have two choices: either obtain an available taint analysis tool and evaluate its effectiveness, or think of the easiest way to implement a rudimentary taint analysis tool, implement it, and evaluate its performance.
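    The propagation rule itself is small enough to show on a toy machine. The sketch below is not taken from any of the cited papers: it interprets a made-up four-instruction register language (`const`, `recv`, `add`, `jmp`, all hypothetical), marks anything loaded by `recv` as tainted, propagates taint through `add`, and raises an alert when a `jmp` target is tainted. A real tool would do the same bookkeeping for actual machine instructions.

```python
class TaintAlert(Exception):
    """Raised when control flow is about to depend on network data."""


def run(program):
    """Interpret the toy program and return the set of tainted registers."""
    values = {}
    tainted = set()
    for instr in program:
        op = instr[0]
        if op == "const":            # load a trusted constant
            _, dst, value = instr
            values[dst] = value
            tainted.discard(dst)
        elif op == "recv":           # load a value from the network
            _, dst, value = instr
            values[dst] = value
            tainted.add(dst)
        elif op == "add":            # dst = a + b; taint flows from a and b
            _, dst, a, b = instr
            source_tainted = a in tainted or b in tainted
            values[dst] = values[a] + values[b]
            if source_tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif op == "jmp":            # jump to the address held in a register
            _, target = instr
            if target in tainted:
                raise TaintAlert(f"jump target {target} depends on network data")
        else:
            raise ValueError(f"unknown instruction: {op}")
    return tainted


benign = [
    ("const", "r0", 0x1000),
    ("recv", "r1", 7),       # network data is used only as data
    ("add", "r2", "r1", "r1"),
    ("jmp", "r0"),           # the target is a trusted constant
]
attack = [
    ("const", "r0", 0x1000),
    ("recv", "r1", 0x40),    # an attacker-controlled offset
    ("add", "r0", "r0", "r1"),
    ("jmp", "r0"),           # the target now depends on the packet
]
```

    Running `run(benign)` completes and reports `r1` and `r2` as tainted, while `run(attack)` raises `TaintAlert` at the final jump.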