Special Topics in Software Engineering: Dependable Systems

ECE 1724, Winter 2009
University of Toronto

Instructor: Ashvin Goel
Course Number: ECE 1724
Course Time: Monday, 1-3 pm
Course Room: SF2104 (note the room change)
Start Date: Jan 12, 2009

Project Suggestions

Some suggested projects are described below. The instructor can provide more details about each project. Please get confirmation from the instructor before starting a project.

Diagnosing Bugs in Web Applications

Web applications are becoming increasingly popular. A bug in such an application can lose or corrupt data for every user, and the application may stop working once its data is corrupted. This project involves diagnosing the root causes of bugs that cause data corruption in web applications. The instructor will provide code that can be used to replay web application requests. Using this replay mechanism, your goal is to help an administrator determine the cause of the corruption, in a manner similar to the Triage paper.
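
One simple way to use a replay mechanism for diagnosis is to bisect the request log for the first request that leaves the data corrupt. The sketch below is only an illustration and is not the instructor's replay code or the Triage method. It assumes that replay is deterministic, that the state before any request is clean, and that corruption persists once introduced. The names `first_corrupting_request`, `replay`, `is_corrupt`, and the toy balance application are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple, TypeVar

Request = TypeVar("Request")
State = TypeVar("State")


def first_corrupting_request(
    requests: Sequence[Request],
    replay: Callable[[Sequence[Request]], State],
    is_corrupt: Callable[[State], bool],
) -> Optional[int]:
    """Return the index of the first request after which the state is corrupt.

    Returns None if replaying the whole log leaves the state clean.
    Assumes deterministic replay and that corruption, once present, persists.
    """
    if is_corrupt(replay(requests[:0])):
        raise ValueError("state is already corrupt before any request")
    if not is_corrupt(replay(requests)):
        return None
    # Invariant: the prefix of length lo is clean, the prefix of length hi is corrupt.
    lo, hi = 0, len(requests)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_corrupt(replay(requests[:mid])):
            hi = mid
        else:
            lo = mid
    return hi - 1


# Toy stand-in for a web application: requests adjust per-user balances,
# and a negative balance counts as corruption.
def replay_balances(requests: Sequence[Tuple[str, int]]) -> Dict[str, int]:
    balances: Dict[str, int] = {}
    for user, delta in requests:
        balances[user] = balances.get(user, 0) + delta
    return balances


def has_negative_balance(balances: Dict[str, int]) -> bool:
    return any(value < 0 for value in balances.values())


log = [("a", 5), ("b", 3), ("a", -2), ("b", -4), ("a", 1)]
culprit = first_corrupting_request(log, replay_balances, has_negative_balance)
```

Bisection needs only about log2(n) replays for a log of n requests, but it identifies the request at which corruption first becomes visible, which may not be the request that contains the root cause.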
Browser-Based Safe Execution

Browsers have become an important component of our computing environment. However, bugs in browsers are increasingly being used to compromise systems. A common attack against browsers involves privilege escalation, in which remote code escapes the browser sandbox and runs with the full privileges of the browser. The instructor's research group has designed a taint-based method for detecting some of these attacks in an older version of the Firefox browser. The newest version of Firefox (version 3) provides various new security features. This project involves porting the tainting method to Firefox 3 and conducting a study to determine how well the method works on that version.
Recovery via Restarting Applications

The "Microreboot" paper described a method by which parts of an application are rebooted to allow recovery of the application. This approach gets rid of faulty state in the application. In this project, you will choose either a content download application (e.g., BitTorrent) or an instant messaging application (e.g., gaim) and implement a reboot-based recovery method for this application. You need to make sure that the persistent data in the application (e.g., the music repository or the instant messages received) is not lost. How fine is your reboot granularity? Can you tune it? How often is reboot possible? What types of faults or bugs can the reboot handle? How does the reboot affect user perception?
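
The separation between restartable components and persistent data can be sketched as follows. This is a minimal illustration, not the design from the "Microreboot" paper: the classes `MessageStore`, `ChatSession`, and `Supervisor` are hypothetical, the store is an in-memory stand-in for data that would really live on disk, and the empty-message check is a stand-in for a real bug.

```python
from typing import List


class MessageStore:
    """Persistent data that must survive component reboots (in-memory stand-in)."""

    def __init__(self) -> None:
        self.messages: List[str] = []


class ChatSession:
    """Restartable component; everything except the store is volatile state."""

    def __init__(self, store: MessageStore) -> None:
        self.store = store
        self.unread = 0  # volatile: lost when the component is rebooted

    def receive(self, text: str) -> None:
        if text == "":
            # Stand-in for a bug that leaves the component in a faulty state.
            raise ValueError("malformed message")
        self.store.messages.append(text)
        self.unread += 1


class Supervisor:
    """Reboots only the failed component, leaving the store untouched."""

    def __init__(self, store: MessageStore) -> None:
        self.store = store
        self.session = ChatSession(store)
        self.reboots = 0

    def deliver(self, text: str) -> bool:
        try:
            self.session.receive(text)
            return True
        except Exception:
            # Discard the component and its volatile state, then recreate it.
            self.session = ChatSession(self.store)
            self.reboots += 1
            return False
```

Here the reboot granularity is a single session object; a real application would have to decide which components can be recreated independently and what the user sees while that happens.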
Application-Level Undo and Recovery

The "Undo for Operators" paper implemented an undoable email service. In general, their application-level undo and recovery service requires applications whose operations have well-defined semantics and can be serialized. Another example that satisfies these criteria is a calendar service. Can you think of other such applications? Choose an application and implement an undoable service for that application. Describe the properties of this undoable application. How does application-specific recovery improve on generic recovery as described in the "Exploring Failure Transparency" paper?
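
To make the idea of serializable operations with well-defined semantics concrete, here is a minimal sketch of an undoable calendar. It uses a plain linear undo log, which is a much simpler technique than the one in the "Undo for Operators" paper, and the class `UndoableCalendar` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


class UndoableCalendar:
    """Calendar whose operations record enough information to be reversed."""

    def __init__(self) -> None:
        self.events: Dict[str, str] = {}  # event id -> description
        # Each log entry stores the event id and its value before the operation
        # (None means the event did not exist).
        self._log: List[Tuple[str, Optional[str]]] = []

    def set_event(self, event_id: str, description: str) -> None:
        self._log.append((event_id, self.events.get(event_id)))
        self.events[event_id] = description

    def delete_event(self, event_id: str) -> None:
        self._log.append((event_id, self.events.get(event_id)))
        self.events.pop(event_id, None)

    def undo(self) -> bool:
        """Reverse the most recent operation; return False if nothing is left."""
        if not self._log:
            return False
        event_id, previous = self._log.pop()
        if previous is None:
            self.events.pop(event_id, None)
        else:
            self.events[event_id] = previous
        return True
```

Because every operation touches one event and records its previous value, each undo step is well defined. Operations with external effects, such as sending an invitation email, would need application-specific compensation that this sketch does not attempt.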
Detecting File System Bugs

Traditionally, file systems have not handled hardware failures well. This was not a serious problem as long as these failures were rare (e.g., occurring only when disks were dying and soon to be replaced). However, enterprise storage management systems are increasingly using SAN technology, in which remote storage devices (such as disk arrays) appear to the operating system as locally attached storage. With SANs, network load can cause network timeouts, which appear to the operating system as temporary hardware failures. These temporary failures can lead to severe file system errors. The goal of this project is to improve the reliability of file systems in the face of hardware failures. One option is to take advantage of the fact that different file systems handle failures differently. As a result, a simple fault tolerance method would be to replicate all file system operations to two different file systems and detect errors by comparing the outputs of the operations.
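
The replicate-and-compare idea can be sketched at user level as follows. This is only an illustration under stated assumptions: `root_a` and `root_b` are assumed to be directories on two different file systems, and the helpers `replicated`, `write_file`, `read_file`, and the exception `ReplicaMismatch` are hypothetical names. A real implementation would more likely interpose on file system operations inside or below the operating system's file system layer.

```python
import os
from typing import Any, Callable, Tuple

Operation = Callable[[str], Any]  # takes a replica root directory, returns the output


class ReplicaMismatch(Exception):
    """Raised when the two replicas disagree on the outcome of an operation."""


def _run(op: Operation, root: str) -> Tuple[str, Any]:
    # Capture either the result or the error number so outcomes can be compared.
    try:
        return ("ok", op(root))
    except OSError as exc:
        return ("error", exc.errno)


def replicated(op: Operation, root_a: str, root_b: str) -> Any:
    """Apply op to both replicas and compare the outcomes."""
    result_a = _run(op, root_a)
    result_b = _run(op, root_b)
    if result_a != result_b:
        raise ReplicaMismatch(f"{root_a}: {result_a!r} != {root_b}: {result_b!r}")
    kind, value = result_a
    if kind == "error":
        # Both replicas failed the same way, so report the error to the caller.
        raise OSError(value, os.strerror(value))
    return value


def write_file(name: str, data: bytes) -> Operation:
    def op(root: str) -> int:
        with open(os.path.join(root, name), "wb") as handle:
            return handle.write(data)
    return op


def read_file(name: str) -> Operation:
    def op(root: str) -> bytes:
        with open(os.path.join(root, name), "rb") as handle:
            return handle.read()
    return op
```

Comparison detects a disagreement but cannot say which replica is wrong; deciding that would need a third replica or additional checks such as checksums.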
