Special Topics in Software Engineering: Dependable Systems
ECE 1724, Winter 2009
University of Toronto
Instructor: Ashvin Goel
Course Number: ECE 1724
Course Time: Monday, 1-3 pm
Course Room: SF2104 (note the room change)
Start Date: Jan 12, 2009
Project Suggestions
Some suggested projects are described below. The instructor can provide
more details about each project. Please get confirmation from the
instructor before starting a project.
- Diagnosing Bugs in Web Applications
Web applications are becoming increasingly popular. Bugs in these
applications can cause data loss and corruption for all users of the
application, and the application may stop working after this corruption.
This project involves diagnosing the root causes of bugs that cause
data corruption in web applications. The instructor will provide code that can
be used to replay web application requests. Using this replay mechanism, your
goal is to help an administrator determine the cause of the corruption, similar
to the Triage paper.
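One simple way to use a replay mechanism is sketched below: replay the logged requests in order and run a consistency check after each one, reporting the first request that leaves the data corrupted. This is only an illustration. It is neither the instructor's replay code nor the Triage technique, and the names `first_corrupting_request`, `apply_order`, and `non_negative` are hypothetical, as is the toy inventory application with its missing stock check.

```python
def first_corrupting_request(initial_state, requests, apply_request, is_consistent):
    """Replay requests in order and return the index of the first request
    after which the consistency check fails, or None if it never fails."""
    state = dict(initial_state)  # replay on a shallow copy (enough for flat dicts)
    for index, request in enumerate(requests):
        apply_request(state, request)
        if not is_consistent(state):
            return index
    return None


def apply_order(state, request):
    """Hypothetical buggy handler: decrements stock without checking it."""
    item, quantity = request
    state[item] = state.get(item, 0) - quantity


def non_negative(state):
    """Hypothetical invariant: stock counts must never be negative."""
    return all(count >= 0 for count in state.values())


if __name__ == "__main__":
    log = [("book", 1), ("pen", 2), ("book", 5), ("pen", 1)]
    stock = {"book": 3, "pen": 3}
    print(first_corrupting_request(stock, log, apply_order, non_negative))  # prints 2
```

A real diagnosis tool would also need to explain why the identified request corrupts the data, which is where most of the project's work lies.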
- Browser-Based Safe Execution
Browsers have become an important component of our computing
environment. However, bugs in browsers are increasingly being used to compromise
systems. A common attack against browsers involves privilege escalation, in which
remote code escapes the browser sandbox and runs with the complete
privileges of the browser. The instructor's research group has designed a
taint-based method for detecting some of these attacks in an older version of
the Firefox browser. The newest version of Firefox (version 3)
provides various new security features. This project involves porting the
tainting method to Firefox 3 and conducting a study to determine how well the
method works for that version.
- Recovery via Restarting Applications
The "Microreboot" paper described a method by which parts of an application
are rebooted to allow recovery of the application. This approach gets rid
of faulty state in the application. In this project, you will choose either
a content download application (e.g., BitTorrent) or an
instant messaging application (e.g., gaim) and implement a
recovery via "reboot" method for this application. You need to make sure
that the persistent data (e.g., the music repository or the instant messages
received) in the application is not lost. How fine is your reboot
granularity? Can you tune it? How often is reboot possible? What types of
faults or bugs can the reboot handle? How does the reboot affect user
perception?
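The separation between volatile and persistent state that this project depends on can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism from the "Microreboot" paper: the classes `MessageStore` and `ChatSession` and the function `reboot` are hypothetical, and a JSON file stands in for the application's persistent data.

```python
import json
import os
import tempfile


class MessageStore:
    """Persistent state: lives on disk, so it survives component reboots."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)

    def append(self, message):
        messages = self.load() + [message]
        # Write a temporary file and rename it over the old one, so that
        # readers never observe a partially written file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(messages, f)
        os.replace(tmp, self.path)


class ChatSession:
    """Restartable component: holds only volatile state and a store handle."""

    def __init__(self, store):
        self.store = store
        self.unread = 0  # volatile counter; may become faulty

    def receive(self, message):
        self.store.append(message)
        self.unread += 1


def reboot(session):
    """Discard the component and its volatile state; keep persistent data."""
    return ChatSession(session.store)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        session = ChatSession(MessageStore(os.path.join(d, "messages.json")))
        session.receive("hello")
        session.unread = -5          # simulate faulty volatile state
        session = reboot(session)    # volatile state is reset
        print(session.unread, session.store.load())
```

Here the reboot granularity is a single session object. A finer or coarser granularity would change which state counts as volatile, which is one of the questions the project asks.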
- Application-Level Undo and Recovery
The "Undo for Operators" paper implemented an undoable email service. In
general, their application-level undo and recovery service requires
applications whose operations have well-defined semantics and can be
serialized. Another example that satisfies these criteria is a calendar
service. Can you think of other such applications? Choose an application and
implement an undoable service for that application. Describe the properties of
this undoable application. How does application-specific recovery improve on
generic recovery as described in the "Exploring Failure Transparency" paper?
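As a starting point, the idea of operations with well-defined semantics can be illustrated with a toy calendar that records an inverse for every operation. This is a much simpler scheme than the one in the "Undo for Operators" paper, and the `Calendar` class and its methods are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Calendar:
    events: dict = field(default_factory=dict)    # event id -> title
    undo_log: list = field(default_factory=list)  # stack of (event id, old title)

    def add(self, event_id, title):
        """Add an event, or overwrite the title of an existing one."""
        previous = self.events.get(event_id)
        self.events[event_id] = title
        # Remember the old title (None if the event is new) for undo.
        self.undo_log.append((event_id, previous))

    def remove(self, event_id):
        """Remove an event if it exists."""
        previous = self.events.pop(event_id, None)
        self.undo_log.append((event_id, previous))

    def undo(self):
        """Reverse the most recent operation; return False if none remain."""
        if not self.undo_log:
            return False
        event_id, previous = self.undo_log.pop()
        if previous is None:
            self.events.pop(event_id, None)
        else:
            self.events[event_id] = previous
        return True


if __name__ == "__main__":
    cal = Calendar()
    cal.add(1, "Lecture")
    cal.remove(1)
    cal.undo()
    print(cal.events)  # the removed event is back
```

Because each operation is small and serializable, the log could also be written to disk and replayed, which is closer to what an undoable service needs in practice.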
- Detecting File System Bugs
Traditionally, file systems have not handled hardware failures well. This was
not a serious problem as long as these failures were rare (e.g., occurring only
when disks are dying and are soon replaced). However, enterprise storage
management systems are increasingly using SAN technology in which remote
computer storage devices (such as disk arrays) appear as locally attached
storage to the operating system. With SANs, network load can cause network
timeouts, which appear as temporary hardware failures to the operating
system. These temporary failures can lead to severe file system errors. The goal
of this project is to improve the reliability of file systems in the face of
hardware failures. One option is to take advantage of the fact that different
file systems handle failures differently. As a result, a simple fault-tolerance
method would be to replicate all file system operations on two different file
systems and detect errors by comparing the outputs of the operations.
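The replicate-and-compare idea can be sketched in user space as follows. This is a minimal illustration, assuming each replica is a directory that would be mounted on a different file system. The operation format and the names `apply_op` and `replicate` are hypothetical, and a real implementation would more likely sit at the VFS or block layer.

```python
import os
import tempfile


def apply_op(root, op):
    """Apply one operation under `root` and return its observable result."""
    kind, name, *args = op
    path = os.path.join(root, name)
    try:
        if kind == "write":
            with open(path, "wb") as f:
                f.write(args[0])
            return ("ok", len(args[0]))
        if kind == "read":
            with open(path, "rb") as f:
                return ("ok", f.read())
        if kind == "delete":
            os.remove(path)
            return ("ok", None)
        raise ValueError(f"unknown operation: {kind}")
    except OSError as e:
        # Compare error classes only; messages contain replica-specific paths.
        return ("error", type(e).__name__)


def replicate(roots, ops):
    """Run every operation on all replicas; return those whose results differ."""
    mismatches = []
    for index, op in enumerate(ops):
        results = [apply_op(root, op) for root in roots]
        if any(result != results[0] for result in results[1:]):
            mismatches.append((index, op, results))
    return mismatches


if __name__ == "__main__":
    # Two temporary directories stand in for two different file systems.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        ops = [
            ("write", "f.txt", b"hello"),
            ("read", "f.txt"),
            ("delete", "f.txt"),
            ("read", "f.txt"),  # fails in the same way on both replicas
        ]
        print(replicate([a, b], ops))  # no mismatches expected
```

With only two replicas, a mismatch shows that an error occurred but not which file system is at fault. Deciding which output to trust is part of the design problem.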