Instructor: Ashvin Goel
Course Number: ECE1781H
Course Time: Wed, 3-5 pm
Course Room: BA4164
Start Date: Jan 10, 2018

Dependable Software Systems

ECE1781, Winter 2018
University of Toronto


Project Ideas

Some suggested projects are described below. Please talk to the instructor for more details about any of them, and make sure to get the instructor's confirmation of your project before starting it.

It is important for you to have thought about the following questions regarding each project before starting any design and implementation: 1) what problem are you addressing, 2) what is interesting/novel about your approach, 3) what metrics and testing method will you use for evaluation, and 4) what results do you expect from the evaluation.

  1. Bug Detection Using Symbolic Execution

    Several papers in the reading list use symbolic execution for detecting bugs. In this project, you will use any available symbolic execution tool (e.g., KLEE, S2E) to detect bugs in some simple programs. On what basis will you choose an application? How will you know that you have detected a bug?
  2. Cross-Checking Semantic Correctness of Device Drivers

    The file system bugs paper in the reading list (Week 2) cross-checks semantic correctness by taking advantage of the implicit VFS specification in the Linux kernel. You could consider using their system, or implementing a similar one, for other kinds of kernel code, such as specific types of device drivers.
  3. Race Detection

    The goal of this project is to detect races in an existing application. You can use an available race detector. What test framework will you use to trigger races? Is it easy to replicate bugs that are found? How can you use techniques we have discussed in class to increase the likelihood of catching races?
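
    As one illustration of how a test harness can make a race more likely to show up, the sketch below forces a thread switch in the middle of a read-modify-write on a shared counter. It is a toy example, not a race detector; the function name `run_counter` and its parameters are hypothetical and chosen only for this sketch.

```python
import threading
import time


def run_counter(n_threads, n_incr, use_lock, yield_between=True):
    """Increment a shared counter from several threads and return the final value.

    Without the lock, the read and the write of the counter are separate steps,
    so concurrent updates can be lost. Yielding between the two steps widens the
    race window, which makes lost updates much more likely to be observed.
    """
    state = {"value": 0}
    lock = threading.Lock()

    def increment():
        current = state["value"]          # read
        if yield_between:
            time.sleep(0)                 # invite a thread switch mid-update
        state["value"] = current + 1      # write (may overwrite another update)

    def worker():
        for _ in range(n_incr):
            if use_lock:
                with lock:
                    increment()
            else:
                increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["value"]


if __name__ == "__main__":
    expected = 4 * 200
    print("expected:", expected)
    print("with lock:", run_counter(4, 200, use_lock=True))
    print("without lock:", run_counter(4, 200, use_lock=False))
```

    The unlocked result varies from run to run, which is itself a reminder of why replicating a race can be hard; the locked run always reaches the expected total.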
  4. Recovery via Restarting Applications

    The "Microreboot" paper described a method by which parts of an application are rebooted to allow recovery of the application. This approach gets rid of faulty state in the application. In this project, you will choose an application and implement a recovery via "reboot" method for this application. You need to make sure that the persistent data in the application is not lost. For example, for a content download application (e.g., BitTorrent), the music repository must not be lost. Similarly, for an instant messaging application (e.g., gaim), the received messages should not be lost. How fine is your reboot granularity? Can you tune it? How often is reboot possible? What types of faults or bugs can the reboot handle? How does the reboot affect user perception? Would you change the application design based on your experience with microreboot-based restarting?
  5. Application-Level Undo and Recovery

    The "Undo for Operators" paper implemented an undoable email service. In general, their application-level undo and recovery service requires applications whose operations have well-defined semantics and can be serialized. Another example that satisfies these criteria is a calendar service. Can you think of other such applications? Choose an application and implement an undoable service for that application. Describe the properties of this undoable application. How does application-specific recovery improve on generic recovery as described in the "Exploring Failure Transparency" paper?
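
    To make "operations with well-defined semantics" concrete, here is a minimal sketch of an undoable calendar in which every operation logs its own inverse. This is a deliberately simple inverse-operation log and is not the mechanism from the "Undo for Operators" paper; the class `UndoableCalendar` and its methods are hypothetical names used only for illustration.

```python
class UndoableCalendar:
    """Toy calendar whose operations can be undone in reverse order."""

    def __init__(self):
        self.events = {}      # event_id -> title
        self._undo_log = []   # stack of (inverse_op, event_id, title)

    def add(self, event_id, title):
        if event_id in self.events:
            raise KeyError(f"event {event_id!r} already exists")
        self.events[event_id] = title
        # The inverse of adding an event is removing it.
        self._undo_log.append(("remove", event_id, None))

    def remove(self, event_id):
        title = self.events.pop(event_id)
        # The inverse of removing an event is adding it back with its old title.
        self._undo_log.append(("add", event_id, title))

    def undo(self):
        """Undo the most recent operation; return False if there is none."""
        if not self._undo_log:
            return False
        op, event_id, title = self._undo_log.pop()
        if op == "remove":
            del self.events[event_id]
        else:
            self.events[event_id] = title
        return True


if __name__ == "__main__":
    cal = UndoableCalendar()
    cal.add(1, "Project proposal due")
    cal.remove(1)
    cal.undo()            # restores event 1
    print(cal.events)
```

    A real service would also have to deal with effects that have already been seen by users or other systems, which a plain inverse log does not address.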
  6. N-Version File Systems

    The goal of this project is to improve the reliability of file systems in the face of hardware faults and file system bugs. One option is to take advantage of the fact that different file systems handle failures differently. As a result, a simple fault tolerance method would be to replicate all file system operations to two different file systems and detect errors by comparing the outputs of the operations. You could apply this technique to cloud-based storage as well.
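
    The replicate-and-compare idea can be sketched at user level as follows. This is only an assumption-laden illustration: the class `NVersionStore` is hypothetical, and the directories passed to it stand in for mount points of two different file systems.

```python
import os
import tempfile


class NVersionStore:
    """Replicates each operation to several directories and compares the results.

    Each directory is assumed to live on a different file system.
    """

    def __init__(self, roots):
        self.roots = list(roots)

    def write(self, name, data):
        # Apply the same write to every replica.
        for root in self.roots:
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)

    def read(self, name):
        # Read from every replica and flag any disagreement as an error.
        results = []
        for root in self.roots:
            with open(os.path.join(root, name), "rb") as f:
                results.append(f.read())
        if any(r != results[0] for r in results[1:]):
            raise IOError(f"replicas disagree on {name!r}")
        return results[0]


if __name__ == "__main__":
    d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
    store = NVersionStore([d1, d2])
    store.write("notes.txt", b"hello")
    print(store.read("notes.txt"))
```

    With only two replicas a mismatch can be detected but not repaired; deciding which copy is correct would need a third replica or some other source of truth.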
  7. Detecting I/O Bugs in Applications

    File systems perform writes asynchronously and often do not report failures if the data cannot be written to disk successfully (e.g., the application may have exited by the time the file system tries to flush data to disk and an error occurs during the flushing operation). In this project, you will study how this behavior can affect applications. Your task will be to induce I/O failures on writes and observe how applications handle such failures. For example, how do storage applications that care about your data, such as Git or SQLite, behave after such failures? How would you evaluate whether applications fail gracefully?
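
    One way to picture the experiment is a wrapper that makes a chosen write fail, plus a check that the application's previous data survives. The sketch below injects the failure inside the process rather than in the file system, which is a simplification; `FaultyFile` and `save` are hypothetical names, and `save` plays the role of the application under test.

```python
import errno
import os
import tempfile


class FaultyFile:
    """Wraps a file object and raises EIO on the N-th call to write()."""

    def __init__(self, f, fail_on_write):
        self._f = f
        self._fail_on = fail_on_write
        self._count = 0

    def write(self, data):
        self._count += 1
        if self._count == self._fail_on:
            raise OSError(errno.EIO, "injected I/O error")
        return self._f.write(data)

    def __getattr__(self, name):
        # Delegate everything else (flush, fileno, close) to the real file.
        return getattr(self._f, name)


def save(path, chunks, wrap=lambda f: f):
    """Toy application: write to a temporary file, then rename over the old one."""
    tmp = path + ".tmp"
    f = wrap(open(tmp, "wb"))
    try:
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    except OSError:
        f.close()
        os.remove(tmp)        # discard the partial file, keep the old data
        return False
    f.close()
    os.replace(tmp, path)     # replace the old file only after a full write
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    save(path, [b"old"])
    ok = save(path, [b"new1", b"new2"], wrap=lambda f: FaultyFile(f, 2))
    with open(path, "rb") as fh:
        print(ok, fh.read())
```

    Here "failing gracefully" is taken to mean that the failure is reported and the old contents remain readable; a real study would need its own definition for each application.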

  8. Improving the Reliability of File Systems With Online Consistency Checking

    The goal of this project is to ensure that file system and kernel bugs do not cause corruption of file system metadata. As a result, an offline file system check does not have to be run even if the kernel or the file system is arbitrarily buggy. The instructor's group has worked on this project and shown the feasibility of this approach for the Linux Ext3/4 and Btrfs file systems. In this project, the same technique would be applied to a file system designed specifically for flash devices such as the Linux F2FS file system. Talk to the instructor for details.