Michael Stumm: Publications

Paper Details

Reference:

Alex Depoutovitch and Michael Stumm,
"Otherworld --- Giving applications a chance to survive OS kernel crashes",
In Proceedings of the 5th European Conference on Computer Systems (EUROSYS'10), Paris, France, ACM, New York, NY, USA, April 2010, pp. 181–194.

Abstract:

The default behavior of all commodity operating systems today is to restart the system when a critical error is encountered in the kernel. This terminates all running applications with an attendant loss of "work in progress" that is nonpersistent.

Otherworld is a mechanism that microreboots the operating system kernel when a critical error is encountered in the kernel, and it does so without clobbering the state of the running applications. After the kernel microreboot, Otherworld attempts to resurrect the applications that were running at the time of failure. It does so by restoring the application memory spaces, open files and other resources. In the default case it then continues executing the processes from the point at which they were interrupted by the failure. Optionally, applications can have user-level recovery procedures registered with the kernel, in which case Otherworld passes control to these procedures after having restored their process state. Recovery procedures might check the integrity of application data and restore resources Otherworld was not able to restore.

We implemented Otherworld in Linux, but we believe that the technique can be applied to all commodity operating systems. In an extensive set of experiments on real-world applications (MySQL, Apache/PHP, Joe, vi), we show that Otherworld is capable of successfully microrebooting the kernel and restoring the applications in over 97% of the cases. In the default case, Otherworld adds zero overhead to normal execution. In an enhanced mode, Otherworld can provide extra application memory protection with overhead of between 4% and 12%.

Keywords:

operating systems, crash kernel, kernel, microreboot, recovery

Reference Info:

DOI: 10.1145/1755913.1755933
ACMid: 1755933
ISBN: 978-1-60558-577-2

BibTeX:

@inproceedings(Depoutovitch-EuroSys10,
    author = {Alex Depoutovitch and Michael Stumm},
    title = {Otherworld --- {Giving} applications a chance to survive {OS} kernel crashes},
    booktitle = {Proceedings of the 5th European Conference on Computer Systems (\textbf{EUROSYS'10})},
    location = {Paris, France},
    organization = {ACM},
    address = {New York, NY, USA},
    month = {April},
    year = {2010},
    pages = {181--194},
    doi = {10.1145/1755913.1755933},
    isbn = {978-1-60558-577-2},
    keywords = {operating systems, crash kernel, kernel, microreboot, recovery}
)