About me
I am a professor in the Electrical and Computer Engineering Department and (by courtesy) the Department of Computer Science at the University of Toronto. I joined the University of Toronto in January 2013, after receiving my Ph.D. from the Computer Science Department of the University of Illinois, Urbana-Champaign, under the supervision of a great advisor, Yuanyuan Zhou. From 2009 to 2012 I was also a visiting student in the awesome Systems and Networking group at the University of California, San Diego. My CV is here.
I founded a startup company called YScope with my PhD students so that our research can make real-world impact. Check out CLP, an open-source tool that can compress text logs and search compressed logs without decompression. This Uber Engineering Blog post describes a deployment case study of CLP.
My research interest is systems software, with a focus on developing practical solutions to improve the availability and performance of large software systems.
I am a Canada Research Chair in Systems Software and a recipient of the McCharles Prize for Early Career Research Distinction. I have also received a few teaching awards, including the Gordon Slemon Award and the Student Choice Award (upper year instructor) of the Faculty of Engineering.
I am looking for self-motivated students to work with me. If you are interested, please submit your application here.
News
- Hacker News [1], [2]
- Discussions from HBase developers, which prompted a series of reactions to address the problems we mentioned in the paper.
- Twitter discussions: see this, this, and this (if you're looking for a screenshot that summarizes our paper, see this or this).
- Blogs: the morning paper (where it is also considered a highlight of 2016), It Will Never Work In Theory, Another word for it, Metadata, Fifty Quick Ideas to Improve Your Tests, Postmortem lessons, and some discussions on Google+.
- And quite a few emails sent to us from developers...
Selected publications
- Relational Debugging -- Pinpointing Root Causes of Performance Problems. To appear in the Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23), July, 2023.
- Investigating Managed Language Runtime Performance: Why JavaScript and Python are 8x and 29x slower than C++, yet Java and Go can be Faster? In the Proceedings of the 2022 USENIX Annual Technical Conference (ATC'22), July 11-13, 2022. Pages 835--852. [USENIX ;login: article] [Code]
- Hubble: Performance Debugging with In-Production, Just-In-Time Method Tracing on Android. In the Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22), July 11-13, 2022. Pages 787--803.
- ctFS: Replacing File Indexing with Hardware Memory Translation through Contiguous File Allocation for Persistent Memory. In the Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST'22), February 22-24, 2022. Best paper award runner-up. [ACM Transactions on Storage article] [USENIX ;login: article] [Code]
- Understanding and Detecting Software Upgrade Failures in Distributed Systems. In the Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP'21), October 25-28, 2021. [Code]
- CLP: Efficient and Scalable Search on Compressed Text Logs. In the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI'21). July 14--16, 2021. Pages 183--198. [Code]
- M3: End-to-End Memory Management in Elastic Systems Software Stack. In the 16th ACM European Conference on Computer Systems (EuroSys 2021), April, 2021. Pages 507-522. [Code]
- The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure. In the 27th ACM Symposium on Operating Systems Principles (SOSP’19), October 2019, Huntsville, Ontario, Canada. [Press: The morning paper] [USENIX ;login: article]
- An Analysis of Performance Evolution of Linux's Core Operations. In the 27th ACM Symposium on Operating Systems Principles (SOSP’19), October 2019, Huntsville, Ontario, Canada. [Press: The morning paper] [Code]
- Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold. In the 26th ACM Symposium on Operating Systems Principles (SOSP’17), October 2017, Shanghai, China. [Press: The morning paper][Code][Impact: licensed by Netflix]
- Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach. In the 26th ACM Symposium on Operating Systems Principles (SOSP’17), October 2017, Shanghai, China.
- Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), November 2016, Savannah, GA.
- Don't Get Caught In the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead In Data-parallel Systems. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), November 2016, Savannah, GA. [Press: Invited publication: USENIX ;login: 42(1), The Next Platform][Code]
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), October 2014, Broomfield, CO.
- lprof: A Non-intrusive Request Flow Profiler for Distributed Systems. In the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), October 2014, Broomfield, CO. *: Equally contributed.
- Do Not Blame Users for Misconfigurations. Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13), November 2013.
- Be Conservative: Enhancing Failure Diagnosis with Proactive Logging. Proceedings of the 9th ACM/USENIX Symposium on Operating Systems Design and Implementation (OSDI’12), Hollywood, CA, October 2012.
- Improving Software Diagnosability via Log Enhancement. ACM Transactions on Computer Systems (TOCS), February 2012. Fast-forwarded from ASPLOS'11.
- SherLog: Error Diagnosis by Connecting Clues from Run-time Logs. In the Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pages 143-154, Pittsburgh, PA, March 2010.
- /* iComment: Bugs or Bad Comments? */ In the Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP’07), pages 145-158, October 2007.
Full publication list
Group
It is fun to work with the following incredible people:
Post-doc:
- Huangshi Tian
Graduate students:
- Rishikesh Devsot
- Adrian Chiu
- Devin Gibson
- Zhuqi Jin
- Ruibin Li
- David Lion
- Jack Luo
- Xiang (Jenny) Ren
- Rui Wang
- Sitao Wang
- Xiaochong Wei
- Haiqi Xu
- Yi Fan Yu
Alumni:
- Yongle Zhang, PhD 2020, First Employment: Assistant Professor, Department of Computer Science, Purdue University. Winner of The SIGOPS Dennis M. Ritchie Thesis Award.
- Xu Zhao, PhD 2021, First Employment: Research Scientist@Facebook. Winner of Facebook Fellowship.
- Kirk Rodrigues, PhD 2023, First Employment: Co-founder of YScope.
- Hailong Sun, visiting scholar, now Professor at Beihang University
- Serhei Makarov, Master of Applied Science, now at Red Hat.
- Muhammad FaizanUllah (Undergraduate thesis) -> Microsoft
- Neil Newman (Undergraduate thesis) -> graduate school@UBC
- Alan Chung (Undergraduate thesis)
Teaching
- ECE344 Operating Systems: [Winter22][Winter21][Winter20][Winter18][Winter17][Winter16][Winter15][Winter14][Winter13]
- ECE454 Computer Systems Programming: [Fall18][Fall14][Fall13]
- ECE244 Programming Fundamentals: [Fall22][Fall17][Fall16]
- ECE1759 Graduate OS: [Fall23][Fall22][Fall21][Fall20][Fall17][Fall16][Fall14]
Program committee
- 2023: SOSP, OSDI, EuroSys, EuroSys Poster (PC co-chair)
- 2022: OSDI
- 2021: OSDI, SOSP, HAOC, ASPLOS
- 2020: OSDI, NSDI
- 2019: HotOS (PC Co-chair with Jinyang Li), APSys (PC Co-chair with Yu Hua)
- 2018: OSDI, EuroSys, ASPLOS (ERC)
- 2017: SOSP, Student Research Competition@SOSP'17 (chair)
- 2016: ASPLOS (also chair of poster and lightning session)
- 2015: USENIX Annual Technical Conference, USENIX LISA, SOSP (poster PC)
- 2014: OSDI (external review committee), USENIX Annual Technical Conference, SIGMETRICS, USENIX ICAC
- 2012: USENIX Workshop on Managing Systems Automatically and Dynamically
Misc
I play a lot of sports, including basketball, skiing, swimming, and running. I was the captain of Beihang's CSE basketball team when I was an undergrad and co-captain of the UIUC CS faculty & grad-student basketball team in the intramural games. I have also run some marathons and half-marathons (see a not-so-recent photo here). When I have more time, I also play accordion and piano.
